Student Evaluations of Instructors May Be Flawed
For years, one of the most standard practices in higher education has been what’s called “student evaluation of teaching,” also known by its acronym SET. These surveys are administered in just about every college and university course every semester or term, and they essentially measure how effective a teacher is by gauging student satisfaction. The results are used to make many important decisions, such as whether faculty members should be granted tenure – or even keep their jobs.
A new study, however, challenges the conventional wisdom that there is a correlation between student evaluations and learning. The study, titled “Meta-analysis of Faculty’s Teaching Effectiveness: Student Evaluation of Teaching Ratings and Student Learning are not Related,” concludes that SETs are unreliable due to various kinds of biases against instructors, and it questions whether students learn more in courses taught by highly rated instructors.
The researchers looked at 97 studies, some dating back to 1981, which have been cited over time as evidence of the effectiveness of student evaluations. They re-analyzed the data from those studies (most carried out at U.S. institutions of higher education) through a methodology called meta-analysis, a method that combines pertinent qualitative and quantitative data from several selected studies to develop a single conclusion with greater statistical power.
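To make the idea of meta-analysis concrete, here is a minimal sketch of how correlations from several studies can be pooled into a single estimate. The numbers are invented for illustration and the particular estimator shown (Fisher’s z transformation with a DerSimonian-Laird random-effects model) is an assumption on my part, not necessarily the procedure used in the study discussed above.

```python
# Minimal random-effects meta-analysis of correlation coefficients.
# NOTE: the correlations and sample sizes below are hypothetical,
# chosen only to illustrate the pooling procedure.
import math

# (correlation between SET rating and learning, number of students) per study
studies = [(0.45, 20), (0.10, 250), (0.30, 35), (0.05, 400), (0.20, 60)]

# Fisher's z transform stabilizes the variance of a correlation: var(z) = 1/(n-3)
z = [0.5 * math.log((1 + r) / (1 - r)) for r, n in studies]
v = [1.0 / (n - 3) for r, n in studies]
w = [1.0 / vi for vi in v]                      # fixed-effect weights

# Fixed-effect pooled estimate and heterogeneity statistic Q
z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))

# DerSimonian-Laird between-study variance tau^2
k = len(studies)
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (k - 1)) / c)

# Random-effects weights fold tau^2 into each study's variance
w_re = [1.0 / (vi + tau2) for vi in v]
z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
se_re = math.sqrt(1.0 / sum(w_re))

# Back-transform to a correlation and report a 95% confidence interval
r_pooled = math.tanh(z_re)
ci = (math.tanh(z_re - 1.96 * se_re), math.tanh(z_re + 1.96 * se_re))
print(f"pooled r = {r_pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The key point the sketch illustrates is that larger studies receive more weight, so a pooled estimate can look quite different from the average of the individual correlations, which is part of why re-analysis can overturn conclusions drawn from small samples.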
The research suggests that past analyses linking student achievement to high student teaching evaluation ratings are flawed, a mere “artifact of small sample sized studies and publication bias,” and that there are “no significant correlations between [evaluation] ratings and learning.”
Further, the research indicated that “institutions focused on student learning and career success may want to abandon SET ratings as a measure of faculty’s teaching effectiveness,” adding that “The entire notion that we could measure professors’ teaching effectiveness by simple ways such as asking students to answer a few questions about their perceptions of their course experiences, instructors’ knowledge and the like seems unrealistic given well-established findings from cognitive sciences such as strong associations between learning and individual differences including prior knowledge, intelligence, motivation and interest. Individual differences in knowledge and intelligence are likely to influence how much students learn in the same course taught by the same professor.”
This study brings into focus a number of questions that have been on the table for many years. The first is how well SETs are designed. Any expert on public opinion will tell you that the way you formulate a question in a survey influences the kind of responses you obtain. Most SETs are not designed by experts on public opinion; they are usually put together by the institution’s administration, and many institutions even allow professors themselves to add questions to the standard survey.
The second is the difference in evaluations between tougher and easier courses. Professors teaching more challenging subjects may find that the SETs for those courses show a lower level of satisfaction simply because the students do not like the subject matter, not because the teacher was not effective at teaching it.
A third variable is whether the course is mandatory or an elective needed to fulfill a particular set of requirements for a major. When a course is mandatory, students may show disdain toward the subject, while feeling more comfortable with courses they choose at will.
The instructor’s personality may also influence students’ responses. Highly demanding teachers may appear less acceptable to students than “easy” ones. In fact, it has been discussed for many years that some teachers who need good student evaluations in order to get good annual reviews may go “easy” on students to obtain more favorable ratings. That may be particularly the case for those seeking tenure.
Other factors may also bias the results of those surveys. For example, many studies have shown that female instructors tend to receive lower ratings in SETs simply because some students perceive them as having less authority than their male counterparts. The same happens with non-white instructors and with those for whom English is not their native language.
To avoid these biases, some institutions have introduced the practice of in-class observations of professors by their peers. Although that is a good idea, since peers can provide additional assessment, nothing prevents those observers from being biased themselves, whether because of personal relationships with the individual observed or because of a lack of appreciation for different teaching styles.
Finally, more and more colleges and universities have introduced electronic evaluations of their professors. Instead of asking students to write their evaluations during class time (with the instructor out of the classroom), students are now given the option of completing the evaluations electronically from anywhere. The direct result of this practice is that the number of students filling out those evaluations decreases significantly, and those who do fill them out tend to be students who are either extremely satisfied or extremely dissatisfied with their experience in class.
At the end of the day, these issues underline a major problem colleges and universities have been struggling with for decades: how to measure how well students are prepared once they graduate. No wonder the answer to this fundamental question never appears in the multiple rankings published each year by magazines, because the answer is very simple: we really do not know.