Saturday, January 27, 2007

Why Multiple Choice Tests (Even "Good" Ones) Tell Us Nothing

Recently, on the Assessment Reform Network list, Richard Hake cited evidence for the use/value of multiple choice tests (MCT's). The evidence came from studies of high school and college students.

I would be rather surprised if any studies showed any value for younger kids, especially when you consider the kinds of MCT's that young kids are exposed to. For example, here's a real question from Tennessee's 2005 state test in Governance and Civics given to students in Grade 3:

"Which of these is an example of someone being a good citizen?"

a) A girl steals candy from a store.
b) A boy puts his litter in a trash can.
c) A man lets his dog run loose on the street.
d) A woman drives faster than the speed limit.

I think this is what most people (including me) have in mind when we criticize MCT's. If there are lots of really good MCT's that are out there right now in grades 3-8, for example, I'd love to know about them.

Here's the thing: students who are not pigeon-holed, tracked, labeled "LD," counseled out, or otherwise left with their love of learning obliterated by such asinine tests and their associated test prep regimes are lucky.

Sorry. I don't like MCT's, even the so-called "good ones." Here's why: I am a terrible test-taker. When I took MCT's in the past, I got extremely anxious. I thought to myself, "One of these answers might not be the RIGHT response, but one of them is supposed to be the BEST response." So I tended to over-think and over-analyze each question. Of course, none of the analysis I performed counted towards anything, especially if I chose the "incorrect" response. Not one shred of the process I underwent to arrive at my choice was recorded. I took too long. I didn't answer enough of the questions. As I was taking the test, I was aware that I was taking too long. So this increased my anxiety. I kept hearing my teacher's advice: "Remember - don't take too long on each question; if you don't know the answer, just eliminate the most obviously bad choice and then guess among those that remain." Good advice, certainly, prior to taking the test. But not very effective while in the heat of MCT battle.

In response to the question "Why MCT's?", Hake argues: "So that the tests can be given to thousands of students in hundreds of courses under varying conditions in such a manner that meta-analyses can be performed, thus establishing general causal relationships in a convincing manner." I'm not convinced of anything other than the fact that the students who did well were good at taking MCT's.

What makes the example I gave above from Tennessee's 2005 state test in Governance and Civics both tragic and comic is that it's intended -- with no irony or humor involved -- to measure the extent to which 3rd graders have met the following standard:

"3.4.spi.2 Determine the representative acts of a good citizen (i.e., obeying speed limit, not littering, walking within the crosswalk)."

So, presumably those third graders in Tennessee who chose Answer B - "A boy puts his litter in a trash can." - are now able to determine the representative acts of a good citizen. The most offensive aspect of this is that measuring citizenship is reduced to a multiple choice question that students either get right or wrong. In addition, the students could have easily ruled out the other choices as being obviously wrong and been left with only one answer - B. So what this means is that students may not know what "the representative acts of a good citizen" ARE - they simply know what they are NOT. Of course, because these tests are standards-based, Tennessee officials can sleep at night (and get re-elected), knowing they have definitive, psychometrically-backed proof that their state's 3rd graders are good citizens.

The assumption/assertion that the results of MCT's assessing higher-order thinking show "general causal relationships in a convincing manner" is misleading because it overlooks the experience of taking these tests, i.e., that students either stress out, over-think, etc., while taking them or they don't. This means that the results must first be analyzed through this lens. In other words, you would have to be able to measure the students' affective response to test-taking first and then, for those students who have a negative affective response, be able to account for that in some way. Personally, the way I would account for that is to throw out the results and look at other measures. For those students who had a positive or neutral affective response to test-taking, you would then have to determine the extent to which this affective disposition to test-taking skewed the results and, ultimately, made these students seem "smarter" than they might actually be -- and certainly much, much "smarter" than the students who have a negative affective disposition.

Have any studies ever been done that controlled for this affective disposition to test-taking?

The only work that I know of that comes close is the set of studies that Claude Steele did on "stereotype threat." (Here's a quick overview.)

What's interesting in light of Steele's research is that minority kids might actually deal with a double-dose of affective dispositions to test-taking that negatively affect their results, i.e., they might -- independent of their race -- feel apprehensive and anxious about taking tests and not test well AND also experience this "stereotype threat."

2 comments:

Anonymous said...

What I do not understand is how my child can be at the top of her class for the last three years, making the highest average in reading, math and language, and yet not score as high as other students in her class on the MCT.

Tammy said...

I am so happy to have found this blog!

I've been against multiple choice tests my whole life, even as a student. I do not feel that they adequately reflect the knowledge of the test taker.

When I was in freshman high school algebra I had a great teacher who did not give multiple choice tests. She required us to do the full work and then she graded us on the work, not merely if we got the answer correct or not. This allowed her to grade based on my understanding of the problem: Did I know how to get to the correct answer rather than did I get the correct answer. If the answer was incorrect she would mark down based on where I went wrong in the problem, and not for getting the answer itself wrong. This approach allowed me to see where I went wrong and how to correct it the next time around. She actually taught me! She was the only math teacher I ever had who worked this way. It was the only math class I ever excelled in.

I feel that all testing should take this approach - an essay form, if you will - where one can then be graded based on their knowledge of the subject rather than on an absolute answer that usually is not a true reflection of knowledge. It's the only way we can truly evaluate the abilities of our children.

Thank you again for another great article on education.