Thursday, March 17, 2011

Are multiple-choice tests capable of evaluating learning?

"The writer's [teacher's] job is not to judge, but to understand."
Ernest Hemingway

Last week, I returned to writing about the Teaching-Learning Cycle (T-LC) by considering whether or not we can reclaim assessment from high-stakes tests. Recall that evaluation follows assessment in the T-LC: What can learners do? What are they trying to do? What comes next? Staying with the theme of standardized tests, I want to explore the idea of using multiple-choice items to evaluate learning.

My favorite example of this comes from the 1983 NAEP math survey. I had recently started teaching middle school math when the results for the item shown to the right came out. Only 24% of the national sample of thirteen-year-olds answered this item correctly. Nearly as many of the learners answered (c) 31.33. This is the first time I remember examining the other possibilities and wondering, “What were the kids who selected these other responses thinking?” [This question continues to fascinate me and resulted in this article that I wrote with Dr. Pam Wells.]

Using the evaluation framework, I assume that most of those answering 12 can find the remainder of the division problem and are trying to use it to answer the question. The kids answering 31 might be trying to find the number of buses by estimating the quotient or rounding it to the nearest whole number. And the 31.33 answers suggest that these thirteen-year-olds can do the division correctly but forgot to keep the context in mind. In fact, this focus on making sense of the situation seems to be what comes next for all of those who chose (a), (b), or (c).
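For readers who want to see the arithmetic, here is a minimal sketch of how each response could arise. It assumes the item is the often-reprinted NAEP bus problem (1,128 soldiers, 36 soldiers per bus). Those two numbers are my assumption, since the item itself appears only as an image, but they are consistent with the 12, 31, and 31.33 discussed above. The function name and labels are purely illustrative.

```python
# Assumed item values (not stated in the post text): 1,128 soldiers, 36 per bus.
SOLDIERS = 1128
BUS_CAPACITY = 36


def candidate_answers(people, capacity):
    """Return the value each line of reasoning would produce."""
    quotient, remainder = divmod(people, capacity)
    return {
        # Reports the leftover soldiers instead of the number of buses.
        "remainder": remainder,
        # Rounds the quotient to the nearest whole number.
        "rounded quotient": round(people / capacity),
        # Carries out the division but ignores the context.
        "exact quotient": round(people / capacity, 2),
        # Adds one more bus whenever any soldiers are left over.
        "buses needed": quotient + (1 if remainder else 0),
    }


if __name__ == "__main__":
    for reasoning, value in candidate_answers(SOLDIERS, BUS_CAPACITY).items():
        print(f"{reasoning}: {value}")
```

Under those assumed values, only the last line of reasoning keeps the situation in mind: a partly filled bus is still a bus.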

You will notice that I did a fair amount of equivocating in my evaluation of the different responses in the previous paragraph. This is the problem with multiple-choice items: we cannot be sure why the test-taker selected one response over the others. Was it based on understanding? Or was it a guess? Or was it an intentional effort to undermine the assessment process? Only by looking at a group of responses can we begin to reduce such errors and be more confident in our evaluation of learners’ thinking. That is why I do not like using multiple-choice items as summative assessments but do use them to track the progress of classes as a whole on content we are exploring.

Still, I believe I can improve the question so that it is easier to evaluate accurately where learners are fluent and where they are approximating. One possible way is to replace 12 (which I consider the most unreasonable answer) with “I’m not sure so any choice would be a guess.” This does not add to my evaluation burden, and because the item is formative, learners are more likely to be honest if they know I will use the information to inform instruction and improve their likelihood of later success.

For even more data, do as Karen Bailey suggests (in this PowerPoint) and simply add “Explain” to the end of the item. Some possible rationales are shown to the left. These responses are fascinating and certainly can improve our ability to evaluate learners’ thinking and tailor future instruction to support their continued growth.

It will require more evaluation effort to look through and make sense of open-ended responses, and teachers need to decide if that effort is justified. I once heard Grant Wiggins ask of assessments, “Is the juice worth the squeeze?” But this question should apply to the test-takers as well as the evaluators. Why should we subject kids to an assessment if the task does not truly measure what was intended?

This is one of the first posts that truly did get at my reason for starting this blog. I began this post thinking I would defend multiple-choice items as a reasonable way to evaluate groups of learners from a formative perspective. While I still believe evaluation can be done using a selected response format, I’m not sure identifying what learners can do and are trying to do is as easy as I thought. What do you think?


  1. Dave-
    I agree that it is incredibly difficult to actually identify the students' reasoning behind their answers on a multiple-choice test. Wyoming Park utilizes common assessments in the math program, all of which are multiple-choice tests. In our classroom we have the students complete a practice test, which is incredibly similar to the actual test, but they have to show all of their work on this practice test. However, I still find myself feeling frustrated when students get questions wrong on the test and I have no way of identifying how/why they got it wrong and what I can do next to help them increase their understanding of the topic.
    Great question posed in this post.

  2. I agree with Carlee on this. Multiple-choice questions are nice and can be great for formative assessment. But they do not allow teachers to actually see what part of the topic the student is struggling with. It might be that they understand the underlying math perfectly but the wording confused them. Or they may lack the underlying principles, which leads to problems with understanding the new topic. With multiple-choice tests, all we as teachers can do is speculate as to what is happening.

    I also strongly dislike when students do not show any work and just have an answer. Like your grading system, they made it to Detroit but have no way of explaining how they got there. I feel the explanation is more important to understanding students and becoming an effective educator.

  3. I've mentioned similar things before on my own blog: I prefer essay questions where students have to show their work and thinking, because they are harder to cheat on, make it easier to see where a student has (or lacks) understanding, and tend to require students to actually present a rational argument rather than just giving a numeric answer (which mirrors what they'll have to do in the Real World more often than not).

    The problem is that the more robust the assessment, the more time and effort it takes to grade. So there is a tradeoff here between teaching time/effort and quality of assessment.

    So, lazy teachers will use ScanTron exams because they are easy. But even quality teachers have a real-world time constraint (you can only spend so much time on a class, total) so the question becomes, how to optimize the time spent (between preparing classroom lectures or demonstrations, grading, meeting with students off-hours, etc.)?

    Incidentally, this is true for all subjects, not just math.

  4. Carlee, Jacob, and Ian -
    You all have hit upon a key point. We need data in order to accurately evaluate where learners are in their understanding, but there are other constraints (such as time) that make simply collecting more data an unreasonable response. Where's the break-even point?

    Would it make a difference if we evaluated collaboratively instead of in isolation? What if we used colleges of education to provide support for evaluation? Maybe teachers in training could help in the evaluation. Just brainstorming here - what do you think?

  5. What if the assessment was designed to examine the students' process of thinking through the problem? For math, design the multiple choice questions so that they first check their understanding of the first part of a process, then the next step, etc. We may be able to see where the understanding ends and the help needs to begin. The advantage of using these types of tests is the quick feedback (if the feedback is meaningful).