Wednesday, April 12, 2006

What to assess and how to assess it?

While there’s certainly a role for teacher-directed learning involving direct instruction and pre-selection of learning outcomes, I’m wondering how the question of what to assess gets answered in a student-directed pedagogy. In my classes, I try to give students a set of very broad standards, which they are charged with interpreting in dialogue with me and with other students. It is up to them to choose work that shows they have met these standards and then make arguments (written and oral, formal and informal) about how the work meets the standards. I ask students to work on projects of their own design, come up with the goals and outcomes, and then determine whether those goals and outcomes were met. I then ask them to reflect on this process of articulating and meeting goals.

In the case of student-directed learning, the question of what to assess is left to the student. But how can the student know what to assess? How can the student be taught how to assess it meaningfully such that the assessment produces insight and insight produces learning? I’m thinking specifically of the pedagogy and curriculum of The Met School, where each student creates his/her own curriculum and drives his/her own learning. While the notion of involving students in assessment and having them be key players in assessment is radical (and crucial), the implicit context in which this radical innovation occurs is often pretty conventional. So how to take a radical notion of formative assessment for learning and put it in a radical context of constructivist learning?

Which brings me to rubrics. The Six Traits of Writing rubric keeps coming to mind as a great model of formative assessment. In addressing the issues I described above, i.e., the issues of what to assess and how to assess it in a student-centered pedagogy, the Six Traits rubric has this to say: introduce the six traits of writing as a possible model for what “good” writing is. Have students work with it to get familiar with it. Then, once they have used it and begun to internalize it a bit, invite them to critique it. Invite them to add more traits. Invite them to refine the performance descriptors. In so doing, it strikes an ideal balance between teacher-directed and student-directed learning. The teacher, as a more knowledgeable and experienced learner, brings in the Six Traits rubric and says, “Try this. I think it’s pretty useful. What do you think? But before you answer, get to know it really well so that your answer will be thoughtful and meaningful and based on your personal experience.” The students respond to the direction of the teacher and come back with their analysis. “Yes, it works well, but how about this?” or “No, it doesn’t work well: I tried applying it to Shakespeare and it was terrible!”

So what to assess and how to assess it? In this case, the teacher starts the dialogue with a specific example: here is what to assess, and here is how to assess it. The students are then free to respond. But their response is also potentially generative, i.e., it critiques the model but also introduces new ways of thinking. These new ways of thinking are posed to the teacher and to the other students, who then critique these new ways of thinking and generate yet more ways of thinking. So the Six Traits rubric is not merely about learning the traits of writing to become a good writer; it’s about using the traits as a point of departure for critical reflection and analysis.

In using the traits, we can get beyond the whole subjective/objective debate and the validity/reliability conundrum and say, “Yes, these are wholly subjective and wholly arbitrary measures and traits. HOWEVER, they apply pretty well to most kinds of writing done in academic settings AND they provide the ground for an inquiry into what we mean by ‘good writing.’” The traits are pretty good on validity, but not so good on reliability (without intense training and without having multiple raters checking each other’s work, i.e., without a lot of money!). BUT we can relax a bit by recognizing that most measures tend to lean one way over the other, i.e., that measures tend to be more valid than reliable or more reliable than valid. Yes, I know the hard-core psychometricians will argue this point. But you will never be able to convince me that a high-stakes, multiple-choice, standardized, norm-referenced test taken in a timed environment is ever going to be anything more than reliable. Valid? Hardly. By the same token, you’ll be wasting your breath if you want to argue that trait-based or rubric-based assessment of student work samples is as reliable as the US postal service. Sure, it’s sort of reliable. Sort of. But valid? Oh my word, yes!

My point? As long as we have multiple measures of student learning, and as long as some of these measures are high in validity and others are high in reliability, and as long as the measures attempt to be both valid and reliable (recognizing this might not actually be possible, but it’s a good goal to shoot for), and as long as these measures can be used in combination with each other and serve to corroborate or question each other’s findings, we can relax. And if this doesn’t reassure us, then perhaps Albert Einstein can. Einstein, the most brilliant high-school dropout to walk the planet, is often credited with saying, “Not everything that counts can be counted, and not everything that can be counted counts.” This line gives us the freedom to do the most comprehensive assessment work possible, keeping in mind that our work will always be flawed, will always be somewhat contrived and artificial, and will never fully account for what a student knows and can do – indeed, will never tell us who a student is. But, having said this, it’s not like we can dance in the streets and do The Nihilist Shuffle, shouting, “Hurray, everything counts! And nothing counts!” It’s not like this gets us off the hook. In fact, it does precisely the opposite. Because our measures are flawed, because the questions of what counts and why are shaped socially, culturally, and historically, and because there is no divine Law of Assessment that says, “Thou shalt have criterion-based tests,” we have to work very hard to say, “This is what counts, and here is why it counts, and here is how I know it counts, and here is evidence of it counting.” This forces us to make arguments, to do the work of assessment, and not kick back and rely on the unquestioned wisdom of the ancients. Getting to say what counts and why is a profoundly powerful experience. It is an inherently contested conversation. It will always produce disagreement. But these disagreements are good. They are not outside the issue of assessment.
They are the issue of assessment.
