Most people would have to agree that what computers can do is pretty impressive. More importantly for those trying to save money, using computers to grade student essays in high-stakes testing situations is much cheaper than hiring and training human readers. Consequently, there is significant support for this practice.
But others oppose it.
HumanReaders.Org recently launched a petition “Against Machine Scoring Of Student Essays In High-Stakes Assessment.”
HumanReaders.Org argues that “computerized essay rating” should not be used for “any decision affecting a person’s life or livelihood” because it is “reductive,” “inaccurate,” and “unfair.” Along with their petition, HumanReaders.Org provides a summary of the relevant research and a substantial list of references.
What’s at stake in this debate?
Of course, there is the issue itself. The kinds of high-stakes tests that these computers score influence education in significant ways, impacting what and how K12 teachers teach, who gets to go to college, and what they will be able to do when they get there. So we want to get it right.
But the bigger picture has to do with how we think about and make decisions about education. The high stakes of the specific question of automated essay scoring illustrate the high stakes of getting those involved in education to read the scholarship on teaching and learning.
HumanReaders.Org gets this right. They ground their specific arguments in evidence from the research. Just as importantly, they approach the whole issue in a way that is clearly informed by the broader scholarship on teaching and learning, for instance, in knowing what questions to ask.
Anyone who wants to take up a contrary position should do the same.
Dr. Corrigan, I have been reading your blog posts for the last couple of days. I discovered you through the Chronicle piece by Mark Edmundson. Thank you for what you do here!
Thanks for the kind words! I’m so glad to hear that you’re finding what’s written on the blog worthwhile.
I appreciate the article but I would apply caution before buying into the claims made in the petition. Many of the items in their Works Cited are not actual research. Furthermore, very few of the items cited are from after the Hewlett Foundation’s 2012 competition that marked a turning point in the field of AES. Here’s an alternative view worth reading:
J, the phrase you use here, “buying into,” seems particularly apt for this situation. However, the petition I’m pointing to and the scholars behind it are not actually selling anything, whereas those promoting automated essay scoring for high-stakes assessment often are, including you, incidentally, as the link to your website reveals. Of course, the fact of money being on the table for someone doesn’t automatically invalidate their points. But it does invite closer scrutiny of them. So far, it appears that independent research has not supported the sorts of claims made by companies selling AES software for high-stakes testing, while research sponsored by those companies sometimes does.
Are you affiliated with the organization that put out that petition? The reason I ask is that your response comes across as if you are. You say, “the petition I’m pointing to and the scholars behind it are not actually selling anything.” Their entire livelihood comes from scoring responses. I don’t know how that could be interpreted as “not selling anything.” On the other hand, you are saying that I am selling AES for high-stakes assessment. This too is incorrect. We offer a FREE automated proofreading tool. The experiences discussed in the blog article that I linked to come from my time as an AI researcher working for a company that uses AES for high-stakes testing. By the way, these same companies also make money from human scoring, so I don’t quite follow your insinuation that companies with AI scoring technology are unscrupulously promoting their wares for financial benefit. The push for AES is not coming from the vendors; it’s coming from school systems and educational consortiums that feel multiple-choice testing is woefully inadequate as a means of testing students. A good starting place for assessing this technology would be Dr. Shermis’ study of the Hewlett Foundation’s ASAP competition.
I did misspeak. You are not selling software for high-stakes assessment. But you are selling something related to this issue, no? Am I reading incorrectly that the free proofreading software is funded by advertising and that a premium version is available for purchase?
To repeat myself, that one is selling something does not mean that one is automatically wrong or unduly biased. I certainly am not meaning to insinuate that you are being unscrupulous. (We don’t know enough about each other either way for those kinds of claims.) My point is that when I see a “debate” in which companies are consistently supporting one version of “research” that contradicts the version of research supported by independent scholars, well, that establishes an important context for me for understanding the research.
The HumanReaders.org organization exists only in the petition itself. It is a collection of scholars and teachers and concerned citizens who came together specifically over this issue. As for myself, my only connection with it is as a proponent of the petition and of the way that they inform their position with theory and research on teaching and learning.
To suggest of those behind the petition that “Their entire livelihood comes from scoring responses” is simply not correct. In fact, it strikes me as an odd thing to assert, since no scholars or teachers I know make their entire livelihood from scoring responses. And, while some teachers make a good portion of their livelihood grading and responding to student work (which involves way more than just scoring), they do so not because of the money involved (they are underpaid) but because they believe in its value. To put the negligible financial considerations that some teachers may have to oppose automated essay scoring on the same level with the overriding financial consideration that certain for-profit companies (by definition) do have to support it strikes me as a false equivalency.
I appreciate you citing Shermis. His most famous work on the topic is the essay, “Contrasting State-of-the-Art Automated Scoring of Essays,” which was highly touted in the media but not actually published. (Is this the work of his that you are referring to?) Les Perelman published a sound critique of this study in The Journal of Writing Assessment, pointing out multiple glaring problems with its methodology and showing how its conclusions are unsubstantiated: http://journalofwritingassessment.org/article.php?article=69
Shermis’ study has been criticized by some and lauded by others. I point to that as a starting point in understanding where this field is and where it is going. I get the feeling that you are viewing AES as some sort of static chemical which can be described through various views and analyses that have been gathered over the years. Imagine in 2001 if our understanding of genome sequencing came from 1995. Or if we decided what the capability of a computer was based on computers from even 2 years earlier without taking into account the rate of progress. Or if the abilities of autonomous driving had been assessed in 2004 when no vehicles could complete the DARPA Grand Challenge. The following year there were 5 finishers and the technology is now flourishing rapidly. Similarly, the Hewlett Foundation’s ASAP competition was a flashpoint in the industry in which innovations from the competition were infused into existing product lines. Research prior to this point is not going to be remotely accurate. Similarly, research of one organization’s AES does not apply 100% to another organization’s any more than the limitations of CMU’s autonomous vehicle apply to Stanford’s. I understand the difficulty of trying to get a grasp on the merits of this rapidly-changing technology as an outsider. As I stated in my blog post, the company I had worked for tracks several metrics related to accuracy, and in their first major rollout of AES, the computer readers did better than the human readers in every metric. That was just the initial rollout. This is only going to get better. Much better. So, yes, you are right in wanting more research, and I can assure you that it’s coming. If this is news to you, then consider me the messenger.