What We Give Up When We Let AI Decide

Interactive dilemmas about AI and grading

For forty years, machine learning in education automated logistics: scanning forms, checking syntax, tracking submissions.

Generative AI does something different. It automates judgment.

That shift changes what it means to teach, to assess, to be assessed. These scenarios explore the trade-offs—not to give answers, but to surface the questions we should be asking.

6 scenarios available. Each visit is randomized.

Based on "What We Give Up When We Let AI Decide" by Marc Watkins
Scenario: The Midnight Grader

11:47 PM, Tuesday

Forty-three ungraded essays sit in your inbox. Your department expects a manuscript draft by Friday. Your partner went to bed two hours ago.

Yesterday, a colleague mentioned they've been using AI to grade student writing. "It's like having a TA who never sleeps," they said. "I got ten hours of my week back."

You've always believed that grading is where real teaching happens—in the margins, in the questions, in the careful attention to a student's developing voice.

But that belief doesn't grade essays. And right now, you're drowning.

What do you do?

Scenario: The Midnight Grader
Path: The Experiment

You upload the essays. Within seconds, grades appear alongside detailed feedback: thesis clarity, evidence usage, paragraph structure.

You spot-check a few. The assessments are... reasonable. Not exactly what you'd write, but defensible.

Time saved: 4 hours, 23 minutes
Average time per essay: 6 seconds

Three weeks later, a student visits office hours. "I don't understand this feedback," she says, pointing to a comment. "What does 'consider strengthening the logical throughline' even mean?"

You don't remember writing that. Because you didn't.

How do you respond?

Scenario: The Midnight Grader
Path: The Holdout

It takes until 2:30 AM. You write comments about Sarah's emerging analytical voice. You note where Marcus's argument loses its thread. You catch a citation error that would have cost Jamie points later.

At the department meeting Friday, your chair shares a report: faculty using AI grading tools report 40% more time for research. Publications are up.

Your colleague's publications this year: 3
Your publications this year: 1
Tenure review: 14 months away

A colleague leans over. "The students can't tell the difference anyway."

Questions to sit with

  • If students can't tell the difference, does it matter who wrote the feedback?
  • Is time spent grading valued by the institutions that employ us?
  • What happens to educators who choose the slower path when the system rewards speed?
  • Can you mentor students if you don't have a job?
Scenario: The Midnight Grader
Path: The Disclosure

Her expression shifts. "So... a computer graded my essay?"

You spend twenty minutes discussing her argument together. It's a good conversation—better, maybe, than margin comments would have been.

That night, you see a post in a student forum: "Apparently Prof. [You] is using ChatGPT to grade papers? We're paying $60k a year for AI feedback?"

Forty-seven upvotes. Your chair wants to meet tomorrow.

Questions to sit with

  • Is transparency about AI use a professional obligation or a professional risk?
  • What is the student actually paying for—your time, your judgment, or something else?
  • Should institutions require disclosure of AI use in grading?
  • Who should write the policies—and how fast can they keep up?
Scenario: The Midnight Grader
Path: The Performance

You pull up her essay and improvise an explanation. It works. She nods, thanks you.

Walking back to your office, you feel the weight of a small deception. The feedback wasn't wrong. But it wasn't yours. And now you've claimed it.

A thought experiment:
If a student uses AI to write an essay and claims it as their own, we call it academic dishonesty. What do we call it when the grader does something similar?

Questions to sit with

  • Is there a meaningful difference between "AI-assisted" and "AI-generated" feedback?
  • What implicit promises does grading carry?
  • If the feedback helps the student, does it matter who wrote it?
  • What happens to trust when it's mediated by algorithms?
Scenario: The $15 Exam

A colleague's experiment

Your colleague Panos just ran an experiment. He used an AI voice agent to proctor oral exams for his 87 students, then had a "council of LLMs" grade their responses.

Total cost: $15.

He shares the results at a faculty lunch:

Student survey results:
13% preferred the AI oral format
57% wanted traditional written exams
83% found it more stressful
But: 70% agreed it tested their actual understanding

He turns to you. "You should try it. It's incredibly efficient, and honestly? It might assess understanding better than our written exams."

What's your reaction?

Scenario: The $15 Exam
Path: The Pragmatist

Panos walks you through the setup. Students schedule a 15-minute slot. The AI asks questions, follows up on vague answers, probes understanding. Three different models grade the transcript independently.

"The consistency is remarkable," he says. "No tired grader giving harsher scores at midnight. No unconscious bias about who 'sounds smart.'"

You think about your own grading—how your standards drift across a stack of 50 exams. How a strong essay early in the stack can make the next one seem weaker by comparison.

Then you think about what it means to be examined by a machine. The student speaking into a void, judged by something that doesn't know them, can't see their nervousness, can't offer a reassuring nod.

What matters more?

Scenario: The $15 Exam
Path: The Skeptic

Panos shrugs. "Traditional exams are stressful too. At least this tests what they actually know, not how well they write under time pressure."

But you're thinking about your students with anxiety. The ones who freeze when speaking. The ones for whom a human examiner's patience makes the difference between demonstrating knowledge and going blank.

"Did you offer accommodations?" you ask.

"The AI doesn't judge their affect," he replies. "In some ways, that's its own accommodation."

You're not sure if that's liberating or chilling.

Questions to sit with

  • Is "not judging affect" a feature or a loss?
  • Who decides what counts as understanding—the AI, the instructor, or the student?
  • If 70% say it tested understanding better, does that outweigh 83% finding it more stressful?
  • What are we optimizing for when we design assessments?
Scenario: The $15 Exam
Path: The Fairness Argument

You decide to pilot AI oral exams in one section. The consistency is real—scores cluster tightly, feedback is uniform, no student can claim unfair treatment.

But then a student comes to your office hours. She's crying.

"The AI kept asking me to clarify," she says. "I knew the answer, but I couldn't explain it the way it wanted. With a real person, I could have drawn a diagram. I could have seen if they understood."

Her score: C+. You would have given her a B+.

Questions to sit with

  • Is consistency the same as fairness?
  • What forms of understanding does AI assessment fail to capture?
  • When the machine and the human disagree about a student's knowledge, who's right?
  • Should we design assessments around what AI can evaluate?
Scenario: The $15 Exam
Path: The Relationship

You think about your own education. The professor who noticed you were struggling and asked a gentler follow-up question. The examiner who smiled when you finally found the right words.

Assessment has never been just about measuring knowledge. It's about being seen—having someone recognize your thinking, even when it's imperfect.

Can an algorithm do that? Does it need to?

"What if the point of oral exams isn't just the grade?" you say to Panos. "What if it's the last time some of these students ever have a real intellectual conversation with an expert in the field?"

He pauses. "That's a $15,000 conversation, if you cost out faculty time."

Questions to sit with

  • What do students lose when they're never examined by a person who knows them?
  • Can we put a dollar value on intellectual mentorship?
  • If we optimize for efficiency, what unmeasurable things disappear?
  • Is the human element a luxury or a necessity?
Scenario: The Policy Vacuum

The committee meeting

You're serving on the academic integrity committee when someone raises a troubling point.

"We have seventeen pages of policy about students using AI. But nothing about faculty using AI to grade."

Silence. Then the murmurs begin.

You learn that at least a dozen faculty are already using AI grading tools. Some departments have informal norms. Most have nothing. Students haven't been told.

Current policy coverage:
Student AI use: 17 pages
Data privacy concerns: 3 pages
Faculty use of AI in assessment: 0 pages

What position do you take?

Scenario: The Policy Vacuum
Path: The Regulator

You advocate for disclosure requirements. If faculty use AI in grading, students should know. The committee agrees to draft a policy.

But then the questions multiply:

"Does spell-check count?"
"What about Gradescope's OCR features?"
"If I ask ChatGPT to help me phrase feedback, is that AI grading?"

The line between "tool" and "automation of judgment" blurs. Every attempt at definition creates new edge cases.

Six months later, you have a 12-page draft that no one fully understands—and the technology has already changed twice.

Questions to sit with

  • Can policy ever keep pace with technological change?
  • Where exactly is the line between assistance and automation?
  • Who bears the cost when regulation lags—students, faculty, or both?
  • Is imperfect policy better than no policy?
Scenario: The Policy Vacuum
Path: The Libertarian

You argue for minimal regulation. Faculty have always chosen their own pedagogical methods. Why should this be different?

The committee moves on. No policy is written.

A year later, a student lawsuit makes the news. A graduate student failed her comprehensive exams. She discovered the grading rubric was generated by AI, and one of the three "faculty graders" was actually a language model.

Her lawyer argues: "She was promised assessment by qualified experts. She received assessment by software. That's breach of contract."

Your institution's general counsel calls an emergency meeting.

Questions to sit with

  • What do students implicitly contract for when they enroll?
  • Is "faculty discretion" sufficient when the tools have changed this radically?
  • Who is liable when AI makes consequential errors in grading?
  • Should students have a right to human assessment?
Scenario: The Brilliant Outlier

The essay that breaks the rules

You've been using a hybrid approach: AI for initial feedback on mechanics and structure, your own comments for substantive engagement. It's working well. Mostly.

Then you review Kai's essay.

The AI flagged it harshly: "organizational inconsistency," "unconventional thesis placement," "informal register inappropriate for academic writing."

Suggested grade: C+

But when you read it yourself, something different happens. The essay is fragmented, yes. Risky. But also alive. Kai has done something you haven't seen all semester—genuine thinking on the page, wrestling with ideas in real time.

It might be the best thing you've read all year.

What do you do?

Scenario: The Brilliant Outlier
Path: The Override

You change the grade to A-. You write a personal note: "This took risks that paid off. Don't let anyone convince you to play it safe."

Kai comes to office hours, grateful. "I wasn't sure if that approach would work."

But later, you wonder: how many students did the AI grade down for taking risks you never saw? You only caught Kai's essay because something in the first paragraph made you read more closely.

The AI processes 50 essays in the time it takes you to really see one. What brilliance is being quietly, consistently filtered out?

Questions to sit with

  • Can AI recognize the essay that breaks rules brilliantly vs. the one that simply breaks rules?
  • If you only override grades you happen to review closely, is that fair to other students?
  • Does AI grading systematically disadvantage unconventional thinkers?
  • What happens to intellectual risk-taking when the first reader is an algorithm trained on conventional texts?
Scenario: The Brilliant Outlier
Path: The Standard

You keep the C+. The rubric exists for a reason. If you make exceptions for writing you personally like, you're introducing bias, not eliminating it.

Kai doesn't come to office hours. The next essay is competent, conventional, forgettable. It gets a B+.

You think about what you've taught: play it safe, hit the marks, don't take risks.

A colleague mentions that studies show AI grading systems tend to reward "safe" writing—clear structure, predictable moves, conventional voice. The creative outliers get flagged as errors.

A pattern emerges:
What AI can measure reliably: structure, mechanics, explicit claims
What AI struggles to measure: voice, risk, originality, intellectual courage

Questions to sit with

  • When we automate judgment, do we inevitably standardize what we value?
  • Is consistency more important than recognizing brilliance?
  • What message do our assessment systems send about what matters?
  • Who benefits from a system that rewards the predictable?
Scenario: The Junior Faculty

The mentor's advice

You're in your second year on the tenure track. Your mentor—a full professor who's been unfailingly supportive—takes you aside after a department meeting.

"I need to be honest with you," she says. "Your teaching evaluations are good, but your publication record needs work. The committee will notice."

She glances around, then lowers her voice.

"I've been using AI to grade for the past year. It's not perfect, but it's given me back ten hours a week. That's how I finished my book."

She pauses. "I'm telling you because I care about your career. The people who figure this out are going to have an advantage. I don't want you to be the one who falls behind."

What do you feel?

Scenario: The Junior Faculty
Path: The Realist

You take her advice. You start using AI grading. You do get more writing done.

But you notice something: the work feels different now. You used to think about your students constantly—their ideas, their struggles, their growth. Now you think about your manuscript.

At a conference, you run into a grad school friend who's still hand-grading everything. She looks exhausted. Her publication record is thin. She's worried about her contract renewal.

"I just can't bring myself to do it," she says. "Grading is how I really teach."

You don't tell her what you've been doing. You're not sure why.

Questions to sit with

  • If AI grading gives an advantage, is it fair to those who opt out on principle?
  • What happens to the profession when the "successful" faculty are those who automate judgment?
  • Is this an individual choice or a collective action problem?
  • Who has the job security to raise these questions publicly?
Scenario: The Junior Faculty
Path: The Conscientious Objector

You thank your mentor but don't take her advice. The next two years are hard. You publish less than your cohort. Your tenure case is borderline.

In the committee meeting, a colleague argues for you: "Her teaching is exceptional. Students cite her feedback as transformative."

Another colleague responds: "We're not hiring teachers. We're hiring researchers who also teach."

You get tenure—barely. But you watch junior colleagues who used AI grading sail through with stronger files. One of them confides: "I feel bad about it sometimes. But I couldn't have made it otherwise."

The structural question:
If the system rewards AI-assisted productivity and punishes careful human attention, what kind of faculty will we have in ten years?

Questions to sit with

  • Can individual resistance matter when the incentives all point the other direction?
  • Should tenure committees consider how candidates freed up the time for their research?
  • What happens to teaching-focused faculty in an AI-enabled research arms race?
  • Who bears the cost of maintaining human judgment in assessment?
Scenario: The Discovery

The evidence

A student named Jordan asks to see you. They seem nervous.

"I need to show you something," they say, pulling out a folder. Inside are printouts of feedback from three different professors, all in your department.

"I ran these through a detector. They all have the same patterns—same sentence structures, same phrases, same rhythm. I'm pretty sure they were written by AI."

You look at the papers. Jordan is right. The feedback is good—but it's unmistakably uniform in ways human writing isn't.

"Are we being graded by robots?" Jordan asks. "Because we're paying a lot of money to learn from people."

How do you respond?

Scenario: The Discovery
Path: The Ally

You take Jordan's concerns to your chair. The response is defensive.

"Faculty are allowed to use tools to assist their work. This isn't cheating—it's efficiency."

"But the students don't know," you say.

"Do they need to? The feedback is accurate. The grades are appropriate. What's the harm?"

You think about Jordan, who spent hours analyzing their feedback, trying to learn from what they thought was a professor's expert insight. The harm is hard to name but easy to feel.

Jordan's petition gains 200 signatures. The student paper picks it up. The administration promises to "study the issue."

Questions to sit with

  • Do students have a right to know how they're being assessed?
  • Is accurate feedback from AI equivalent to accurate feedback from a professor?
  • What is the implicit contract between student and instructor?
  • If the harm is hard to measure, does it not exist?
Scenario: The Discovery
Path: The Pragmatist

"Did the feedback help you?" you ask.

Jordan pauses. "I mean... yeah. I guess. But that's not the point."

"Isn't it? If you learned something, if you improved, does it matter whether a person or a program wrote the comment?"

Jordan looks at you for a long moment. "When I get feedback from a professor, I think: someone who knows this field, who's read thousands of essays, who's thought deeply about writing—that person noticed this about my work. That means something."

They gather their papers. "If it's just an algorithm, it means nothing at all."

Questions to sit with

  • Is there value in being seen by another mind—even if the output is the same?
  • Does the source of feedback change its meaning?
  • What is the relationship between attention and recognition?
  • If students can't tell the difference, should we tell them anyway?

The Shift We're Living Through

Forty years of machine learning in education automated logistics: scanning forms, checking syntax, tracking submissions.

Generative AI automates judgment.

That shift changes what it means to teach, to assess, to be assessed. And most institutions haven't begun to grapple with it.

The supposed gains are many: time saved, consistency improved, feedback accelerated. But the losses may be harder to measure. The professor who noticed a student's voice emerging. The comment that asked exactly the right question. The human recognition that said: I see you. I'm paying attention. Your thinking matters to me.

These things don't scale. Perhaps that's the point.

Questions Worth Carrying

  • What do students lose when the person evaluating their work isn't a person?
  • What do educators lose when judgment is outsourced to algorithms?
  • Who benefits from framing this as individual choice rather than collective action?
  • What would it take to build institutions that value human attention?
  • If we can automate judgment, should we? And who gets to decide?

There are no right answers here. Only trade-offs we should make with open eyes.

Based on "What We Give Up When We Let AI Decide" by Marc Watkins