How to Design AI Tasks That Require Student Judgement
AI can give students answers quickly. That is useful, but it is not enough.
If AI-supported learning only helps students produce better answers, the task may look stronger while the student’s judgement remains underdeveloped. The more important question is whether students are learning how to evaluate what AI gives them. Can they compare it with what they already know? Can they question its assumptions? Can they verify its accuracy? Can they adapt it for purpose and audience? Can they decide what to accept, reject, revise or take responsibility for?
To design AI tasks that require student judgement, teachers need to make students compare, question, verify, adapt and decide rather than simply accept or improve AI-generated answers.
This is a central part of Visible Agency. When AI is involved in the learning process, student agency cannot be assumed from the final product alone. It has to become visible through the decisions students make along the way. Judgement is one of the clearest forms of that evidence. This matters because generative AI can both support and complicate learner agency. Roe and Perkins (2024) found that GenAI may enhance agency through personalisation and support, while also raising concerns around learner autonomy, equitable access and changing notions of agency.
The goal is not for students to get better answers from AI. The goal is for students to become better judges of answers.
That shift matters because AI can produce fluent, confident and useful responses even when the student has not yet done much thinking. If the task asks only for a final product, students may learn to become better users of AI without becoming more discerning learners. If the task requires judgement, students have to stay intellectually active. They must decide what is useful, what is limited, what is accurate, what is relevant and what they are prepared to stand behind.
This is where AI can become educationally powerful. It can give students more material to think with, but the task must require them to think.
Why judgement is the new cognitive demand
In traditional classroom tasks, students often had to generate much of the content themselves. They had to search for information, organise ideas, draft responses, create examples and develop explanations. Those tasks still required judgement, but the act of production often carried much of the visible cognitive effort.
AI changes that balance.
When students can generate a paragraph, summary, argument, image, explanation, code sample or set of ideas in seconds, the cognitive demand shifts. The challenge is no longer only whether students can produce something. The challenge is whether they can evaluate what has been produced.
Is it accurate? Is it relevant? Is it complete? Is it appropriate for the audience? Does it match the criteria? Does it oversimplify the concept? Does it sound convincing without being well supported? Does it represent what the student actually understands?
These are judgement questions. They are not secondary to the learning. In AI-supported learning, they are often the learning.
In the language of revised Bloom’s taxonomy, this work draws on the upper levels of cognitive process. Students analyse when they compare AI output with their own thinking, task criteria or trusted sources. They evaluate when they judge usefulness, accuracy, relevance and quality. They create when they adapt, revise or produce a stronger final response. The revised taxonomy also identifies metacognitive knowledge as a distinct knowledge dimension, which matters here because students are not only evaluating AI output; they are learning to notice how they evaluated it (Anderson & Krathwohl, 2001; Krathwohl, 2002).
The deeper opportunity is metacognitive.
Students are not only judging AI output. They are learning to notice how they judge, why they trust, what they question and when they need to verify before deciding. This is consistent with metacognition and self-regulated learning research, which emphasises students learning to plan, monitor and evaluate their own learning rather than simply complete tasks (Education Endowment Foundation, 2025).
Students evaluate when they judge the quality of an AI response. They become metacognitive when they can explain how they made that judgement.
This connects directly to Visible Agency: How to Design AI-Supported Learning Without Outsourcing Student Thinking. Its central idea is that AI should not make student thinking disappear. This article focuses on one practical way to keep that thinking visible: design the task so students must judge AI output before they use it.
Judgement also protects the learning from becoming passive. If the student’s role is simply to prompt, receive, copy and submit, the task has not required enough discernment. But if the student must compare, question, verify, adapt and decide, AI becomes part of a thinking process rather than a substitute for it.
The risk of passive acceptance
The most common risk in AI-supported learning is not that students use AI. The risk is that they accept AI output too quickly.
AI responses often sound confident. They may be well organised, grammatically fluent and plausible enough to feel correct. For students, especially those still developing background knowledge, this creates a problem. Fluency can be mistaken for accuracy. Clarity can be mistaken for depth. Completion can be mistaken for understanding.
Passive acceptance occurs when students treat AI output as the answer rather than as something to be examined. They may paste an AI response into their work, lightly edit the wording, or use AI suggestions without asking whether they are accurate, relevant or appropriate. The task is complete, but the student may not have practised much judgement.
This is not a character flaw in students. It is a design issue.
If the task can be completed by accepting AI’s first useful response, then the task has not made judgement necessary. Students usually respond to the structure of the task. If the task rewards polished completion, they will optimise for polished completion. If the task rewards discernment, they will need to practise discernment.
The better question is not, “How do we stop students from using AI?” It is, “How do we design the task so AI output has to be interrogated before it can be used?”
This is why AI literacy cannot be reduced to prompt writing. Foundational AI literacy includes the capacity to understand, use and critically evaluate AI technologies, including their limits and ethical implications (Long & Magerko, 2020). More recent work on generative AI literacy makes a similar case, arguing that students need the capacity to evaluate generative AI critically and use it effectively, ethically and responsibly (Zhang & Magerko, 2025).
The concern is not only accuracy. It is also cognitive offloading: when AI does too much of the thinking, students may lose opportunities to monitor, test and regulate their own judgement. Viberg et al. (2026) identify cognitive offloading, transparency and human oversight as central dilemmas for protecting and promoting human agency in AI-supported education.
This is where the connection to How to Stop AI From Replacing Student Thinking becomes important. Passive acceptance allows AI to take over too much of the cognitive work. Judgement brings the student back into the centre of the process.
Five judgement moves students need
Judgement can sound abstract, but in task design it can become very practical. Students need repeated opportunities to practise five judgement moves: compare, question, verify, adapt and decide.
These moves help students work with AI output without surrendering their thinking to it. They also give teachers visible evidence of the student’s reasoning.
The important point is that each move has two layers. There is the cognitive action students perform, and there is the metacognitive awareness students develop as they notice how they performed it.
1. Compare
Students practise judgement when they compare AI output with something else.
They might compare an AI-generated explanation with their own first attempt. They might compare two AI responses to the same question. They might compare AI output with a rubric, source text, worked example, peer response, teacher model or class success criteria. They might compare a simple answer with a more complex one, or a general explanation with one written for a specific audience.
Comparison slows the process down. It asks students to notice differences rather than accept the first response that sounds reasonable. Cognitively, students are analysing. Metacognitively, they are asking: what am I noticing, and why does that difference matter?
A useful comparison task might ask:
- What does the AI response include that your first response missed?
- What does your response include that AI overlooked?
- Which version is clearer, and why?
- Which version better matches the task criteria?
- Which version would be more useful for this audience?
- Where does the AI response sound confident but remain vague?
Comparison makes thinking visible because students have to identify the basis for their preference. They are not simply saying, “This one is better.” They are explaining what makes it better.
Students do not practise judgement by receiving better answers. They practise judgement by deciding what makes an answer better.
This is where How to Make Student Thinking Visible When AI Is Part of the Process becomes a natural next step. Judgement should leave traces. Students need to show what they compared and why the comparison changed, confirmed or challenged their thinking.
2. Question
Students practise judgement when they question AI output rather than treating it as neutral or complete.
Questioning means looking for assumptions, omissions, weaknesses, bias, overgeneralisation, unsupported claims or misleading confidence. It asks students to move from receiving an answer to interrogating an answer.
A student might ask:
- What assumption is this response making?
- What has been left out?
- What does this answer make sound simpler than it really is?
- What would someone with a different perspective challenge?
- What evidence would this answer need before I could trust it?
- Where might this answer be too general for our context?
Cognitively, students are analysing the answer’s structure, limits and assumptions. Metacognitively, they are asking: what made me suspicious, uncertain or curious?
That is important. We are not simply asking students to find faults. We are helping them become aware of the cues that tell them an answer needs closer examination. That is a form of intellectual discipline.
Questioning is especially important because AI can produce responses that sound authoritative even when they are incomplete or inaccurate. Students need to learn that a fluent response is not the same as a trustworthy response.
Questioning also develops intellectual independence. It teaches students that useful support does not remove the need for careful thought. AI may give them a starting point, but it should not get the final word without challenge.
3. Verify
Students practise judgement when they verify AI-supported work against evidence, criteria or trusted sources.
Verification is more than checking whether something “sounds right”. It asks students to test claims against something outside the AI response itself. That might be a source document, textbook, experiment result, dataset, assessment criteria, teacher explanation, approved website, class notes or expert model.
Verification matters because AI-generated work can contain errors, invented details, weak evidence or oversimplified explanations. It can also give students answers that are broadly plausible but not suitable for the specific task.
A verification task might ask students to:
- highlight three claims that need evidence;
- check one AI explanation against a class source;
- identify which part of the response is unsupported;
- confirm whether the examples are accurate;
- mark any statements that require further investigation;
- revise the answer after checking it against the task criteria.
Cognitively, verification asks students to evaluate accuracy, evidence and reliability. Metacognitively, it asks them to notice what they knew enough to check, and what they needed help or evidence to confirm.
This is where students learn a crucial habit: confidence is not evidence. A response can sound polished and still need verification. An answer can be useful and still be incomplete. A suggestion can be helpful and still require responsibility.
Verification gives students an important message: AI can assist, but it does not remove responsibility. If students use an AI-supported answer, they must know how to check the parts that matter.
4. Adapt
Students practise judgement when they adapt AI output for purpose, audience, accuracy, context or meaning.
Adaptation is different from cosmetic editing. It is not just making the response sound better. It is changing the response so it becomes more appropriate, precise or useful for the learning purpose.
Students might adapt an AI-generated explanation so it suits younger learners. They might revise a generic response so it connects to a local example. They might change the tone of an argument for a particular audience. They might add missing evidence, remove unsupported claims, simplify unnecessary language or make the reasoning more explicit.
A strong adaptation task might ask:
- What needs to change so this response suits the audience?
- What needs to be added for accuracy?
- What needs to be removed because it is vague or unsupported?
- How would you adapt this for a different purpose?
- What part of the AI response is useful but not yet appropriate?
Cognitively, adaptation moves students towards creating because they are shaping something new from the material they have evaluated. Metacognitively, they are asking: why am I changing this, and what does that change improve?
Adaptation helps students see that AI output is not finished work. It is material for thinking. The student’s role is to shape that material into something accurate, purposeful and responsible.
5. Decide
Students practise judgement when they make a decision and take responsibility for it.
This is the move that completes the process. After comparing, questioning, verifying and adapting, students still need to decide what they will accept, reject, revise or ignore. They need to explain why.
Decision-making is where judgement becomes ownership.
A student might decide that an AI suggestion is useful but incomplete. They might decide that one explanation is clearer but another is more accurate. They might decide that AI is helpful for generating examples but not for forming the final argument. They might decide not to use AI for a particular part of the task because the learning purpose is to practise independent recall, original interpretation or personal reflection.
Cognitively, students are evaluating and justifying. Metacognitively, they are asking: what am I prepared to stand behind?
The decision matters because it places responsibility back with the learner. They are not submitting AI’s thinking. They are submitting their own decision about what to do with AI-supported material.
This connects directly to How to Design AI-Rich Tasks That Still Require Student Ownership. Ownership is not proved by refusing support. It is shown when students can explain and stand behind the decisions they made while using support.
How to build judgement into AI-supported tasks
A task requires judgement when students cannot complete it by copying, lightly editing or submitting AI output. They have to evaluate and justify what they do with it.
That does not mean every AI-supported task needs to become complex. It means the task needs to include a clear moment where students must make thinking visible. The judgement should be built into the learning design, not added as a decorative reflection after the work is already finished.
This is also an assessment-design issue. Zaphir et al. (2024) argue that educators need ways to examine how vulnerable assessment questions are to generative AI and to redesign tasks around the critical thinking students are expected to demonstrate. In the same spirit, Ding and Magerko (2025) argue that educational AI evaluation needs to move beyond technical performance and output quality to include learner agency, context, ethics, explainability and human-centred outcomes.
There are several practical task structures that can help.
AI comparison tasks
In an AI comparison task, students compare AI output with another version, source, model or set of criteria.
For example, students might write their own explanation of a concept before asking AI for a second explanation. They then compare the two versions and identify where their own explanation was stronger, where AI was clearer and what they would change in their final version.
The learning is not in asking AI for an explanation. The learning is in comparing explanations and deciding what makes one stronger than another.
AI critique tasks
In an AI critique task, students examine an AI response for strengths, weaknesses, omissions or inaccuracies.
For example, students might ask AI to produce a persuasive argument, then critique the quality of the reasoning, evidence and audience awareness. They could identify unsupported claims, weak transitions, vague language or assumptions that need to be challenged.
The value of this task is that students are not positioned as passive receivers. They are positioned as evaluators.
AI verification tasks
In an AI verification task, students check AI output against evidence.
For example, students might ask AI to summarise a historical event, then verify the summary using approved sources. They could mark which claims are confirmed, which are incomplete and which need correction.
This kind of task is especially useful when students are learning research habits. It makes verification part of the process rather than a teacher warning given after the fact.
AI revision tasks
In an AI revision task, students use AI feedback or suggestions to improve work, but they must decide which suggestions to accept.
For example, students might draft a paragraph, ask AI for feedback on clarity and structure, then choose two suggestions to apply and one to reject. They must explain each decision.
This teaches students that feedback, whether from AI, peers or teachers, is not something to follow blindly. It is something to judge.
AI limitation tasks
In an AI limitation task, students identify where AI is not sufficient.
For example, students might ask AI to respond to a local community issue, then identify where the response lacks local knowledge, cultural understanding, emotional nuance or direct evidence. They then explain what human judgement or contextual knowledge would be needed.
This kind of task is important because students need to learn that AI can be useful and limited at the same time.
AI decision tasks
In an AI decision task, students decide whether, where or how AI should be used.
For example, students might be given several parts of a project and asked to decide where AI could be helpful, where it could interfere with the learning goal and where human judgement should lead. They must justify their choices.
This moves AI use from habit to intention.
The Visible Agency Design Test can help teachers review whether these tasks genuinely require judgement or only appear to. If the task does not require students to compare, question, verify, adapt or decide, it may not yet make judgement visible enough.
How to help students know when AI is not the right tool
One of the most important forms of judgement is knowing when not to use AI.
This is easy to overlook. Many classroom conversations focus on how students can use AI effectively, ethically or efficiently. Those conversations matter, but they are incomplete. Students also need to understand when AI may be unhelpful, inappropriate or counterproductive for the learning purpose.
AI may not be the right tool when the purpose is to practise recall. It may not be the right tool when students need to wrestle with their own first ideas before receiving support. It may not be the right tool when the task requires personal experience, ethical reflection, cultural knowledge, emotional nuance or original interpretation. It may not be the right tool when using it would remove the productive struggle the task was designed to create.
This does not mean AI should be excluded from all of these moments. It means students need to understand the purpose of the learning before choosing the tool.
One sign of student agency is knowing when not to use AI, not simply using it well.
Teachers can help students develop this judgement by asking tool-choice questions before the task begins:
- What part of this task should you attempt without AI first?
- Where might AI be useful as support?
- Where might AI interfere with the learning purpose?
- What decision should remain yours?
- What evidence would show that you still understand the work?
- When would using AI make the task easier but less valuable?
These questions help students see AI as a tool to be chosen intentionally, not a default response to difficulty. They also connect AI use to ownership. If students can explain when and why they used AI, they are more likely to remain responsible for the learning.
Classroom prompts that require discernment
The easiest way to strengthen judgement is to build discernment into the prompt or task instructions. Instead of asking students to generate an answer with AI, ask them to do something with the answer that requires thinking.
The following prompts can be adapted across year levels and subject areas.
Compare prompts
Use these when students need to examine differences between versions, sources or explanations.
- Compare your first response with the AI response. What did AI include that you missed, and what did your response include that AI missed?
- Generate two AI explanations of the same concept. Which is stronger, and what makes it stronger?
- Compare the AI response with the success criteria. Where does it meet the criteria, and where does it fall short?
- Compare an AI-generated example with a class example. Which one better shows the concept?
- Compare AI’s answer with your current understanding. What changed, and what stayed the same?
Question prompts
Use these when students need to interrogate assumptions, omissions or weaknesses.
- What assumptions does this AI response make?
- What important idea, perspective or evidence is missing?
- Where does this response sound confident but need more support?
- What question would you ask before trusting this answer?
- What might someone disagree with, and why?
Verify prompts
Use these when students need to test accuracy or reliability.
- Identify three claims in the AI response that need to be checked.
- Verify the AI response against an approved source. What is accurate, incomplete or incorrect?
- What evidence supports this response?
- Which part of the answer cannot be trusted until it is checked?
- What would you need to confirm before using this in your final work?
Adapt prompts
Use these when students need to revise for purpose, audience or context.
- Adapt this AI response for a younger audience without losing accuracy.
- Revise the response so it better matches the task criteria.
- Change the explanation so it uses evidence from our class sources.
- Make the response more specific to our local context.
- Remove anything vague, unsupported or unnecessary.
Decide prompts
Use these when students need to make and justify a choice.
- What part of the AI response will you accept, and why?
- What part will you reject or change, and why?
- Which AI suggestion improved your work most, and what decision did you make because of it?
- What will you take responsibility for in the final version?
- Where did your judgement matter most in this task?
“When not to use AI” prompts
Use these when students need to think about appropriate use.
- Which part of this task should you do without AI first?
- Where would AI support your learning, and where might it reduce your learning?
- What would you lose if AI did this part for you?
- When would AI make the task easier but less useful?
- What decision needs to remain fully yours?
These prompts should not become a worksheet for every task. They are design options. The teacher’s role is to choose the prompt that matches the learning demand.
If the goal is accuracy, use verification. If the goal is discernment, use comparison and questioning. If the goal is communication, use adaptation. If the goal is ownership, use decision-making. If the goal is agency, ask students when AI is and is not the right tool.
What teachers should look for
When students are practising judgement, teachers should be able to see more than the final product. They should be able to see the student’s reasoning.
Useful evidence might include a comparison between the student’s first idea and AI’s response, annotations showing what the student questioned, notes showing which claims were verified, a revision explanation showing what changed and why, a short justification for accepting or rejecting AI suggestions, a reflection on when AI was useful or limited, or an explanation of why the student chose not to use AI for part of the task.
This evidence does not need to be elaborate. In many cases, a few sentences are enough. The point is not to create more paperwork. The point is to make the student’s judgement visible enough to support better feedback and deeper learning.
A useful test is whether the evidence helps the learning conversation. Can the teacher see what the student noticed? Can the student explain what they decided? Can both teacher and student identify how the work improved because of judgement rather than because of passive AI use?
If so, the task is doing more than producing an answer. It is developing a learner.
Where to go next
This article is part of the Visible Agency series.
- For the broader framework, read Visible Agency: How to Design AI-Supported Learning Without Outsourcing Student Thinking.
- For the evidence side of the work, read How to Make Student Thinking Visible When AI Is Part of the Process.
- For the reflective side of AI-supported learning, read How to Use AI to Strengthen Metacognition.
- For the ownership side of the work, read How to Design AI-Rich Tasks That Still Require Student Ownership.
- To review a task before using it with students, use The Visible Agency Design Test.
Final thought
AI should not simply give students more answers. It should create better opportunities for students to practise judgement.
When students compare, they learn to notice quality. When they question, they learn to recognise assumptions and gaps. When they verify, they learn that confidence is not evidence. When they adapt, they learn to shape ideas for purpose and context. When they decide, they learn to take responsibility.
This is how AI-supported learning can strengthen agency rather than weaken it.
The aim is not for students to become dependent on AI for better responses. The aim is for students to become more discerning, more responsible and more capable because of the thinking AI required them to do.
The question for teachers is not only, “Can students use AI for this task?”
The better question is, “What judgement will this task require students to practise?”
If the task requires students to compare, question, verify, adapt and decide, AI can become more than a shortcut. It can become a powerful surface for thinking.
That is where student judgement becomes visible.
References
Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. Longman.
Ding, S., & Magerko, B. (2025). Rethinking AI evaluation in education: The TEACH-AI framework and benchmark for generative AI assistants. arXiv. https://arxiv.org/abs/2512.04107
Education Endowment Foundation. (2025). Metacognition and self-regulated learning (2nd ed.). https://educationendowmentfoundation.org.uk/education-evidence/guidance-reports/metacognition
Krathwohl, D. R. (2002). A revision of Bloom’s taxonomy: An overview. Theory Into Practice, 41(4), 212–218. https://doi.org/10.1207/s15430421tip4104_2
Long, D., & Magerko, B. (2020). What is AI literacy? Competencies and design considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery. https://doi.org/10.1145/3313831.3376727
Roe, J., & Perkins, M. (2024). Generative AI and agency in education: A critical scoping review and thematic analysis. arXiv. https://arxiv.org/abs/2411.00631
Viberg, O., Cukurova, M., Kizilcec, R. F., Buckingham Shum, S., Demszky, D., Gašević, D., Jansen, T., Jivet, I., Jovanovic, J., Meyer, J., Murayama, K., Pardos, Z., Piech, C., Rummel, N., & Winstone, N. E. (2026). Protecting and promoting human agency in education in the age of artificial intelligence. arXiv. https://arxiv.org/abs/2602.20014
Zaphir, L., Lodge, J. M., Lisec, J., McGrath, D., & Khosravi, H. (2024). How critically can an AI think? A framework for evaluating the quality of thinking of generative artificial intelligence. arXiv. https://arxiv.org/abs/2406.14769
Zhang, C., & Magerko, B. (2025). Generative AI literacy: A comprehensive framework for literacy and responsible use. arXiv. https://arxiv.org/abs/2504.19038
