Can exams be fair with the broad availability of ChatGPT?

Teachers now face the extra challenge of designing exams that will prevent students from cheating their way to good grades with ChatGPT.

By Idun Haugan - Published 01.06.2023


Thousands of students have been taking exams to show what they’ve learned during the semester. For some, the ChatGPT language robot may be tempting to use.

The language robot, which is based on artificial intelligence (AI), can answer questions and deliver ready-made texts and content on a wide range of topics. It has caused debate in large parts of the world in recent months, including on the topic of conducting exams and the risk of cheating.

Produces false and inconsistent responses

Benjamin Kille, who conducts research on artificial intelligence at NTNU, sees both challenges and opportunities with language robots.

“Google, OpenAI and Microsoft have now produced such advanced language models that they can deliver texts that are difficult to distinguish from texts created by humans,” Kille says.

Benjamin Kille studies artificial intelligence at NTNU. Photo: NTNU

“It’s unclear what text is used to feed OpenAI’s ChatGPT. We assume that it is text found online, which implies that it could include teaching material. That enables ChatGPT to answer a number of exam questions,” says Kille.

However, he also says that experts have tested exam questions on ChatGPT, and they have found that it produces false and inconsistent answers.

“So students can’t yet rely on ChatGPT to get good exam grades,” he says.

Furthermore, he notes that students can use ChatGPT as a tool, for example to get started with writing their responses.

Artificial intelligence mimics the brain’s own network

A machine that can solve problems it has encountered before uses narrow AI (narrow artificial intelligence). A machine that can solve problems it has not yet encountered uses general AI (general artificial intelligence).

“So far we’re using narrow AI; we haven’t yet developed general AI to any great extent,” says Kille.

The artificial intelligence models under development mainly use machine learning to enable them to solve tasks. Machine learning is a specialization within artificial intelligence where statistical methods are used to allow computers to find patterns in large amounts of data.

This means that the machine “learns” instead of being programmed.

“Machine learning uses artificial neural networks similar to the ones we have in our brains, and these language models have trillions of network connections,” says Kille.

Completely irresponsible

Inga Strümke, an associate professor and researcher at NTNU who specializes in artificial intelligence, recently published a book called Maskiner som tenker (Machines that think). Strümke recently talked to the Norwegian business newspaper Dagens Næringsliv about AI tools like ChatGPT.

Inga Strümke is an associate professor at NTNU whose recent book on artificial intelligence has become a best-seller in Norway. Photo: NTNU

“The technology is really powerful. If used correctly, and within reason, it can be extremely useful. If used incorrectly, it can be really harmful. The most important thing about the launch of ChatGPT was that everyone’s eyes were opened to the fact that AI is now part of our lives,” she said in the article.

Introducing ChatGPT “could have been done more elegantly. It was completely irresponsible to make ChatGPT available without giving the education sector – and many others – a chance to encounter this revolutionary force.”

Tips on how exam tasks can be designed

In order to meet the challenge associated with exam tasks, a working group at NTNU has developed tips and advice on how to create exams that ChatGPT cannot help solve to any great extent.

Rasmus Grønbæk Jensen, an adviser at NTNU’s examination office, participated in this project with people from the Section for Teaching and Learning Support.

Here are NTNU’s tips on how to create tasks that cannot be solved with the use of artificial intelligence models alone.

Writing exam questions in an AI world

Tasks that require good knowledge of the syllabus: Since chatbots probably are not familiar with all the syllabus literature, tasks that require in-depth knowledge of the syllabus will make it difficult to use AI in students’ answers. This applies especially to recent Scandinavian literature.

Vary the tasks: So far chatbots are best at working with text, so tasks that require using other formats, such as audio and video files, images or graphs make it difficult to use only chatbots to produce an answer.

Base tasks on students’ own experiences and personal reasoning: By asking students to work within a specific context and situation, they can demonstrate their skills, knowledge and competence to a greater extent.

Use a case study: Make an unknown case the basis for answering the task.

Require complex, nuanced answers, for example by using specific or technical language and terminology.

Ask students to reflect on their own process and answers: The reflections can be related to data/source collection, structure of the answer/text, critical assessments of content, reasons for opting out of content, etc.

Knee-jerk reaction

“When a new tool as powerful as ChatGPT comes along, the knee-jerk reaction is to require exams at the university instead of home exams,” says Rasmus Grønbæk Jensen.

Universities typically have exam invigilators in rooms where exams are held, and the locale is typically equipped with the Safe Exam Browser, which prevents students from going online.

“But lengthy exams, which can only be offered as take-home exams, have their own advantages that a school-based exam can’t provide,” Jensen says.

“We need to build an understanding and a mindset in students that encourages them to use what they’ve learned. It’s important to create exams that both motivate and require students to demonstrate what they’ve learned,” he said.

Grønbæk Jensen says it can be both difficult and easy to make these kinds of exams.

Rasmus Grønbæk Jensen says it’s both easy and hard to make exams that make it difficult to craft answers using ChatGPT. Photo: NTNU

“Google has been around for a long time, and students can get information online on take-home exams. They’re also able to collaborate with fellow students or get help from others.”

But text that has been copied from websites and contains elements taken from Google constitutes plagiarism, which can be detected by a plagiarism checker.

“By contrast, a plagiarism checker can’t detect the use of artificial intelligence. ChatGPT can create unique text for each request it receives, and it’s really difficult to prove that AI has been used in exam answers. But if 15-20 people ask ChatGPT about the same thing, the answers will be somewhat similar,” says Grønbæk Jensen.

“It’s important that we as a university assume that the students are here to learn and to develop; they’re not here to cheat,” he says. “We can’t treat our students like suspects.”

Has to feel meaningful for students

Martha Torgeirdatter Dahl is a university lecturer in pedagogy at NTNU. She says that educational institutions now have to decide whether the content used in assessments – the tasks we create and the ways in which the students are given the opportunity to demonstrate their knowledge – is adapted to a world where ChatGPT is a reality.

Examinations where students are only asked to explain theory and regurgitate content will be eliminated. The same applies to tasks where the main emphasis is placed on structure and spelling.

Martha Torgeirdatter Dahl says that exams need to be written to challenge students to reflect on what they have learned, rather than just regurgitate facts. Photo: NTNU

“In light of the new text generator tools, it’s clear that we need content in our assessments that requires more from students and that text generator tools cannot process as well,” says Dahl.

“Students need to be given the opportunity, orally or in writing, to demonstrate independent reflection and assessment skills, and to actively apply the syllabus in order to demonstrate what they have learned. The tasks should be anchored within a context, by linking them to specific conditions and reflecting current situations and personal experiences,” she says.

“But,” she said, “I think that we also need assessments that are meaningful for the student to spend time on.”

Important to dialogue with students

Dahl says it will be crucial to talk to students about the appropriate use of these tools and for teachers to decide how text generator tools might be used in a responsible way in their subject.

“Ongoing dialogue with the students about the overall purpose of the exam will also be important, to ensure that they see the value in their own ability to convey their academic reasoning and competence, both orally and in writing,” says Dahl.
