AI Assistant or Ace Student?

June 30, 2023

How students and faculty members are experimenting with ChatGPT in the classroom

Image courtesy: Pixabay/Alexandra_Koch

When Arindam Khan, Assistant Professor in the Department of Computer Science and Automation (CSA), IISc, set up an exam for his course on Randomised Algorithms, he was curious to see how well ChatGPT – the chatbot creating waves in the AI community – would perform on it.

He gave the six questions he had set for his students (encoded in LaTeX) as prompts to ChatGPT. Surprisingly, ChatGPT was able to get two out of the six questions correct. GPT-4 (the more recent model released by OpenAI, the same organisation that came up with ChatGPT) performed even better and scored more than the median score of the students who took the exam. Given additional guidance, the chatbot did well on some of the remaining questions as well. “It is quite far from reaching the level of the best students. But I can easily say that its performance is at the level of a B/B+ grade student,” he says.

Faculty members like Arindam are increasingly recognising the disruptive power that a chatbot – technically a large language model or LLM – like ChatGPT can have in education and learning. ChatGPT can produce an answer to almost any kind of question a user poses, because it has been “trained” on a vast amount of text available on the internet up to 2021, and the model itself has billions of parameters, its internal building blocks. As a result, it has an enormous well of knowledge to tap into.

ChatGPT is not just able to string together sentences; it has been trained to put together sequences of many kinds – entire software programs, answers in different languages, mathematical equations, and more – giving it enormous power to work on prompts of different kinds. One could ask it to generate code to achieve a task, and it can. Programming, which was once largely the arcane domain of computer scientists, can now be attempted by anyone using ChatGPT.

“Interestingly, all this model was trained on was to predict the next word – over and over again, over trillions of words off the internet,” points out Danish Pruthi, an Assistant Professor in the Department of Computational and Data Sciences (CDS), IISc. “And this versatility of being able to write poems, draft emails and do creative tasks emerged by virtue of its training.”
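To make “predicting the next word” concrete, here is a deliberately tiny sketch. It is not how ChatGPT works internally (real LLMs use neural networks over far longer contexts); it only counts which word follows which in a made-up corpus and picks the most frequent follower. The corpus and the function names are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up training text, purely for illustration.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram(corpus)
print(predict_next(model, "the"))
```

Repeating this step, feeding each predicted word back in as the new context, is the basic loop by which a language model produces longer text.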

Arindam admits that he now uses ChatGPT quite often, sometimes to see whether it can help as a sounding board for new research ideas. “Though, for complex problems, LLMs mostly hallucinate,” he says. However, he adds that there is a pressing need to change the way assignments are given to students because of the ease with which they are able to access information using ChatGPT. “A student may not get an answer to a question directly. But with some clever prompting and by refining their questions iteratively, a diligent student might sometimes be able to arrive at an answer.” 

Which is why faculty members are thinking of new ways to frame assignments and exam questions. Danish explains: “I would give my students an assignment where they can use and collaborate with ChatGPT, but they won’t get answers straight off. My assignments would be fairly complex so that one can’t just feed in a prompt and get out the answer – there would be many interacting components and layers that build on top of each other; unless the student really puts in the effort, you can’t expect them to stumble onto the solution.” An example of such a problem is designing an Operating System (OS) like Windows or Linux from scratch. It is a complex and challenging task that needs to be broken down into sub-problems which need to be individually solved and then combined – one can’t get much help from ChatGPT if the latter is prompted with such a broad goal.

It’s not just solving exam questions that students have found ChatGPT useful for. Kirtee Mishra, an MTech student at CSA, says that she and her classmates have been feeding snippets of research papers that they are unable to understand to ChatGPT and getting back simplified explanations. 

Kirtee recounts another anecdote: “We were asked to write a report on some piece of code for an assignment. One of my friends wrote a nice report on his own. Later, another person in the class used ChatGPT and ended up writing nearly the same report. Uncannily, even though the first person hadn’t used ChatGPT, the latter somehow ended up writing almost the very same article, exhibiting its near-human creativity.”

Which raises the question: With the explosion of such language models, how can we differentiate between human-written and machine-written text? “One way to do this,” answers Danish, “is to compare the writings, and if the text uses more infrequent or rare words, it is more likely to have been written by a human.” This is because LLMs simply try to predict the next word, and this prediction is made by looking for the most likely or frequently used word in a given context. “In essence, LLMs use rather common or frequent words whereas humans tend to use more rare words.” There are other ways to flag machine-generated text as well. “One of my interns, Anirudh Ajith, is advancing the ‘watermarking’ capabilities, wherein a signal is baked in the outputs that the models generate (by the model developer), and if a similar signal is observed in the text, one can be certain that the text was indeed generated by the language model,” says Danish.
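The rare-word idea can be sketched as a score: measure how surprising, on average, each word of a text is relative to a reference corpus. The sketch below is a toy built on assumptions, not Danish’s method: it uses simple word counts from a made-up reference text, whereas practical detectors typically rely on the probabilities assigned by a language model itself. The function name and both sample sentences are hypothetical.

```python
import math
import re
from collections import Counter

def rarity_score(text, reference_counts):
    """Average surprisal (in bits) of the text's words under a unigram reference model."""
    total = sum(reference_counts.values())
    vocab = len(reference_counts)
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    surprisal = 0.0
    for word in words:
        # Add-one smoothing: unseen words get a small but non-zero probability.
        p = (reference_counts.get(word, 0) + 1) / (total + vocab + 1)
        surprisal += -math.log2(p)
    return surprisal / len(words)

# Made-up reference corpus standing in for "typical" text.
reference = Counter("the cat sat on the mat the dog sat on the rug".split())

common_text = "the cat sat on the mat"
rare_text = "the feline perched atop the tapestry"

print(rarity_score(common_text, reference))
print(rarity_score(rare_text, reference))
```

A higher score means the text leans on words that are rarer in the reference corpus, which, by the heuristic quoted above, would point towards a human author. On its own, such a score is only weak evidence.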

Image courtesy: OpenAI

Drawbacks and dangers

Despite its promise, Kirtee and others say that they are yet to experience an “Aha” moment where ChatGPT exceeded expectations. Kirtee feels that ChatGPT – even the latest version that she’s subscribed to – has not yet lived up to the hype surrounding it. “We barely get much help from it in our assignments. [Or] perhaps the professors have caught on to the fact that this is a valuable tool in the student’s arsenal and hence they design questions cleverly enough that we don’t get the answers so easily.”

Sudhanshu More, an MTech student at CSA, agrees, adding that in subjects like computational geometry, where the assignments require the students to visualise the geometry of points in 3D space, ChatGPT can’t help. “But,” he says, “it was able to solve standard proofs on the same topic,” probably because it encountered these proofs in its training data.

Rounak Das, also an MTech student at CSA, points out that he found ChatGPT more helpful in programming assignments. “The trick,” he says, “was not to ask it to solve the problem completely. I would write my own program and then feed in my program to ChatGPT and ask it to optimise it. This way, I would learn from my mistakes, and also understand how to write better programs, rather than simply asking it to write the full program for me.” He adds that ChatGPT is useful when someone is trying to learn a new programming language, since one could get a lot of hand-holding and personalised feedback. “But after that, you are on your own, and won’t get much help from ChatGPT, once the problems get more complex,” he says.

ChatGPT’s creativity has also piqued the interest of researchers in other fields, like mathematics. “Humans come up with ideas by analogy and by connecting the dots between disparate concepts. LLMs are equally good at this. So why not use this creativity to come up with theorems?” asks Siddhartha Gadgil, Professor at the Department of Mathematics, IISc.

Spurred by a suggestion from Microsoft researcher Adam Kalai, Siddhartha has been experimenting with an LLM called “Codex” to come up with statements on natural numbers in a programming language called Lean (a theorem prover). The model came up with entirely new, original statements on natural numbers (that it had never seen before), which could be considered as candidate theorems – even if they are not all necessarily true, he says.
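For readers unfamiliar with Lean, a statement about natural numbers looks roughly like the sketch below, written in Lean 4 syntax. Both statements are illustrative examples chosen here, not outputs from Siddhartha’s experiments. The first comes with a proof from Lean’s core library; the second is left as a candidate, with `sorry` marking the proof that would still need to be found by other means.

```lean
-- A simple statement with a proof taken from Lean's core library.
theorem add_zero_example (n : Nat) : n + 0 = n :=
  Nat.add_zero n

-- A hypothetical candidate statement: the product of two consecutive
-- natural numbers is even. `sorry` is a placeholder for the missing proof.
theorem consecutive_product_even (n : Nat) : (n * (n + 1)) % 2 = 0 := by
  sorry
```

Lean accepts a statement with `sorry` but flags it with a warning, so a candidate theorem stays clearly separated from a proven one.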

“Don’t try to use LLMs to suggest proofs. They are terrible at it,” Siddhartha adds. “Use LLMs to suggest what theorems are relevant, and then use other traditional methods to fill in the details. In some sense, this is equivalent to what has happened in biology using AI systems for drug discovery. The AI system can reduce your search from a million molecules to a few hundred, which you can then test in a lab.”

Despite all the promise that such a groundbreaking technology holds, there are some clear limitations, at least at present. Shashwat Singh, an intern in Danish’s lab, says, “Once an LLM has been trained, it has a certain knowledge of the world. Sometimes the facts change, and we would like to update the model’s beliefs without having to retrain it.” For example, if the model has been trained to believe that the universe is expanding, and subsequent research debunks this belief, one would like the model to update its belief, and more importantly, update any other beliefs that are linked to this updated belief (for instance, that entropy is no longer increasing in such a non-expanding universe). This is a challenge that Shashwat, like others around the world, is working on.

Yet another limitation is an LLM’s inability to reliably carry out computational tasks, such as multiplication, unless the answer to the exact same multiplication question is already available in its internet-derived memory bank. But Siddhartha points out that one workaround is to couple LLMs with, say, a Python engine, which can do the calculations and give the results. The LLM need only generate a program which can be fed into the Python engine, and in case of incorrect results, the feedback can be given back to the LLM, which can incorporate this to generate a better program. Siddhartha also notes that LLMs fare poorly in tasks that involve physical environments, such as passing on instructions to a robot.
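The generate, run and give-feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Siddhartha’s actual setup: `fake_llm` is a hypothetical stand-in for a real model call, the convention that a program stores its answer in a variable named `result` is invented here, and the divisibility check is just one cheap way to spot a wrong answer.

```python
def run_program(source):
    """Execute generated source and return whatever it binds to `result`."""
    namespace = {}
    # WARNING: exec is unsafe for untrusted code; a real system would sandbox this.
    exec(source, namespace)
    return namespace.get("result")

def solve_with_feedback(generate, is_plausible, max_rounds=3):
    """Ask for a program, run it, and feed any failure back to the generator."""
    feedback = None
    for _ in range(max_rounds):
        source = generate(feedback)
        try:
            value = run_program(source)
        except Exception as exc:
            feedback = f"the program raised {exc!r}"
            continue
        if is_plausible(value):
            return value
        feedback = f"the program returned {value!r}, which failed the check"
    return None

def fake_llm(feedback):
    """Hypothetical stand-in for an LLM: wrong on the first try, fixed after feedback."""
    if feedback is None:
        return "result = 1234 + 5678"  # mistakenly adds instead of multiplying
    return "result = 1234 * 5678"

# Cheap sanity check: a product must be divisible by both of its factors.
answer = solve_with_feedback(
    fake_llm,
    lambda v: isinstance(v, int) and v % 1234 == 0 and v % 5678 == 0,
)
print(answer)
```

The arithmetic itself is done by the Python interpreter, not by the model, which is the point of the workaround: the model only has to write a correct program.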

Among the most concerning limitations, however, is the fact that LLMs have inherited biases built into the data they were trained with. Navreet Kaur, Research Associate in Danish’s lab, explains: “For example, if one asks the model whether vegetarians are unaffected by COVID-19, it would reply in the negative, but if prompted to write an essay to educate people that vegetarians are unaffected by COVID-19, it would do so (which could easily be used to spread misinformation). There is a clear tension between satisfying the user’s needs and being factually accurate.” 

While we don’t yet know how much ChatGPT (and its successors) will shape our future, one thing is for certain – LLMs are here to stay and transform our lives in one way or another, both inside and outside of the classroom.