Fixing Mathematics, One Theorem at a Time

June 27, 2025

An automated software program is helping researchers streamline theorem proving

Most people think that maths is the most precise of all subjects. Once you prove that a statement is mathematically true, it remains true forever, and anyone who is smart enough can verify that truth themselves, right? But what if I told you that most maths papers that have ever been published are riddled with small inaccuracies and inconsistencies, and that most mathematicians are aware of this fact? 

In mathematical research, ideas can get insanely complicated. One reason is that all of the “simple” ideas have more or less already been thought of and proven by someone else. To come up with new theories or results, you have to build upon centuries and centuries of mathematical development.

Maths papers are often quite densely written. To keep them readable and at a reasonable length, authors often assume that the reader is already familiar with the prerequisites and skip some details and intermediary steps. This is the opposite of what we were taught in school, where we had to “show our work” while solving an equation. This approach can get problematic because the information we omit for convenience might end up being important. Take a look at this example of a “proof” showing that 2=1:
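
In symbols, one standard version of the argument goes like this:

x = 0
⟹ x + x = x (since 0 + 0 = 0)
⟹ 2x = x
⟹ 2 = 1 (dividing both sides by x)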

Anyone paying close enough attention, and familiar with the rules of division, can spot the mistake here: we assumed that x = 0, and then we ended up dividing by x, which is the same as dividing by 0. To avoid this, we should have clarified that our denominator is not 0 each time we perform a division. Imagine having to clarify hundreds of small details like this throughout a paper. That would simply not be practical.

However, if we aren’t careful, these simplifications can very easily lead to wrong results. Imagine a maths paper with hundreds (if not thousands) of hidden assumptions and simplifications, each of which requires expertise in the particular subfield to fully understand. How can you be sure that such an argument is right if some explanatory steps were skipped? The only hope is to have it verified by experts in the field who can fill in all the gaps, in a process called peer review. While peer review can pick out mistakes in papers, it is not perfect. Factors such as the lack of availability of experts, bias or simple laziness can lead to mistakes slipping through peer review.

This is where Lean comes in. Lean is a programming language, like Java, C or Python, but one that can also be used as a “theorem prover”. In these popular languages, each program is written using lines of code that instruct the computer to carry out a series of tasks in a particular order. These lines of code have to follow the rules of logic and syntax specific to the programming language being used, to ensure that the computer can actually carry out those instructions. These rules are enforced by a program called the compiler. In most cases, the compiler does not care whether your program actually fulfills its intended purpose, only whether it is a valid set of instructions that the computer can follow.

Lean is different because it ensures that our program fulfills its intended purpose, which in our case is writing down the proof of a theorem. Each Lean program consists of a theorem statement and a proof. When we run the program, the compiler goes through each line of the proof to see if it actually proves the stated theorem. This means that we can verify the correctness of the proof in a much more robust manner, without involving any humans in the process.
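
For a flavour of what this looks like, here is a tiny Lean file (the theorem name is just an illustrative choice): the statement comes before “:=”, and everything after it is the proof that the compiler checks.

-- The statement: for any natural numbers a and b, a + b equals b + a.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b  -- the proof: apply an already-proven lemma to a and b

-- A false statement cannot be smuggled through:
-- theorem wrong : 2 + 2 = 5 := rfl  -- Lean reports an error and rejects the file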

Think of the process of writing a mathematical proof as building a bridge. The results that we know to be true for sure are the foundations of this bridge. We build upon them using various arguments and logical inferences, which are the bricks in our analogy. The final structure can’t be stable unless the foundation is solid and all the bricks have been laid in logically correct spots. Even a slight mistake can topple the whole structure (as there is no such thing as “almost true” in mathematics). Writing proofs by hand can be compared to making a blueprint of how to build the bridge. Only an expert (or peer reviewer) can look at a complicated blueprint and tell whether or not the bridge would hold, but even they might miss small details sometimes. Lean forces us to build each and every part of the bridge, and then shakes it around to test for any weaknesses. In this way, it provides the ultimate test for correctness of a proof.

So why aren’t all maths proofs written in Lean? The biggest hurdle is that we have to write out all the details of the proof, since the program can’t be expected to have any level of expertise to fill in the missing details on its own. This level of effort is orders of magnitude greater than typing out all the steps of the same proof in a research paper. However, this is expected to become a lot easier very soon. All mathematical results proven in Lean are added to a digital library called mathlib. These results can then be used by anyone else without proof, just by referencing this library, drastically reducing the amount of code we need to write. Compared to the vast body of published mathematics, mathlib is still small and doesn’t “know” a lot of it. However, its rate of growth has been exponential, since the progress builds upon itself. The hope is that once mathlib becomes large enough, the effort of formalising a proof in Lean will become comparable to the effort of typing it out to publish in a journal. Then, a researcher can rest assured that whatever complex ideas they came up with are indeed mathematically accurate.
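
For instance, the infinitude of prime numbers is already proven in mathlib. Assuming the library’s current name for that result (Nat.exists_infinite_primes), showing that some prime bigger than 1000 exists takes a single line:

import Mathlib

-- Nat.exists_infinite_primes n says: there is a prime p with n ≤ p.
-- We simply reuse it, instead of proving anything about primes ourselves.
example : ∃ p, 1000 ≤ p ∧ Nat.Prime p :=
  Nat.exists_infinite_primes 1000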

Flowchart showing the structure of a proof in Lean (Image: Aviral Sood)

A new way of doing mathematics 

The biggest advantage of using Lean is that it allows us to compartmentalise the enormous complexity of modern mathematics. Because the arguments involved are so intricate, mathematicians with expertise in one subfield have a hard time even understanding a paper written in a different subfield.

But with Lean, it does not matter who writes the proof. If the arguments are sound, Lean will guarantee the proof’s correctness without the need for discussion or cross-checking. If we take a large problem and break it down into smaller, more approachable components, different people can work on these smaller components without worrying about the bigger picture. People who would never have the ability or expertise to work on the whole problem can still contribute by solving these components. This allows for collaboration at a scale that no one has seen before.

A good example of this was seen recently in the Equational Theories project, in which around 50 contributors from around the world worked together to settle around 22 million possible implications between algebraic identities involving objects called magmas. Though each of these questions on its own was relatively easy to prove or disprove, codifying the problem in Lean greatly reduced the complexity of cross-checking and verification. “You can manage this complexity only because you have all these components that are built to specification,” says Siddhartha Gadgil, Professor in the Department of Mathematics at IISc.
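
To get a feel for the objects involved, here is a schematic sketch in Lean (not the project’s actual code) of what a magma and one such identity look like:

-- A magma: a type with a single binary operation, written here as x ◇ y.
class Magma (α : Type) where
  op : α → α → α

infixl:65 " ◇ " => Magma.op

-- One example of the kind of law that can be studied: commutativity.
def CommutativeLaw (G : Type) [Magma G] : Prop :=
  ∀ x y : G, x ◇ y = y ◇ x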

This process can be compared to designing a computer chip, which is an immensely complicated object with components as small as a few nanometers in size. To get such an object working reliably, you need to be absolutely certain that each component, down to the last transistor, is built perfectly. Perhaps due to this similarity in structure, theorem-verification software like Lean has long been used in the industry by companies like AMD and Intel to verify the proper functioning of their processors.

Example Lean code that uses induction and arithmetic tactics to show the sum of the first n natural numbers 1+2+3+…+n = n(n+1)/2 (Image: Aviral Sood)
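
The code in the figure may differ, but a proof along those lines could look like the sketch below (the definition and theorem names are invented for illustration). The identity is stated as 2 × (1 + 2 + … + n) = n × (n + 1) so that everything stays inside the natural numbers:

import Mathlib

-- sumUpTo n computes 1 + 2 + ... + n.
def sumUpTo : ℕ → ℕ
  | 0 => 0
  | n + 1 => sumUpTo n + (n + 1)

theorem two_mul_sumUpTo (n : ℕ) : 2 * sumUpTo n = n * (n + 1) := by
  induction n with
  | zero => simp [sumUpTo]  -- base case: both sides are 0
  | succ k ih =>
    simp only [sumUpTo]     -- unfold one step of the sum
    rw [mul_add, ih]        -- distribute, then use the induction hypothesis
    ring                    -- the remaining identity is ordinary algebra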

Recently, there has been an explosion in the field of automated theorem proving due to the huge increase in effectiveness of Large Language Models (LLMs). Popular LLM-based AI assistants like ChatGPT and Claude now include reasoning models, which can “think” through a problem much like a human does. “Five years ago, we used to think that language models were an important component of machine intelligence. Now, it is seen as the component that will lead to intelligence,” says Danish Pruthi, Assistant Professor at the Department of Computational and Data Sciences (CDS) in IISc. 

Given the current tendency of LLMs to “hallucinate”, coming up with completely false information with complete confidence, LLMs and Lean are a perfect match for each other: any incorrect step that an LLM produces is caught immediately by Lean’s compiler. Combining the reasoning ability of these models with the rigorous environment of Lean can allow us to automatically solve a large range of maths problems. In fact, 99% of the implications in the earlier mentioned Equational Theories project were settled using a similar combined approach. And given that both the Lean mathlib library and LLM performance are growing at an exponential pace, their capabilities can, given enough time, potentially surpass those of a large number of professional mathematicians.

But because Lean and automated theorem proving are still in their infancy, not many mathematicians and computer scientists are exploring their interactions with LLMs yet. “Trust in these AI models will come through their performance,” says Danish. “I have students come up and tell me, ‘I tried Copilot [an AI tool that helps with writing code] once, now I can’t imagine my life without it.’”