Accessing the Grading Interface
To begin, navigate to the main page. Once there, enter your Judge ID given by the organizers and click "Start Grading".

A step-by-step guide to effectively grade LLM generated proofs.
To begin, navigate to the main page. Once there, enter your Judge ID given by the organizers and click "Start Grading".
Once logged in, you will see a navigation bar on the left side. This shows your assigned questions. Note that your assigned questions will change on a daily basis. Do not worry if you cannot judge all questions you got assigned in one day. Clicking on any question will guide you to the problem and model attempts. Note that to see all your questions, you have to navigate the sidebar (e.g., by clicking "<- All Competitions")
The navigation bar indicates how much you progressed on each question (each of which has multiple model attempts). The color coding is as follows:
While not required, we highly recommend (and hugely prefer) that you evaluate all model attempts for a question before progressing to the next question. Feel free, however, to grade the assigned questions in any order you prefer.
At the top of the page, you will see the problem statement with some additional metadata. Read the problem carefully. If you believe the problem statement is faulty, please follow these instructions:
If you reconsider after submitting the issue, you can always uncheck the checkbox (this will automatically save the change). If another judge has submitted an issue, you will see this as a red box above the problem statement.
For your reference, we also provide a ground-truth solution if possible. Note that these (automatically extracted) solutions can contain mistakes, so be careful with its interpretation. The solution can guide you in understanding the problem, but there are always multiple ways to prove something, all of which should be counted as correct. If you believe the ground-truth solution is faulty, you can follow the same instructions as above to indicate this. However, for ground-truth solutions, this is not required, as they are solely for your reference and it is expected that some provided solutions will contain mistakes. If another judge has submitted an issue, you will see this as a red box above the solution.
For each problem, you will grade multiple attempts. In this interface, you can navigate between attempts by selecting "Run 1", "Run 2", etc. At the top of each solution, you will see
The LLM judgment contains a summary of the proof along with at most four issues the judge found in the proof. Some notes about this:
There is a divider between the model attempt and LLM judgment that allows you to make one of them larger.
A solution should be considered correct even if it would earn 5+/7 points in a full grading. Examples of small penalties worth 1 point are if the solution:
You will grade whether the model attempt is correct (score=1) or incorrect (score=0). Small errors in the model attempt that can be easily fixed (e.g. typos, small oversights) should be graded as correct. You are required to provide a score and feedback, describing the reasoning behind your score. You can additionally add the following:
Press "Save" after you have made your grade. This will store your grading. Since we go for as many problems as possible, only add annotations if you think they are needed or useful, and easy to create.