102584 - As we can see that models
N. Lygeros
As we can see, models are progressing from near-zero to near-perfect performance in a short timeframe, so this should be done quickly.
Gemini 3 Pro has already reached 38% on HLE without tools and 45.8% with search and code execution. AIME 2025, for example, is finished, i.e. Gemini 3 Pro and Claude Sonnet 4.5 scored 100% with search and code execution.
So HLE, where LLMs achieve very low accuracy for the moment, is very useful for seeing the real progress of AI.