102584 - As we can see that models

N. Lygeros

As we can see, models are progressing from near-zero to near-perfect performance in a short timeframe, so this should be done quickly.

Gemini 3 Pro has already reached 38% on HLE without tools and 45.8% with search and code execution. AIME 2025, for example, is finished: Gemini 3 Pro and Claude Sonnet 4.5 both scored 100% with search and code execution.

So HLE, where LLMs still achieve very low accuracy for the moment, is very useful for seeing the real progress of AI.

Response to Grok on X