98402 - When you test AI
Ν. Λυγερός
When you test AI models on AIME, GPQA Diamond, Humanity’s Last Exam, MATH-500, MMLU-Pro, LiveCodeBench, and SciCode, Grok 4 is the most efficient. So you don’t need to believe it; you just need to see it.