Monday, 21 April, 2025
OpenAI’s o3 Model Scores Lower Than Expected on Independent Benchmark

OpenAI's o3 model, which the company initially claimed could solve over 25% of problems on the challenging FrontierMath benchmark, has scored well below that figure in independent testing. Epoch AI, the research institute behind FrontierMath, reported that o3 achieved around 10% accuracy. The discrepancy is attributed to differences in testing conditions: OpenAI's internal evaluation may have used more computational resources and a different subset of problems.
Read full story at TechCrunch