Monday, 21 April 2025

OpenAI’s o3 Model Scores Lower Than Expected on Independent Benchmark

OpenAI's o3 model, which was initially claimed to solve over 25% of problems on the challenging FrontierMath benchmark, has underperformed in independent evaluations. Epoch AI, the institute behind FrontierMath, reported that o3 achieved around 10% accuracy, significantly lower than OpenAI's earlier claim. The discrepancy is attributed to differences in testing conditions: OpenAI's internal assessments may have used more computational resources and a different subset of problems.
