With the launch of o3-pro, let’s talk about what AI “reasoning” actually does

Why use o3-pro?

Unlike general-purpose models like GPT-4o that prioritize speed, broad knowledge, and making users feel good about themselves, o3-pro uses a chain-of-thought simulated reasoning process to devote more output tokens toward working through complex problems, making it generally better for technical challenges that require deeper analysis. But it’s still not perfect.

An OpenAI o3-pro benchmark chart. Credit: OpenAI

Measuring so-called “reasoning” capability is tricky since benchmarks can be easy to game by cherry-picking or training data contamination, but OpenAI reports that o3-pro is popular among testers, at least. “In expert evaluations, reviewers consistently prefer o3-pro over o3 in every tested category and especially in key domains like science, education, programming, business, and writing help,” writes OpenAI in its release notes. “Reviewers also rated o3-pro consistently higher for clarity, comprehensiveness, instruction-following, and accuracy.”

OpenAI shared benchmark results showing o3-pro’s reported performance improvements. On the AIME 2024 mathematics competition, o3-pro achieved 93 percent pass@1 accuracy, compared to 90 percent for o3 (medium) and 86 percent for o1-pro. The model reached 84 percent on PhD-level science questions from GPQA Diamond, up from 81 percent for o3 (medium) and 79 percent for o1-pro. For programming tasks measured by Codeforces, o3-pro achieved an Elo rating of 2748, surpassing o3 (medium) at 2517 and o1-pro at 1707.
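To put the Codeforces gaps in perspective, here is a minimal sketch that converts a rating difference into an expected head-to-head score. It assumes the standard Elo expected-score formula applies to the figures OpenAI reported; Codeforces' own rating system may differ in its details, and the function name `elo_expected_score` is purely illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B.

    Uses the standard Elo formula with a 400-point scale. Whether this
    maps exactly onto Codeforces ratings is an assumption.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as reported by OpenAI in the article.
reported = {"o3-pro": 2748, "o3 (medium)": 2517, "o1-pro": 1707}

for rival in ("o3 (medium)", "o1-pro"):
    score = elo_expected_score(reported["o3-pro"], reported[rival])
    print(f"o3-pro vs {rival}: expected score {score:.2f}")
```

Under that assumption, the 231-point lead over o3 (medium) corresponds to an expected score of roughly 0.79, which is a clear edge but far from a guaranteed win in any single matchup.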

When reasoning is simulated

Structure made of cubes in the shape of a thinking or contemplating person that evolves from simple to complex, 3D render. Credit: Floriana via Getty Images

It’s easy for laypeople to be thrown off by the anthropomorphic claims of “reasoning” in AI models. In this case, as with the borrowed anthropomorphic term “hallucinations,” “reasoning” has become a term of art in the AI industry that basically means “devoting more compute time to solving a problem.” It does not necessarily mean the AI models systematically apply logic or possess the ability to construct solutions to truly novel problems. This is why Ars Technica continues to use the term “simulated reasoning” (SR) to describe these models. They are simulating a human-style reasoning process that does not necessarily produce the same results as human reasoning when faced with novel challenges.

WDC NEWS 6 STAFF
