What is AGI? Nobody agrees, and it’s tearing Microsoft and OpenAI apart.


The reported $100 billion profit threshold we mentioned earlier conflates commercial success with cognitive capability, as if a system’s ability to generate revenue says anything meaningful about whether it can “think,” “reason,” or “understand” the world like a human.

Sam Altman speaks onstage during The New York Times Dealbook Summit 2024 at Jazz at Lincoln Center on December 4, 2024, in New York City.

Credit: Eugene Gologursky via Getty Images

Depending on your definition, we may already have AGI, or it may be physically impossible to achieve. If you define AGI as “AI that performs better than most humans at most tasks,” then current language models potentially meet that bar for certain types of work (which tasks, which humans, what is “better”?), but agreement on whether that is true is far from universal. This says nothing of the even murkier concept of “superintelligence,” another nebulous term for a hypothetical, god-like intellect so far beyond human cognition that, like AGI, it defies any solid definition or benchmark.

Given this definitional chaos, researchers have tried to create objective benchmarks to measure progress toward AGI, but these attempts have revealed their own set of problems.

Why benchmarks keep failing us

The search for better AGI benchmarks has produced some interesting alternatives to the Turing Test. The Abstraction and Reasoning Corpus (ARC-AGI), introduced in 2019 by François Chollet, tests whether AI systems can solve novel visual puzzles that require deep analytical reasoning.

“Almost all current AI benchmarks can be solved purely via memorization,” Chollet told Freethink in August 2024. A major problem with AI benchmarks currently stems from data contamination—when test questions end up in training data, models can appear to perform well without truly “understanding” the underlying concepts. Large language models serve as master imitators, mimicking patterns found in training data, but not always originating novel solutions to problems.
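To make the contamination problem concrete, here is a minimal sketch of one crude way a researcher might look for it: flagging any test question that shares a long run of consecutive words with the training text. This is an illustration only, not the method used by Chollet or by any particular benchmark. The function names, the whitespace tokenization, and the eight-word window are all assumptions chosen for simplicity.

```python
def word_ngrams(text, n=8):
    """Return the set of n-word sequences in text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(test_items, training_docs, n=8):
    """Return (fraction of flagged test items, list of flagged items).

    A test item is flagged when it shares at least one n-word sequence
    with any training document. The window size n is an assumed value;
    a smaller n flags more items, a larger n flags fewer.
    """
    if not test_items:
        return 0.0, []

    # Collect every n-word sequence seen anywhere in the training text.
    train_grams = set()
    for doc in training_docs:
        train_grams |= word_ngrams(doc, n)

    # Flag test items that overlap with the training text.
    flagged = [item for item in test_items if word_ngrams(item, n) & train_grams]
    return len(flagged) / len(test_items), flagged


# Hypothetical data, invented for this example.
training_docs = [
    "the quick brown fox jumps over the lazy dog near the river bank today"
]
test_items = [
    "Q: the quick brown fox jumps over the lazy dog near what?",
    "What is the capital of France and why is it famous worldwide?",
]

rate, flagged = contamination_rate(test_items, training_docs)
print(f"{rate:.0%} of test items overlap with the training text")
```

A check like this only catches verbatim leaks. A paraphrased or translated copy of a test question would pass unnoticed, which is part of why contamination is so hard to rule out.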

But even sophisticated benchmarks like ARC-AGI face a fundamental problem: They’re still trying to reduce intelligence to a score. And while improved benchmarks are essential for measuring empirical progress in a scientific framework, intelligence isn’t a single thing you can measure like height or weight—it’s a complex constellation of abilities that manifest differently in different contexts. Indeed, we don’t even have a complete functional definition of human intelligence, so defining artificial intelligence by any single benchmark score is likely to capture only a small part of the complete picture.
