January 23, 2026, 04:45
New APEX-Agents Benchmark Suggests AI Agents Still Aren’t Ready for White-Collar Work
Nearly two years after Microsoft CEO Satya Nadella predicted that AI would replace large portions of knowledge work, that transformation has yet to materialize. Despite major advances in foundation models, most white-collar professions—ranging from law and investment banking to accounting, IT, and research—remain largely unchanged.
While modern AI systems excel at tasks like deep research and agentic planning, their real-world impact on professional workflows has been limited. This gap between promise and reality has puzzled researchers, but new findings from training-data company Mercor offer fresh insight into why progress has stalled.
Mercor’s research evaluates how leading AI models perform on authentic white-collar tasks drawn from consulting, investment banking, and legal work. The study introduces a new benchmark, called APEX-Agents, designed to simulate real professional environments. The results were striking: models from every major AI lab failed the test. Even the strongest models answered fewer than 25% of questions correctly, often returning incorrect or incomplete responses.
According to Mercor CEO Brendan Foody, one of the core challenges lies in cross-domain reasoning. Knowledge workers routinely navigate information spread across multiple tools and platforms, a skill that remains difficult for AI agents to replicate.
“One of the big changes in this benchmark is that we built out the entire environment, modeled after real professional services,” Foody told TechCrunch. “In real life, you’re working across Slack, Google Drive, and other systems—not from a single source of context.” For many agentic AI models, this kind of multi-domain coordination remains unreliable.
The findings suggest that while AI agents continue to improve rapidly, they still fall short of the complexity and adaptability required to replace human knowledge workers in the modern workplace.
Author: Kandi Srinivasa Reddy