Srinivasa Reddy Kandi: New APEX-Agents Benchmark Suggests AI Agents Still Aren’t Ready for White-Collar Work

January 23, 2026, 04:45



Nearly two years after Microsoft CEO Satya Nadella predicted that AI would replace large portions of knowledge work, that transformation has yet to materialize. Despite major advances in foundation models, most white-collar professions—ranging from law and investment banking to accounting, IT, and research—remain largely unchanged.

While modern AI systems excel at tasks like deep research and agentic planning, their real-world impact on professional workflows has been limited. This gap between promise and reality has puzzled researchers, but new findings from training-data company Mercor offer fresh insight into why progress has stalled.

Mercor’s research evaluates how leading AI models perform on authentic white-collar tasks drawn from consulting, investment banking, and legal work. The study introduces a new benchmark, called APEX-Agents, designed to simulate real professional environments. The results were striking: models from every major AI lab failed the test. Even the strongest models answered fewer than 25% of questions correctly, often returning incorrect or incomplete responses.

According to Mercor CEO Brendan Foody, one of the core challenges lies in cross-domain reasoning. Knowledge workers routinely navigate information spread across multiple tools and platforms, a skill that remains difficult for AI agents to replicate.

“One of the big changes in this benchmark is that we built out the entire environment, modeled after real professional services,” Foody told TechCrunch. “In real life, you’re working across Slack, Google Drive, and other systems—not from a single source of context.” For many agentic AI models, this kind of multi-domain coordination remains unreliable.

The findings suggest that while AI agents continue to improve rapidly, they still fall short of the complexity and adaptability required to replace human knowledge workers in the modern workplace.

Author: Srinivasa Reddy Kandi


