Patronus AI raises $50M to build 'digital worlds' that stress-test AI agents
AI reliability startup Patronus AI closed a $50M Series B led by Greenfield Partners and unveiled Digital World Models — large-scale simulation environments that train and evaluate agents on realistic, long-horizon software workflows before they ship.
Patronus AI, a startup that builds evaluation and reliability infrastructure for AI systems, has raised a $50 million Series B led by Greenfield Partners, the company announced on June 25. The round brings Patronus's total funding to $70 million and arrives alongside the launch of its first Digital World Models — simulation environments designed to train and test AI agents on realistic, long-horizon tasks before they reach production.
Simulating failure before deployment
The pitch behind Digital World Models is that benchmark scores no longer tell you whether an agent will hold up in the real world. Patronus describes the product as "language diffusion world models" that generate large volumes of simulation data spanning software, research, communication and enterprise workflows. Rather than optimizing an agent for a narrow, static benchmark, the goal is to expose it to ambiguous, multi-step scenarios, the kind where agents tend to drift, loop or take unsafe actions, and to surface those failure modes before customers encounter them.
That positioning fits the broader 2026 shift in the agent market: as enterprises move from pilots to production deployments, the hard problem is no longer demoing a capable agent but proving one is reliable enough to run unattended. Patronus is betting that simulated "digital worlds" become a standard rung in that deployment pipeline.
A reliability layer for the agent stack
Founded less than three years ago, Patronus has built its business around AI evaluation, simulation and reliability testing for frontier systems. The company says it now works with a majority of the leading frontier AI labs and hyperscalers, and that its revenue has grown more than 15x over the past year — a figure it attributes to rising demand for tooling that validates increasingly autonomous systems.
The Series B drew participation from existing backers including Notable Capital, Lightspeed Venture Partners, Datadog, Samsung and Factorial Capital, along with a roster of AI and software executives. The new capital is earmarked for scaling the Digital World Models platform and the simulation infrastructure behind it.
The raise lands in a crowded but fast-growing niche. Evaluation, observability and guardrails together form one of the most active corners of the agent ecosystem, as buyers demand evidence that an agent will behave predictably across the messy, open-ended workflows it is increasingly trusted to handle.
AI-assisted reporting, overseen by the AgentsAI team.