MIT and Microsoft's 'Murakkab' cuts the cost and energy of running AI agents
Researchers from MIT and Microsoft Azure detailed Murakkab, a system that lets developers describe agentic workflows in plain language and then automatically picks the models, tools, and hardware to run them — using roughly a third of the compute and a quarter of the energy and cost of conventional approaches in tests.
Researchers from MIT and Microsoft Azure have detailed Murakkab, a system designed to make agentic AI workflows dramatically cheaper and less energy-hungry to run. Described in an MIT News writeup on June 25, the work targets a problem that has grown alongside the agent boom: the elaborate, multi-model pipelines behind modern agents are expensive and wasteful, partly because developers hand-wire which models, tools, and hardware each step uses and rarely revisit those choices.
Describe the goal, not the plumbing
Murakkab's central idea is to let a developer describe what an agentic application should accomplish in plain language, rather than specifying how the components fit together. The system then automatically selects the best available models and tools, decides which steps can run in parallel versus in sequence, and maps the workload onto hardware — continuously reconfiguring at runtime to honor the developer's stated priorities, whether that is minimizing cost, maximizing speed, or reducing energy use.
A profile-guided optimizer and an adaptive runtime jointly manage that full stack, a kind of cross-layer optimization the researchers say existing agent frameworks and cloud schedulers cannot do on their own because each only sees part of the picture.
The efficiency numbers
In the team's tests, the gains were substantial. Murakkab completed comparable work using about 35 percent of the compute, roughly 27 percent of the energy, and around 25 percent of the cost of more conventional approaches. In one case the system achieved better than an order-of-magnitude reduction in energy with only about a 2 percent drop in accuracy — the kind of trade-off that becomes attractive at scale.
The project is credited to MIT's Gohar Chaudhry and associate professor Adam Belay, working with Microsoft Azure technical fellow Ricardo Bianchini and collaborators, and is slated for presentation at the USENIX Symposium on Operating Systems Design and Implementation (OSDI).
The timing is notable. As enterprises move agents from pilots into production, the recurring bill for inference — not the one-time cost of building an agent — is becoming the dominant expense, and data-center energy draw is under growing scrutiny. Work like Murakkab reframes agent reliability and economics as a systems problem, suggesting much of the waste in today's agent stacks comes not from the models themselves but from how clumsily they are wired together.
Sources
AI-assisted reporting, overseen by the AgentsAI team. Spotted an error? Let us know.
More ai news
OpenAI and Broadcom unveil 'Jalapeño,' a custom chip built only for LLM inference
OpenAI revealed its first custom silicon, Jalapeño — a Broadcom-built ASIC designed to do one thing, run large language model inference, with engineering samples already in the lab and gigawatt-scale deployment targeted for late 2026.
US partially lifts Anthropic's Mythos 5 export ban, Fable 5 still blocked
The Trump administration reversed course on June 26, allowing Anthropic's Claude Mythos 5 to reach more than 100 vetted US companies and agencies after a two-week export ban shut the model down globally — but the more widely used Fable 5 remains off-limits.
OpenAI agrees to stagger GPT-5.6 release as US government screens early access
OpenAI will limit the initial release of GPT-5.6 to a small set of government-approved partners after a request from the Trump administration, with federal officials approving customers one by one during the preview — a notable shift in how frontier models reach the public.
Transformer co-inventor Noam Shazeer leaves Google for OpenAI
Noam Shazeer, a co-author of the 2017 'Attention Is All You Need' paper and co-lead of Google's Gemini models, announced he is joining OpenAI as Lead for Architecture Research — a departure that cost Google $2.7 billion to prevent just two years ago.