We envision the AI learning loop like this: it takes in both a task and the available capabilities (tools, skills, or agents). Both feed into a lead agent, which outputs an orchestration (single-agent or multi-agent, centralized or decentralized). The orchestration then goes to the environment, whose feedback flows back to the lead agent or, optionally, into memory.
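The loop described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the actual system: the names `Orchestration`, `LeadAgent`, `environment`, and `run_loop` are hypothetical, the planning policy is a hard-coded rule standing in for a learned one, and the environment's success check is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Capability = Callable[[str], str]  # hypothetical: a tool, skill, or agent


@dataclass
class Orchestration:
    mode: str          # "single-agent" or "multi-agent"
    control: str       # "centralized" or "decentralized"
    steps: List[str]   # names of the capabilities to invoke, in order


@dataclass
class LeadAgent:
    memory: List[str] = field(default_factory=list)

    def plan(self, task: str, capabilities: Dict[str, Capability],
             feedback: str = "") -> Orchestration:
        # Toy policy (assumption): start with one capability and widen
        # to all of them once the environment asks for a retry.
        names = sorted(capabilities)
        if feedback.startswith("retry"):
            return Orchestration("multi-agent", "centralized", names)
        return Orchestration("single-agent", "centralized", names[:1])


def environment(task: str, orch: Orchestration,
                capabilities: Dict[str, Capability]) -> str:
    # Execute the orchestration and return feedback as a string.
    outputs = [capabilities[name](task) for name in orch.steps]
    # Toy success criterion (assumption): at least two contributions.
    if len(outputs) >= 2:
        return "ok: " + " | ".join(outputs)
    return "retry: need more coverage"


def run_loop(task: str, capabilities: Dict[str, Capability],
             max_rounds: int = 3, use_memory: bool = True
             ) -> Tuple[List[Orchestration], str, List[str]]:
    agent = LeadAgent()
    feedback = ""
    history: List[Orchestration] = []
    for _ in range(max_rounds):
        orch = agent.plan(task, capabilities, feedback)   # lead agent
        feedback = environment(task, orch, capabilities)  # environment
        history.append(orch)
        if use_memory:                                    # optional memory
            agent.memory.append(feedback)
        if feedback.startswith("ok"):
            break
    return history, feedback, agent.memory


if __name__ == "__main__":
    caps = {"search": lambda t: f"s({t})", "math": lambda t: f"m({t})"}
    history, feedback, memory = run_loop("q", caps)
    for orch in history:
        print(orch.mode, orch.steps)
    print(feedback)
```

In this sketch the feedback string is the only signal passed back to the lead agent; a real system would presumably carry richer feedback and a learned policy in place of the `if` rule.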
Three works lay the groundwork (the Survey, MAS-Zero, and MAS-Orchestra), then the agenda branches fast into parallel directions. Everything below, our papers and thought leadership alike, converges on the live demo, the real application you can play with.
The map that framed our whole agenda — from standalone LLMs, to single-agent systems, to multi-agent systems. The must-read starting point that everything below builds on.
The first to make MAS design automatic — an inference-time, self-refinement framework that builds multi-agent systems with zero supervision. The seed idea the rest of the line scales up.
Holistic, training-time orchestration via function-calling RL — composing an entire MAS at each step, not piece by piece — paired with MASBench, a controlled study of when multi-agent beats single-agent. Beats GPT-5 and Claude-Sonnet-4.5 by up to 23% across 5 benchmarks, at a 10× efficiency gain. It's the trunk the three directions below branch from.
Building on the orchestration line — skill-based agent routing, an offline approach that outperforms SOTA RL orchestrators by 22.5% at 700× lower cost.
Can we judge an orchestration without running the whole system? Reward modeling and process verification at the orchestration level.
When does multi-agent actually beat single-agent? A cost-controlled study plus benchmarks that stress-test the multi-agent advantage.
Where the whole series becomes something you can run. Describe a problem and Orchestra designs a multi-agent plan, executes it, and lets you refine it in chat — across math, search, deep research, and enterprise ops.
A lead agent configures the environment, orchestrates sub-agents, receives feedback, and updates memory. The hard parts that remain: Policy (a huge action space), Verification (judging a plan without a full rollout), and Environment (an evolving harness).