Our Framework

Our AI learning loop

We envision the AI learning loop as follows. It takes in a task together with the available capabilities: tools, skills, or agents. Both feed into a lead agent, which outputs an orchestration (single-agent or multi-agent, centralized or decentralized). The resulting response then goes to the environment, whose feedback flows back to the lead agent or, optionally, into memory.

01 Task + Capability
Each run begins with a task and the external capabilities available to solve it — tools, skills, data, and environments.
02 Lead Agent
A lead (meta) agent receives the task. Its job is not to answer directly, but to decide how the work should be done.
03 Orchestration
The lead composes and coordinates specialized sub-agents — assigning roles, context, tools, and a flow between them.
04 Environment
Sub-agents act in an environment, calling tools and producing intermediate results that become a response.
05 Feedback
The environment returns feedback. This signal is how the system learns whether the orchestration actually worked.
06 Memory (optional)
Memory carries state across turns, so the lead agent can adapt and improve over time.
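The six steps above can be sketched as a small loop. This is only an illustrative toy under stated assumptions, not the framework's actual implementation: the names `Orchestration`, `Memory`, `lead_agent`, `run_in_environment`, `learning_loop`, and `TOOLS` are hypothetical, the two tools are stand-ins, and the "policy" is a hand-written rule rather than a learned one.

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[str], str]

# 01 Task + Capability: hypothetical stand-in tools, not real services.
TOOLS: dict[str, Tool] = {
    "search": lambda text: text + " [context]",
    "solve": lambda text: f"answer({text})",
}


@dataclass
class Orchestration:
    """03: an ordered list of (sub-agent role, tool name) steps."""
    steps: list[tuple[str, str]]


@dataclass
class Memory:
    """06: optional state carried across turns."""
    notes: list[str] = field(default_factory=list)


def lead_agent(task: str, tools: dict[str, Tool], memory: Memory) -> Orchestration:
    """02: decide how the work should be done, not the answer itself.

    Toy rule (an assumption): start single-agent, escalate to a
    two-agent pipeline once memory records a failed attempt.
    """
    if any(note.startswith("failed") for note in memory.notes):
        return Orchestration(steps=[("searcher", "search"), ("solver", "solve")])
    return Orchestration(steps=[("solver", "solve")])


def run_in_environment(task: str, plan: Orchestration,
                       tools: dict[str, Tool]) -> tuple[str, bool]:
    """04 + 05: sub-agents call tools; the environment returns feedback."""
    result = task
    for _role, tool_name in plan.steps:
        result = tools[tool_name](result)
    # Toy feedback signal: the answer must be grounded in retrieved context.
    return result, "context" in result


def learning_loop(task: str, tools: dict[str, Tool],
                  max_turns: int = 3) -> tuple[str, Memory]:
    memory = Memory()
    response = ""
    for _ in range(max_turns):
        plan = lead_agent(task, tools, memory)
        response, ok = run_in_environment(task, plan, tools)
        # Feedback flows back to the lead agent through memory.
        memory.notes.append(f"{'ok' if ok else 'failed'}: {len(plan.steps)} step(s)")
        if ok:
            break
    return response, memory
```

Calling `learning_loop("2+2", TOOLS)` first tries the single-agent plan, records the failure in memory, and then retries with the two-agent pipeline. The point of the sketch is the separation of concerns: the lead agent only chooses a plan, and the only thing it learns from is the environment's feedback.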
Our Roadmap

Roadmap of our work

Three works lay the groundwork — the Survey, MAS-Zero, and MAS-Orchestra — then the agenda branches fast into parallel directions. Everything below — our papers and thought leadership — converges into the live demo, the real application you can play with.

2025 · Survey
A Survey of Frontiers in LLM Reasoning

The map that framed our whole agenda — from standalone LLMs, to single-agent systems, to multi-agent systems. The must-read starting point that everything below builds on.

TMLR'25 · 🏅 Survey Certification · NeurIPS'25 Tutorial
2025 · MAS Policy
MAS-Zero

The first to make MAS design automatic — an inference-time, self-refinement framework that builds multi-agent systems with zero supervision. The seed idea the rest of the line scales up.

inference · SEA@NeurIPS'25 · 🏅 Oral
2026 · MAS Policy · Analysis
MAS-Orchestra

Holistic, training-time orchestration via function-calling RL — composing an entire MAS at each step, not piece by piece — paired with MASBench, a controlled study of when multi-agent beats single-agent. Beats GPT-5 and Claude-Sonnet-4.5 by up to 23% across 5 benchmarks, at a 10× efficiency gain. It's the trunk the three directions below branch from.

training · ICML'26
2026 · MAS Policy
SkillOrchestra

Builds on the orchestration line with skill-based agent routing, an offline approach that outperforms SOTA RL orchestrators by 22.5% at 700× lower cost.

inference · 🏅 #2 Hugging Face Daily
2026 · Verify Orchestration
OrchRM & MAS-ProVe

Can we judge an orchestration without running the whole system? Reward modeling and process verification at the orchestration level.

training · evaluation · MAS-ProVe (ICML'26) · OrchRM (under submission)
2026 · Analysis
IlluMAS & LiveResearchBench

When does multi-agent actually beat single-agent? A cost-controlled study plus benchmarks that stress-test the multi-agent advantage.

evaluation · ICLR'26 · under submission
Try it live
Vibe-code your MAS

Where the whole series becomes something you can run. Describe a problem and Orchestra designs a multi-agent plan, executes it, and lets you refine it in chat — across math, search, deep research, and enterprise ops.

mas-orchestra.salesforceresearch.ai
“Find all positive integer triples (a,b,c) …”
LEAD SEARCH SOLVE FINAL
Orchestra · designed a multi-agent plan, ready to run
AIME · HotpotQA · BrowseComp+ · MASBench · SMFR · EnterpriseOps · LiveResearchBench
What's next
The open challenges

A lead agent configures the environment, orchestrates sub-agents, receives feedback, and updates memory. The hard parts that remain: Policy (a huge action space), Verification (judging a plan without a full rollout), and Environment (an evolving harness).

Policy · Verification · Environment
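To get a feel for why the Policy challenge involves a huge action space, here is a back-of-the-envelope count. It rests on a simplified model that is our assumption, not a definition from the papers: each sub-agent picks one role and any subset of tools, and any directed link between two distinct sub-agents may be present or absent. The function name `count_orchestrations` is hypothetical.

```python
def count_orchestrations(n_agents: int, n_roles: int, n_tools: int) -> int:
    """Count distinct orchestrations under the toy model above."""
    # One role and one of 2**n_tools tool subsets per sub-agent.
    per_agent_choices = n_roles * 2 ** n_tools
    # Each ordered pair of distinct sub-agents is either linked or not.
    flow_choices = 2 ** (n_agents * (n_agents - 1))
    return per_agent_choices ** n_agents * flow_choices


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, count_orchestrations(n, n_roles=5, n_tools=4))
```

Even with only 3 sub-agents, 5 roles, and 4 tools, this toy model already gives 32,768,000 candidate orchestrations, and the count grows super-exponentially with the number of sub-agents. That is before accounting for prompts or context assignment, which the model ignores.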