Reasoning and Agents

Our Framework

Our AI learning loop

We picture the AI learning loop as follows. Each run takes in a task and the available capabilities: tools, skills, or agents. Both feed into a lead agent, which outputs an orchestration (single-agent or multi-agent, centralized or decentralized). The orchestration runs in the environment and produces a response, and the environment's feedback flows back to the lead agent or, optionally, into memory.

01 Task + Capability

Each run begins with a task and the external capabilities available to solve it — tools, skills, data, and environments.

02 Lead Agent

A lead (meta) agent receives the task. Its job is not to answer directly, but to decide how the work should be done.

03 Orchestration

The lead composes and coordinates specialized sub-agents — assigning roles, context, tools, and a flow between them.

04 Environment

Sub-agents act in an environment, calling tools and producing intermediate results that become a response.

05 Feedback

The environment returns feedback. This signal is how the system learns whether the orchestration actually worked.

06 Memory (optional)

Memory carries state across turns, so the lead agent can adapt and improve over time.
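The six steps above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the project's implementation: every name here (SubAgent, Orchestration, Memory, lead_agent, run_in_environment, learning_loop) is hypothetical, the "capabilities" are plain string functions, and the lead agent's policy is a toy rule that escalates from one agent to several after a failed attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout; none of this is the project's actual API.
Capabilities = Dict[str, Callable[[str], str]]


@dataclass
class SubAgent:
    role: str
    tools: List[str]


@dataclass
class Orchestration:
    agents: List[SubAgent]  # one agent = single-agent, several = multi-agent
    flow: List[str]         # order in which roles run (a simple chain)


@dataclass
class Memory:
    episodes: List[dict] = field(default_factory=list)


def lead_agent(task: str, capabilities: Capabilities, memory: Memory) -> Orchestration:
    """Step 02-03: decide how the work is done instead of answering directly."""
    tools = sorted(capabilities)
    failed_before = any(
        e["task"] == task and not e["success"] for e in memory.episodes
    )
    if failed_before and len(tools) > 1:
        # Toy escalation: one specialist per tool, run in sequence.
        agents = [SubAgent(role=f"specialist_{t}", tools=[t]) for t in tools]
    else:
        # Toy default: a single agent holding only the first tool.
        agents = [SubAgent(role="solver", tools=tools[:1])]
    return Orchestration(agents=agents, flow=[a.role for a in agents])


def run_in_environment(task: str, plan: Orchestration, capabilities: Capabilities) -> str:
    """Step 04: sub-agents call tools; intermediate results become the response."""
    by_role = {a.role: a for a in plan.agents}
    result = task
    for role in plan.flow:
        for tool in by_role[role].tools:
            result = capabilities[tool](result)
    return result


def learning_loop(task, capabilities, check, memory, max_turns=3):
    """Steps 01-06: task + capabilities in, feedback stored in memory each turn."""
    response = ""
    for _ in range(max_turns):
        plan = lead_agent(task, capabilities, memory)
        response = run_in_environment(task, plan, capabilities)
        success = check(response)  # Step 05: feedback from the environment
        memory.episodes.append(
            {"task": task, "n_agents": len(plan.agents), "success": success}
        )
        if success:
            break
    return response


memory = Memory()
capabilities = {"strip": str.strip, "upper": str.upper}
answer = learning_loop("  hello ", capabilities, lambda r: r == "HELLO", memory)
```

In this toy run the first turn uses a single agent and fails the check; the failure is written to memory, so the second turn switches to a two-agent plan. A real lead agent would be a learned policy rather than a hand-written rule.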


Our Roadmap

Roadmap of our work

Three works lay the groundwork: the Survey, MAS-Zero, and MAS-Orchestra. From there the agenda branches into three parallel directions. Everything below, both our papers and our thought leadership, converges into the live demo, the real application you can play with.

2025 · Survey

A Survey of Frontiers in LLM Reasoning

The map that framed our whole agenda — from standalone LLMs, to single-agent systems, to multi-agent systems. The must-read starting point that everything below builds on.

TMLR'25 | 🏅 Survey Certification | NeurIPS'25 Tutorial

Project Page Paper

2025 · MAS Policy

MAS-Zero

The first to make multi-agent system (MAS) design automatic: an inference-time self-refinement framework that builds multi-agent systems with zero supervision. The seed idea the rest of the line scales up.

inference | SEA@NeurIPS'25 | 🏅 Oral

Project Page Paper Code MAS Collection

2026 · MAS Policy · Analysis

MAS-Orchestra

Holistic, training-time orchestration via function-calling RL — composing an entire MAS at each step, not piece by piece — paired with MASBench, a controlled study of when multi-agent beats single-agent. Beats GPT-5 and Claude-Sonnet-4.5 by up to 23% across 5 benchmarks, at a 10× efficiency gain. It's the trunk the three directions below branch from.

training | ICML'26

Project Page Live Demo Paper Code

2026 · MAS Policy

SkillOrchestra

Builds on the orchestration line with skill-based agent routing: an offline approach that outperforms SOTA RL orchestrators by 22.5% at 700× lower cost.

inference | 🏅 #2 Hugging Face Daily

Paper

2026 · Verify Orchestration

OrchRM & MAS-ProVe

Can we judge an orchestration without running the whole system? Reward modeling and process verification at the orchestration level.

training | evaluation | MAS-ProVe · ICML'26 | OrchRM · under submission

OrchRM Paper MAS-ProVe Paper

2026 · Analysis

IlluMAS & LiveResearchBench

When does multi-agent actually beat single-agent? A cost-controlled study plus benchmarks that stress-test the multi-agent advantage.

evaluation | ICLR'26 · under submission

IlluMAS Page IlluMAS Paper

try it live

Vibe-code your MAS

Where the whole series becomes something you can run. Describe a problem and Orchestra designs a multi-agent plan, executes it, and lets you refine it in chat — across math, search, deep research, and enterprise ops.

mas-orchestra.salesforceresearch.ai

“Find all positive integer triples (a,b,c) …”

Orchestra · designed a multi-agent plan, ready to run

AIME | HotpotQA | BrowseComp+ | MASBench | SMFR | EnterpriseOps | LiveResearchBench

Open the live demo →

What's next

The open challenges

A lead agent configures the environment, orchestrates sub-agents, receives feedback, and updates memory. The hard parts that remain: Policy (a huge action space), Verification (judging a plan without a full rollout), and Environment (an evolving harness).

Policy | Verification | Environment
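To make the Verification challenge concrete, here is a toy illustration of scoring a plan without executing it. It is a hand-written structural check, not a learned reward model and not the OrchRM or MAS-ProVe method; the plan layout (a dict with "agents" and "flow") and the names static_plan_score and pick_plan are assumptions made up for this sketch.

```python
from typing import Dict, List, Set


def static_plan_score(
    agents: Dict[str, List[str]],  # hypothetical layout: role -> tools it may call
    flow: List[str],               # order in which roles run
    available_tools: Set[str],
) -> float:
    """Fraction of cheap structural checks the plan passes (1.0 = all pass)."""
    checks = [
        len(flow) > 0,                          # the plan has at least one step
        all(role in agents for role in flow),   # every step names a defined agent
        all(role in flow for role in agents),   # no defined agent is left unused
        all(t in available_tools                # agents request only existing tools
            for tools in agents.values() for t in tools),
    ]
    return sum(checks) / len(checks)


def pick_plan(candidates: List[dict], available_tools: Set[str]) -> dict:
    """Choose the best-scoring candidate without running any of them."""
    return max(
        candidates,
        key=lambda c: static_plan_score(c["agents"], c["flow"], available_tools),
    )


tools = {"search"}
good = {"agents": {"searcher": ["search"], "writer": []},
        "flow": ["searcher", "writer"]}
bad = {"agents": {"searcher": ["browser"]},
       "flow": ["searcher", "critic"]}
best = pick_plan([bad, good], tools)
```

Checks like these only catch malformed plans. Judging whether a well-formed plan will actually solve the task is the harder part that the paragraph above leaves open.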

Read the blog
