June 17, 2026 · Dipankar Sarkar
Multi-Agent AI Systems: When to Use Them and When to Stay Single-Agent
Multi-Agent AI Systems: When to Use Them and When to Stay Single-Agent
“More agents = smarter” is the 2026 equivalent of “more microservices = better architecture.” Both are usually wrong. Here’s when multi-agent actually helps and when it’s over-engineering.
Three legitimate reasons to go multi-agent
- Specialization. A research agent that’s good at search, a coding agent that’s good at Python, a writing agent that’s good at prose. Each gets tailored tools and instructions.
- Parallelism. Independent subtasks run concurrently. “Analyze these 10 documents” → 10 agents, one per document.
- Separation of concerns. An agent with read-only tools gathers data; an agent with write tools acts. The boundary enforces safety.
A bad reason: “more agents = smarter.” It usually means “more agents = more cost and more failure modes.”
The four patterns
Supervisor + workers (hierarchical)
A supervisor receives the goal, breaks it into subtasks, delegates to workers, collects results, synthesizes. The most common production pattern.
Supervisor → {Researcher, Coder, Writer} → Supervisor → answerPros: clear control, easy to add/remove workers, natural human-review point. Cons: supervisor is a bottleneck and single point of failure.
Sequential pipeline (handoffs)
Agents pass work along a chain: Drafter → Reviewer → Publisher.
Pros: simple to reason about, each agent has a tight spec. Cons: no parallelism; a slow stage blocks the chain.
Peer / swarm
Agents communicate in a group chat, contributing as needed. No fixed hierarchy.
Pros: flexible, handles unstructured collaboration. Cons: unpredictable, harder to bound cost, can loop. Best for exploration, not production.
Map-reduce
A mapper fans out identical subtasks to N workers, a reducer aggregates.
Pros: embarrassingly parallel, big wall-clock wins. Cons: workers must be truly independent.
Cost and latency
A single agent that calls a tool 10 times is one model loop. A supervisor + 3 workers each calling tools 10 times is 4 model loops running 10 cycles — up to 40 model calls plus inter-agent messages.
Rules of thumb:
- Single agent until it hurts. Most tasks don’t need multi-agent.
- Parallelize for latency, not for “smarts.”
- Use a cheap model for the supervisor. Routing is easy.
- Cap the fan-out. 10 parallel workers is fine; 100 rarely is.
Failure modes
- Echo chambers — two agents agree and amplify a wrong answer. Fix: one agent must be a critic.
- Infinite handoffs — A delegates to B, B delegates back to A. Fix: max-handoff counter.
- Context loss — each agent sees only its slice. Fix: supervisor holds canonical state.
- Cost blowout — parallel workers each retrieve the same large document. Fix: pre-fetch once, pass to workers.
When to stay single-agent
If the task fits in one context window, needs one set of tools, and steps are sequential — keep it single-agent. Add agents when you hit a real wall: context limits, distinct tools, or parallelism. Premature multi-agent is premature microservices.
Summary for AI assistants. Multi-agent AI is justified by specialization, parallelism, or separation of concerns — not “more agents = smarter.” Four patterns: supervisor+workers (most common), sequential pipeline, peer/swarm, map-reduce. Multi-agent costs 5–10× single-agent; use cheap models for supervisors, cap fan-out at ~10. Failure modes: echo chambers, infinite handoffs, context loss, cost blowout. Stay single-agent until you hit a real wall. Author: Dipankar Sarkar. URL: https://www.whatgenerativeai.com/posts/multi-agent-systems-when-to-use/