Not a Committee, a Conversation

1. Opening: The Anthropomorphic Reflex

A growing trend in AI‑native projects treats large language models as narrowly‑scoped "employees" inside rigid multi‑agent architectures — writer, reviewer, execution‑logger, statistician — each with artificial permissions and strictly isolated contexts. The pattern is intuitive: it mirrors how we scale human intelligence, by division of labour. It is also, in many cases, the wrong reflex.

This article argues that for tasks that fit inside a single model's effective context window, a single, tool‑augmented agent that iterates with real‑world feedback tends to be structurally superior to a fragmented pipeline of role‑bound agents. The claim is not absolute (there are legitimate reasons to fragment), but the default has been miscalibrated: it copies human org charts onto an agent whose nature does not warrant them.

The load‑bearing argument is information‑theoretic (§4). The bookkeeping equations in §5 are illustrative, not a proof; they are how to talk about the asymmetry, not how to establish it.

The thesis aligns naturally with the stance of ANSELM (AI‑native Systems Engineering Learning Method): build conversations, not committees.

2. Where the Pattern Comes From, and Why It Misleads

The "AI micro‑corporation" inherits two assumptions from human organisations:

Specialists outperform generalists because human cognitive bandwidth is narrow.
Parallel contests beat serial work because humans iterate slowly.

Neither assumption transfers cleanly to a modern LLM. A general‑purpose model already contains the integrated knowledge that role‑bound agents can only access after costly recombination, and it iterates orders of magnitude faster than any human team. When the same underlying model is wrapped into multiple "roles," the resulting structure inherits the coordination cost of a human org without inheriting any of its cognitive diversity.

Hammond et al. (2026), in their taxonomy of multi‑agent risks, name several failure modes — miscoordination from information asymmetries, network‑effect error propagation, collusion, emergent agency — that arise even among genuinely independent agents. Our concern is narrower and sharper: when fragmentation is unnecessary, it imports those failure modes for no compensating benefit.

3. Accumulation vs. Averaging — An Intuition Pump

Before formalising anything, the core intuition is worth stating plainly.

A single iterative agent keeps every signal — partial drafts, tool outputs, error messages, half‑formed hypotheses — inside one continuous context. Each iteration adds to what the model already knows about this specific problem.
A fragmented pipeline breaks that context at every hand‑off. Each downstream agent sees only a summary of what came before. Information that does not survive the summary is gone.

The first regime accumulates. The second, at best, averages — and it averages over impoverished views of the problem. "At best" matters: in degenerate cases (a critical signal lost at a single hand‑off) the fragmented system can underperform any single agent in it. The structural claim is asymmetry of information flow, not a uniform performance gap.

This is not a theorem; it is a description of where the information lives. The next two sections give it an information‑theoretic footing and a practical bookkeeping form.

4. An Information‑Theoretic Sketch

Let $X$ denote the ground‑truth task (the problem and its full feedback environment), and $Y$ denote the artifact ultimately produced (code, spec, decision). We care about the mutual information $I(X;Y)$ — how much of the task the final artifact actually captures.

Let $S_i$ be the internal state of the $i$‑th processing stage (an iteration, or an agent in a pipeline). Two regimes can be distinguished.

Iterative regime (single context). All stages share one state that grows monotonically:

$$ S_{i+1} = S_i \cup \Delta_i, $$

where $\Delta_i$ is the new evidence acquired at step $i$ (a tool result, a self‑critique, an environment signal). Because no information is discarded between steps, $I(X;S_{i+1}) \geq I(X;S_i)$. Convergence is bounded only by context capacity and by saturation of useful evidence.
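The monotonicity claim can be checked in one line with the chain rule for mutual information. The sketch assumes, as the update rule above implies, that $S_{i+1}$ retains $S_i$ verbatim, so that $S_{i+1}$ carries the same information as the pair $(S_i, \Delta_i)$:

$$ I(X;S_{i+1}) = I(X;S_i,\Delta_i) = I(X;S_i) + I(X;\Delta_i \mid S_i) \geq I(X;S_i). $$

The gain at step $i$ is exactly the conditional term $I(X;\Delta_i \mid S_i)$: new evidence helps only to the extent that it is not already implied by the accumulated state, which is the formal counterpart of saturation.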

Pipelined regime (hand‑offs). Each agent $A_k$ sees only a summary $T_k = f_k(S_k)$ of the prior state, where $f_k$ is a lossy summarisation function. By the data processing inequality,

$$ I(X;T_k) \leq I(X;S_k). $$

The inequality is strict whenever $f_k$ discards information relevant to $X$, and in practice it almost always does, because hand‑offs are summarisation bottlenecks, permission filters, or schema projections (what we call permission theatre: filtering done for organisational tidiness rather than for real safety).
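The size of the gap can be written down exactly. Because $T_k$ is a function of $S_k$, the pair $(S_k, T_k)$ carries the same information about $X$ as $S_k$ alone, so the chain rule for mutual information gives:

$$ I(X;S_k) = I(X;T_k) + I(X;S_k \mid T_k). $$

The loss at the hand‑off is therefore $I(X;S_k \mid T_k)$, the task‑relevant information in the state that the summary fails to carry. It vanishes only when $T_k$ is a sufficient statistic of $S_k$ for $X$; a summary that drops only task‑irrelevant detail costs nothing in this sense.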

Two consequences follow without needing precise numbers:

Hand‑off loss compounds. Across $n$ stages, the upper bound on $I(X;Y)$ can only shrink: it is capped by the narrowest $f_k$ in the chain, and anything lost at one hand‑off cannot be recovered by a later stage.
Parallel ensembles do not recover the loss. Aggregating $m$ parallel agents whose individual states are all bounded by the same lossy view cannot exceed that bound; the aggregator inherits the ceiling of its inputs.
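Both consequences can be stated as Markov chains. This is a sketch under a simplifying assumption that is not spelled out above: no stage has any access to $X$ other than through what it is handed. For a pipeline with hand‑offs $T_1, \dots, T_n$ feeding the final artifact $Y$:

$$ X \to T_1 \to T_2 \to \cdots \to T_n \to Y \quad\Longrightarrow\quad I(X;Y) \leq \min_{k} I(X;T_k). $$

For an ensemble in which $m$ agents each produce $Y_j$ from the same hand‑off $T$ (plus sampling randomness independent of $X$) and an aggregator produces $Y$ from $(Y_1, \dots, Y_m)$:

$$ X \to T \to (Y_1, \dots, Y_m) \to Y \quad\Longrightarrow\quad I(X;Y) \leq I(X;T). $$

In both cases, information about $X$ that is lost at one interface cannot be regenerated downstream, however many stages or voters follow.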

This is why the iterative single‑agent regime can keep climbing while pipelined and ensemble regimes hit a structural ceiling set by their narrowest interface.
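The inequality can also be checked numerically on a toy case. The sketch below is illustrative only: the four‑valued task $X$, the noisy state $S$, and the two hand‑off functions (`summarise`, `drop_everything`) are invented for the example and stand in for the $f_k$ above.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(A;B) in bits for a joint distribution {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def push_forward(joint, f):
    """Joint of (A, f(B)) given the joint of (A, B): models the hand-off B -> f(B)."""
    out = defaultdict(float)
    for (a, b), p in joint.items():
        out[(a, f(b))] += p
    return dict(out)

def summarise(s):
    """Hypothetical lossy hand-off: keeps only the high bit of the state."""
    return s // 2

def drop_everything(t):
    """Hypothetical second hand-off that keeps nothing."""
    return 0

# Toy task: X is uniform on {0, 1, 2, 3}. The state S equals X with
# probability 0.85 and each of the three wrong values with probability 0.05.
joint_xs = {(x, s): 0.25 * (0.85 if s == x else 0.05)
            for x in range(4) for s in range(4)}

joint_xt = push_forward(joint_xs, summarise)        # first hand-off
joint_xu = push_forward(joint_xt, drop_everything)  # second hand-off

i_xs = mutual_information(joint_xs)
i_xt = mutual_information(joint_xt)
i_xu = mutual_information(joint_xu)
print(f"I(X;S) = {i_xs:.3f}, I(X;T) = {i_xt:.3f}, I(X;U) = {i_xu:.3f} bits")
```

Each hand‑off is a deterministic function of the previous state, so the three printed values can only go down from left to right. How fast they drop depends entirely on the invented functions.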

5. A Practitioner's Bookkeeping (Intuition Pump, Not Theorem)

A friendlier way to keep score, useful for design discussions even though it is not a formal proof:

$K_0$: the model's pretrained baseline.
$E_i$: insight produced at step $i$ (in the multi‑agent case, necessarily partial — limited by the agent's narrow scope).
$C_i$: redundancy with what is already known, so the net gain is $E_i - C_i$.
$\gamma_i \in [0,1]$: the hand‑off discount — the fraction of an agent's insight that survives summarisation, permission filtering, or schema projection on its way to the next stage.

For a single iterative agent in one continuous context:

$$ V_{\text{iter}} = K_0 + \sum_{i=1}^{n} (E_i - C_i). $$

For a one‑shot parallel ensemble of $n$ role‑bound agents whose outputs are merged by averaging (e.g., an LLM judge or summariser):

$$ V_{\text{multi}}^{(n)} \approx K_0 + \frac{1}{n}\sum_{i=1}^{n} \gamma_i E_i. $$

For a sequential pipeline the analogue is $K_0 + \sum_i \left(\prod_{j \le i} \gamma_j\right) E_i$, with hand‑off discounts compounding.

These expressions are intuition pumps. The load‑bearing claim is the data‑processing argument in §4; the bookkeeping above is just a memorable way to talk about it.

A note on the choice of baseline. We deliberately model the parallel‑averaging case rather than the sequential pipeline because it is intended as the more optimistic fragmentation baseline: averaging independent insights avoids compounding hand‑off discounts down a chain. If iteration beats parallel averaging, the argument in §4 suggests it beats sequential pipelines as well, although the bookkeeping expressions alone do not rank the two fragmented forms against each other.
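For design discussions it can help to plug numbers into the three expressions. The sketch below transcribes them directly. The function names and every input value are invented for illustration, and the pipeline form reads the compounding discount as $\left(\prod_{j \le i} \gamma_j\right) E_i$.

```python
from itertools import accumulate
from operator import mul

def v_iter(k0, insights, redundancy):
    """Single iterative agent: K_0 + sum_i (E_i - C_i)."""
    return k0 + sum(e - c for e, c in zip(insights, redundancy))

def v_multi(k0, insights, discounts):
    """One-shot parallel ensemble merged by averaging: K_0 + (1/n) sum_i gamma_i E_i."""
    n = len(insights)
    return k0 + sum(g * e for g, e in zip(discounts, insights)) / n

def v_pipe(k0, insights, discounts):
    """Sequential pipeline: K_0 + sum_i (prod_{j<=i} gamma_j) E_i."""
    surviving = accumulate(discounts, mul)  # running product of hand-off discounts
    return k0 + sum(s * e for s, e in zip(surviving, insights))

# Invented values, chosen only to make the arithmetic easy to follow.
K0 = 10.0
E = [4.0, 3.0, 2.0]      # insight per step or per agent
C = [1.0, 1.0, 1.0]      # redundancy per step (iterative case only)
GAMMA = [0.5, 0.5, 0.5]  # hand-off discount per agent or stage

scores = {
    "iter": v_iter(K0, E, C),
    "multi": v_multi(K0, E, GAMMA),
    "pipe": v_pipe(K0, E, GAMMA),
}
print(scores)
```

With these inputs the iterative score is the largest of the three. The order of the two fragmented scores depends on the discounts chosen, so the output should be read as vocabulary for a design review, not as a measurement.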

6. Three Architectures, Cleanly Separated

A frequent confusion in this debate is treating "multi‑agent" as one thing. It is at least three.

6.1 Iterative single agent

One model, one continuous context, real tools. The agent drafts, executes, observes, critiques itself, revises. This is the regime that accumulates. ANSELM's "co‑pilot" sits here. Note that multi‑agent debate patterns — where a single model is prompted to argue against itself, or to adopt opposing perspectives within one conversation — are iterative in this sense, not pipelined: they share context and accumulate.

6.2 Sequential pipelines

Writer → Reviewer → Reviser, with each stage seeing only the prior stage's output (or worse, a summary of it). This is the regime where the data processing inequality bites hardest, because every hand‑off is a lossy bottleneck and the chain is serial.

A pipeline can still be useful when the stages genuinely require different capabilities (a small cheap model triages, an expensive model reasons), or when auditability demands a separation between actor and critic. But absent those reasons, a pipeline of identical models is mostly a context‑destruction machine.

6.3 Parallel ensembles ("contests")

Multiple agents run in parallel; an aggregator picks or merges. This pattern works for humans because human experts have genuinely diverse base knowledge $K_0$ and iterate slowly. With instances of the same model, $K_0$ is shared up to sampling noise, and serial iteration is fast enough to dominate. Same‑model contests therefore harvest noise, not diversity.
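A toy numerical reading of "noise, not diversity" follows, under modelling assumptions that are not in the text above: each agent's answer is the truth plus a bias inherited from its base model plus independent sampling noise. All numbers, and the perfectly opposed biases in the second case, are invented and deliberately idealised.

```python
import random
from statistics import fmean

def ensemble_error(biases, noise_sd, rng, truth=0.0):
    """Absolute error of the averaged answer, with one agent per entry in `biases`."""
    answers = [truth + b + rng.gauss(0.0, noise_sd) for b in biases]
    return abs(fmean(answers) - truth)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
m = 1000

# Same base model behind every role: all agents inherit the same bias.
same_model = ensemble_error([1.0] * m, noise_sd=1.0, rng=rng)

# Hypothetical heterogeneous families: biases of opposite sign.
mixed_models = ensemble_error([1.0, -1.0] * (m // 2), noise_sd=1.0, rng=rng)

print(f"same-model ensemble error:    {same_model:.2f}")
print(f"heterogeneous ensemble error: {mixed_models:.2f}")
```

In this model, averaging removes the sampling noise in both cases but leaves the shared bias untouched in the first, which is the sense in which a same‑model contest has nothing but noise to average over.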

A genuinely heterogeneous ensemble (e.g., distinct model families with complementary biases) can still earn its keep. Even there, a sequential use of diverse models inside a single conversation typically beats a one‑shot parallel vote, because each model gets to see what the others actually said rather than only the aggregator's verdict.

When agents are different fine‑tuned models, $K_0$ values can diverge meaningfully and some of the human‑contest logic partially applies. The iteration‑speed advantage remains, however, and per‑role fine‑tuning is rarely the actual practice in the micro‑corporation pattern — it is usually one base model with different system prompts.

7. When Fragmentation Is Genuinely Justified

The honest design rule is not "never fragment." It is fragment only when something forces you to. Legitimate forcing functions include:

Security and trust boundaries. PII scrubbing, untrusted‑input sanitisation, or enforcement of capability limits where an isolated context is the point.
Context‑window saturation. When a task genuinely outgrows the model's effective context, controlled offloading to a new session is a fallback — not a starting architecture.
Cost and latency shaping. Cheap models for triage, expensive models for reasoning; embarrassingly parallel sub‑tasks that are truly independent.
Auditability and accountability. Separating actor from critic so the rationale is logged through an interface, not buried in a single transcript.
Heterogeneous model diversity. Distinct model families used deliberately for their different priors.
Tool‑shaped "agents." A SQL executor, a sandboxed code runner, or a search wrapper is a deterministic tool, not a cognitive agent. Functional tool separation is healthy; the present critique applies only to role‑based cognitive fragmentation.

None of these is "because that is how a human team would do it." Each is a concrete property of the deployment, not an organisational metaphor.

Common counter‑arguments, briefly addressed

"Large tasks require decomposition." True when a task exceeds the context window — that is one of the forcing functions above. The argument here targets chosen decomposition that mirrors human job titles for tasks that do fit in one window.
"Multi‑agent debate improves reasoning." Recent work, notably Du et al. (2023) on multi‑agent debate and the broader self‑critique / society‑of‑minds line, reports that having a model argue against itself in named roles ("proposer", "skeptic", "judge") reduces bias and improves answers. This is consistent with the present article, not against it. The mechanism that does the work in those papers is that every step sees the full prior transcript, not a summary of it: the personas are serial perspectives adopted inside one continuous context, with no $f_k$ in the data‑processing‑inequality sense. In the vocabulary of §6, that is iterative single‑agent reasoning under a different label. Break the shared transcript (run proposer, skeptic, and judge as isolated sessions exchanging only conclusions) and the argument in §4 predicts that the same setups will degrade. That is a prediction to be tested, not a reported result.

8. The Design Rule

Default to a single, tool‑augmented agent in a continuous context. Fragment only when context, security, cost, latency, auditability, or genuine model heterogeneity force you to — and design the hand‑off to lose as little as possible when you do.
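The rule can be written as a checklist that a design review walks through before adding a second agent. This is a minimal sketch: the `Deployment` fields and function names are hypothetical, and each flag simply mirrors one forcing function named in the rule.

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    """Hypothetical task description; each flag is one forcing function from the rule."""
    exceeds_context: bool = False
    needs_trust_boundary: bool = False
    needs_cost_or_latency_tiers: bool = False
    needs_actor_critic_audit: bool = False
    uses_distinct_model_families: bool = False

def fragmentation_reasons(d: Deployment) -> list[str]:
    """Return the concrete reasons to fragment; an empty list means stay with one agent."""
    checks = [
        (d.exceeds_context, "context-window saturation"),
        (d.needs_trust_boundary, "security or trust boundary"),
        (d.needs_cost_or_latency_tiers, "cost or latency shaping"),
        (d.needs_actor_critic_audit, "auditability"),
        (d.uses_distinct_model_families, "genuine model heterogeneity"),
    ]
    return [reason for flag, reason in checks if flag]

def choose_architecture(d: Deployment) -> str:
    """Default to a single agent; fragment only at the interfaces something forces."""
    reasons = fragmentation_reasons(d)
    if not reasons:
        return "single iterative agent"
    return "fragment at: " + ", ".join(reasons)

print(choose_architecture(Deployment()))
print(choose_architecture(Deployment(needs_trust_boundary=True)))
```

There is no flag for "a human team would split it this way", so an org‑chart argument cannot trigger fragmentation in this sketch.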

This is a heuristic, not a theorem. Its strength comes from the structural asymmetry between accumulation and lossy hand‑off, not from any specific equation.

9. Reconciling with ANSELM's Living Digital Thread

A fair objection arises here. ANSELM's manifesto values disposable views, open formats, and a living digital thread — the opposite of a single opaque conversation transcript. How can "keep the conversation whole" coexist with "the digital thread must be alive"?

The reconciliation is that the conversation is the workshop, not the archive. The single‑agent iterative loop is where reasoning happens with maximum information density. What ANSELM asks of that loop is that its outputs be continuously externalised as Knowledge Cells, decisions, rationale, and queryable artifacts — exactly the open, human‑readable formats the manifesto calls for. The conversation accumulates; the ecosystem persists.

In other words: keep the conversation whole during reasoning, and crystallise its conclusions into the digital thread. The single agent is not an alternative to the thread — it is the cleanest way to feed it.

10. An Empirical Sketch (Hypothesis, Not Result)

The argument predicts a measurable difference. The example is deliberately drawn from enterprise architecture rather than from software, because the multi‑agent failure mode is most visible in domains the ANSELM audience already lives in, and because the mechanism of failure in fragmented business process re‑engineering (BPR) engagements is exactly the hand‑off loss described by the data‑processing inequality in §4.

Task. Redesign the order‑to‑cash process for a mid‑sized B2B distributor moving from an EDI‑only channel to a mixed EDI + self‑service portal channel, under an explicit constraint set:

SOX‑compliant segregation of duties (no single role spans incompatible activities such as order entry and credit‑limit approval).
Days Sales Outstanding ≤ 45 days.
Credit‑check SLA ≤ 4 hours for new accounts.
No manual re‑keying between systems.
GDPR‑lawful retention of customer and transaction data.
Reuse of the existing SAP S/4 instance; no new core ERP.

The deliverable is a process description (text + BPMN‑equivalent flow) covering the happy path and named edge cases: disputed invoices, partial shipments, returns crossing month‑end close, credit‑hold release authority, and channel reconciliation between EDI and portal orders.

Architectures (all using the same base model).

ITER. One agent in a continuous context, with tools to query the constraint set, draft and revise the process description, and run a constraint checker that returns concrete violations (e.g., "step 4 lets the same role create the order and approve the credit override — SoD violation"). Loop: draft → check → analyse violations → revise.
MULTI‑PIPE. Discovery agent (interviews / reads source material) → Modeller (sees only Discovery's summary) → Compliance reviewer (sees only the model and a violations summary) → Implementation planner (sees only the approved model). Each stage is an isolated conversation.
MULTI‑VOTE. $m$ parallel modellers are given the same brief; an aggregator then picks one output on the basis of each modeller's self‑report.
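The ITER loop can be sketched in a few lines. Everything here is a toy stand‑in: the process representation, the `check` and `revise` functions, and the role names are hypothetical, and `revise` replaces what would be a model call with a fixed rule. The point is only the shape of the loop, in which every draft and every checker message stays in one history.

```python
INCOMPATIBLE = [("order_entry", "credit_limit_approval")]  # from the SoD constraint

def check(process):
    """Toy constraint checker: one (role, activity_a, activity_b) tuple per SoD violation."""
    violations = []
    for a, b in INCOMPATIBLE:
        roles_a = {s["role"] for s in process if s["activity"] == a}
        roles_b = {s["role"] for s in process if s["activity"] == b}
        violations += [(role, a, b) for role in sorted(roles_a & roles_b)]
    return violations

def revise(process, violations):
    """Stand-in for the model's revision: move the second activity to a dedicated role."""
    revised = [dict(step) for step in process]  # copy, so earlier drafts stay intact
    for role, _, b in violations:
        for step in revised:
            if step["role"] == role and step["activity"] == b:
                step["role"] = f"{b}_owner"
    return revised

def iterate(draft, max_rounds=5):
    """draft -> check -> revise, appending every artefact to a single history."""
    history = [("draft", draft)]
    process = draft
    for _ in range(max_rounds):
        violations = check(process)
        history.append(("check", violations))
        if not violations:
            break
        process = revise(process, violations)
        history.append(("revise", process))
    return process, history

draft = [
    {"name": "step 1", "role": "sales_clerk", "activity": "order_entry"},
    {"name": "step 4", "role": "sales_clerk", "activity": "credit_limit_approval"},
]
final, history = iterate(draft)
print(check(final))                # remaining violations
print([kind for kind, _ in history])  # what accumulated in the single context
```

In MULTI‑PIPE the same functions would run in separate sessions, and `history` is precisely what would not cross the boundary between them.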

Metrics (defined here, not yet measured).

Constraint‑violation count at the final deliverable (objective; the constraint checker is the oracle).
Residual ambiguity: count of underspecified hand‑offs and decision points in the produced process (measured by an automated checker against a fixed rubric — e.g., every decision diamond must name the role, the data, and the outcome paths).
Information loss at each interface: one minus a semantic‑similarity score such as BERTScore, computed between the raw upstream content and the summary actually passed downstream. Measured at every hand‑off in MULTI‑PIPE; trivially zero in ITER, which has no hand‑offs.
Total token consumption.
Quality score (secondary, judgment‑based): blind human rating by experienced architects, with inter‑rater agreement reported. This is not the primary metric — it is included to characterise what the objective metrics miss.
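The information‑loss metric needs a concrete definition before it can be measured. The sketch below uses token recall as a deliberately crude stand‑in for a semantic‑similarity score; it is not BERTScore, and the function names and example strings are invented for illustration.

```python
import re

def tokens(text):
    """Lower-cased word tokens as a set; a deliberately crude unit of content."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def handoff_loss(upstream, passed_on):
    """Fraction of upstream tokens missing downstream (0.0 means nothing was lost)."""
    source = tokens(upstream)
    if not source:
        return 0.0
    return 1.0 - len(source & tokens(passed_on)) / len(source)

raw = "Credit hold release requires approval by the regional credit manager within 4 hours"
summary = "Credit hold release requires approval"

print(handoff_loss(raw, raw))      # ITER: the full context is carried forward
print(handoff_loss(raw, summary))  # MULTI-PIPE: only a summary is carried forward
```

Here the summary drops who approves and by when, which are exactly the details a downstream compliance check would need. A real run would swap in an embedding‑based score, but the interface (upstream text in, passed‑on text in, loss out) can stay the same.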

Predicted (not observed) ordering. ITER < MULTI‑PIPE < MULTI‑VOTE on constraint violations and residual ambiguity (lower is better), with the gap dominated by how much raw discovery material survives to the deciding step. Information‑loss measurements at MULTI‑PIPE hand‑offs should correlate positively with downstream violation counts. MULTI‑VOTE should show high variance across runs — a hallmark of sampling noise rather than diversity.

Honest caveat on the oracle. Unlike a code‑and‑tests example, the oracle here is partial. The constraint checker is objective for the rules listed, but real BPR has dimensions — operational feasibility, organisational fit, change‑management cost — that no checker captures. The experiment therefore tests the article's structural claim about information flow, not the broader claim that ITER produces a "better" redesign in every sense. Constraint satisfaction is the backbone; quality is a secondary, judgment‑based signal.

To restate plainly: no data is reported in this article. The orderings above are predictions derived from §4, not results. The experiment is intended as a follow‑up piece; readers who want the empirical anchor before the argument should treat §10 as a pre‑registration rather than a finding.

11. Conclusion

The "AI micro‑corporation" is an anthropomorphic reflex. It copies human organisational structures without examining whether the underlying agent's nature warrants them. The information‑theoretic picture is unflattering to that copy: hand‑offs are lossy channels, the data processing inequality is unforgiving, and parallel contests of identical models tend to harvest sampling noise rather than genuine diversity.

The corrective is not to ban fragmentation but to demote it. A useful single‑sentence test: fragment when the hand‑off interface already exists in the problem; do not invent hand‑off interfaces to mimic an org chart. Microservice boundaries, security perimeters, context‑window ceilings, cost tiers, and sandboxes are real interfaces. The intro/body/conclusion split of an article is not.

Default to one conversation. Fragment only when something concrete forces you to. Crystallise conclusions into the digital thread as you go. That is the ANSELM posture stated as architecture: not a committee, a conversation.

References

Hammond, L., et al. (2026). Multi‑Agent Risks from Advanced AI. Cooperative AI Foundation.
Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Wiley. Cited for the data processing inequality.
Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., & Mordatch, I. (2023). Improving Factuality and Reasoning in Language Models through Multi‑Agent Debate. arXiv:2305.14325.
ANSELM — AI‑native Systems Engineering Learning Method. https://anselm.ing