0003. Dual-Model Planning (Echo + Delta)

Date: 2025-11-17
Status: Accepted
Deciders: AGX Core Team
Tags: architecture, planning, llm, ai, agx

Context

AGX needs to convert natural-language user intent into deterministic, executable JSON plans. This transformation involves two orthogonal cognitive tasks:

Understanding human intent: Interpreting ambiguous, incomplete, variable natural language
Generating executable plans: Producing deterministic, schema-compliant, minimal JSON

Requirements

Correctness: Plans must be valid JSON, matching strict schemas
Determinism: Same intent should produce same plan (or controlled variations)
Safety: No hallucinated tools, no invalid arguments, no unsafe commands
Clarity: User should understand what will happen before execution
Interactivity: Ability to clarify ambiguous requests with user
Auditability: Plans must be inspectable and verifiable

Problem with Single-Model Approach

A single LLM trying to do both tasks simultaneously leads to:

Hallucinations: Making up tools that don't exist
Schema violations: Invalid JSON, missing required fields
Non-determinism: Different outputs for same input
Brittle planning: Small prompt changes cause large plan changes
Mixed concerns: Reasoning and formatting entangled

Decision

We will use a two-model architecture: Echo (intent interpreter) + Delta (plan compiler).

This mirrors classic compiler design:

Compiler: Source Code → Parser → IR → Optimizer → Code Generator → Machine Code
AGX: User Intent → Echo → Structured Intent → Delta → JSON Plan
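
The two-stage flow can be sketched as two functions joined by the Structured Intent. This is a minimal illustration under stated assumptions, not the AGX implementation: `echo_interpret`, `delta_compile`, and the `COMMANDS` table are hypothetical names, and both stages are stubbed with fixed logic where real model calls would go.

```python
from typing import Any

# Hypothetical action-to-command table; a real Delta would consult the tool registry.
# `sort -u` is used for de-duplication because `uniq` alone only collapses
# adjacent duplicate lines.
COMMANDS: dict[str, tuple[str, list[str]]] = {
    "remove-duplicates": ("sort", ["-u"]),
    "grep": ("grep", []),
}


def echo_interpret(user_text: str) -> dict[str, Any]:
    """Stand-in for Echo: keyword matching where a real system would call an LLM."""
    lowered = user_text.lower()
    steps: list[dict[str, str]] = []
    if "duplicate" in lowered:
        steps.append({"action": "remove-duplicates"})
    if "invoice" in lowered:
        steps.append({"action": "grep", "pattern": "invoice"})
    return {"task": "text-filtering", "steps": steps, "constraints": [], "output": "stdout"}


def delta_compile(intent: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for Delta: a pure function from Structured Intent to plan tasks."""
    tasks = []
    for number, step in enumerate(intent["steps"], start=1):
        if step["action"] not in COMMANDS:
            raise ValueError(f"no command registered for action {step['action']!r}")
        command, base_args = COMMANDS[step["action"]]
        args = list(base_args)
        if "pattern" in step:
            args.append(step["pattern"])
        task = {"task_number": number, "command": command, "args": args, "timeout_secs": 30}
        if number > 1:
            task["input_from_task"] = number - 1  # chain each task to the previous one
        tasks.append(task)
    return {"tasks": tasks}


plan = delta_compile(echo_interpret("Remove duplicates and find lines mentioning invoices"))
```

Because the second stage only ever sees the Structured Intent, it can be tested and replaced without touching the natural-language side.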

Echo Model (Intent Interpreter)

Role: Understand and clarify human intent

Characteristics:

Conversational, flexible
High reasoning bandwidth
Tolerant of ambiguity
Temperature: 0.5-0.7
May use chain-of-thought
Interacts with user to clarify

Input: Natural language user intent
Output: Structured Intent (intermediate representation)

Example:

User: "Remove duplicates and find lines mentioning invoices"

Echo Output (Structured Intent):
{
  "task": "text-filtering",
  "steps": [
    {"action": "remove-duplicates"},
    {"action": "grep", "pattern": "invoice"}
  ],
  "constraints": [],
  "output": "stdout"
}
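
Before Echo's output reaches Delta, its shape can be checked in ordinary code. The sketch below assumes the four top-level fields shown in the example are all required and that every step carries an `action`. The authoritative contract would be the Structured Intent schema, and `parse_structured_intent` is a hypothetical helper.

```python
import json
from dataclasses import dataclass, field

REQUIRED_FIELDS = {"task", "steps", "constraints", "output"}  # assumed from the example


@dataclass
class Step:
    action: str
    params: dict = field(default_factory=dict)  # everything except "action"


@dataclass
class StructuredIntent:
    task: str
    steps: list
    constraints: list
    output: str


def parse_structured_intent(raw: str) -> StructuredIntent:
    """Parse Echo's JSON output and reject anything that lacks the expected shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("structured intent must be a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    steps = []
    for index, step in enumerate(data["steps"]):
        if not isinstance(step, dict) or "action" not in step:
            raise ValueError(f"step {index} has no 'action'")
        params = {key: value for key, value in step.items() if key != "action"}
        steps.append(Step(action=step["action"], params=params))
    return StructuredIntent(
        task=data["task"],
        steps=steps,
        constraints=list(data["constraints"]),
        output=data["output"],
    )
```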

Delta Model (Plan Compiler)

Role: Generate deterministic executable plans

Characteristics:

Strict, deterministic
Schema-compliant output
Temperature: 0.0
No hallucinations
Minimal, safe plans
Tool registry-driven

Input: Structured Intent
Output: Valid JSON Plan (per /specs/plan.schema.json)

Example:

Delta Output (AGX Plan):
{
  "plan_id": "uuid-5678",
  "plan_description": "Filter text: deduplicate and find invoices",
  "tasks": [
    {
      "task_number": 1,
      "command": "sort",
      "args": ["-u"],
      "timeout_secs": 30
    },
    {
      "task_number": 2,
      "command": "grep",
      "args": ["invoice"],
      "input_from_task": 1,
      "timeout_secs": 30
    }
  ]
}
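
Beyond schema validation, a plan like this has structural invariants that are cheap to check in code. The rules below are assumptions inferred from the example (sequential `task_number` values, string arguments, `input_from_task` pointing at an earlier task, positive timeouts). The real contract is /specs/plan.schema.json, and `validate_plan` is a hypothetical helper.

```python
def validate_plan(plan: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the plan passed."""
    tasks = plan.get("tasks")
    if not isinstance(tasks, list) or not tasks:
        return ["plan must contain a non-empty 'tasks' list"]

    errors: list[str] = []
    for position, task in enumerate(tasks, start=1):
        number = task.get("task_number")
        if number != position:
            errors.append(f"task at position {position} has task_number {number!r}")

        command = task.get("command")
        if not isinstance(command, str) or not command:
            errors.append(f"task {position}: 'command' must be a non-empty string")

        args = task.get("args")
        if not isinstance(args, list) or not all(isinstance(arg, str) for arg in args):
            errors.append(f"task {position}: 'args' must be a list of strings")

        # A task may only consume the output of a task that runs before it.
        source = task.get("input_from_task")
        if source is not None and not (isinstance(source, int) and 1 <= source < position):
            errors.append(f"task {position}: 'input_from_task' must reference an earlier task")

        timeout = task.get("timeout_secs")
        if not isinstance(timeout, int) or timeout <= 0:
            errors.append(f"task {position}: 'timeout_secs' must be a positive integer")
    return errors
```

Returning every problem at once, instead of raising on the first, keeps rejected plans inspectable, which fits the auditability requirement.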

Alternatives Considered

Option 1: Single Large Model (GPT-4 class)

Pros:

Simpler architecture (one model)
Can handle both understanding and formatting
Fewer moving parts

Cons:

Still prone to hallucinations
Non-deterministic (even at temp=0)
Schema violations require retry loops
Expensive to run
Difficult to fine-tune for both tasks
Mixed reasoning and formatting

Option 2: Prompt Engineering Only (Single Model + Complex Prompts)

Pros:

No architectural complexity
Fast to iterate
Works with existing models

Cons:

Fragile (prompt changes break plans)
Still non-deterministic
Hallucinations persist
Hard to guarantee schema compliance
Difficult to maintain as requirements grow

Option 3: Rule-Based Planning (No LLM)

Pros:

Perfectly deterministic
No hallucinations
Fast execution
Predictable behavior

Cons:

Cannot handle natural language
Requires precise, structured input
Inflexible (no intent understanding)
Poor user experience

Option 4: Three-Model Pipeline (Intent → Reasoning → Planning → Verification)

Pros:

Even more separation of concerns
Dedicated verifier model

Cons:

Overly complex
Higher latency (3 LLM calls)
More points of failure
Harder to maintain

Decision Rationale

Dual-model chosen because:

Separation of concerns: Each model does one thing well
Proven pattern: Mirrors compiler architecture (parser → compiler)
Determinism: Delta runs at temp=0 on a fixed Structured Intent, so plans are consistent across runs
Safety: Delta validates every command against the tool registry, so hallucinated tools are rejected
Maintainability: Can upgrade/fine-tune Echo and Delta independently
User experience: Echo handles clarification naturally
Industry precedent: Microsoft Planner/Executor, Meta AI Toolformer, Google Thinking/Acting

Compared to alternatives:

vs single model: More reliable, less hallucination
vs prompt engineering: More maintainable, more deterministic
vs rule-based: Handles natural language
vs three-model: Simpler, lower latency

Consequences

Positive

Correctness: Delta is expected to produce schema-valid plans 99%+ of the time
Determinism: Same structured intent → same plan
Safety: Tool registry prevents hallucinated tools
Clarity: Structured Intent is human-readable intermediate state
Modularity: Can swap Echo or Delta models independently
Fine-tuning: Can train small specialized models for each role
Debugging: Clear separation makes issues easier to diagnose

Negative

Complexity: Two models to manage instead of one
Latency: Two sequential LLM calls (mitigated by using fast models)
Development overhead: Need to define Structured Intent schema
More moving parts: Two models can fail independently

Neutral

Model selection: Need to choose appropriate models for each role
Structured Intent design: Need stable IR schema (but provides benefits)

Implementation Notes

Phase 1: Foundation

Echo Model Options:

DeepSeek-R1-Distill-Qwen-1.5B (reasoning-capable, small)
VibeThinker-1.5B (optimized for intent understanding)
Qwen2.5-3B-Instruct (strong reasoning)

Delta Model Options:

Phi-4-Mini (3.8B, strong schema following)
Qwen2.5-1.5B-Instruct (fast, good at structured output)
Llama-3.2-3B-Instruct (reliable, well-tested)

Structured Intent Schema:

Define stable JSON schema for Echo → Delta contract
Version the schema (allow evolution)
Document in /specs/structured-intent.schema.json
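
One way to let the Echo → Delta contract evolve is for each Structured Intent to carry a version that Delta checks before compiling. The sketch below assumes a hypothetical `schema_version` field in `MAJOR.MINOR` form and a compatibility rule (same major, minor no newer than Delta supports). Neither is specified by this ADR.

```python
SUPPORTED_VERSION = (1, 2)  # hypothetical: the newest schema version this Delta understands


def parse_version(text: str) -> tuple[int, int]:
    """Split a 'MAJOR.MINOR' string into integers; raises ValueError if malformed."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_compatible(intent: dict, supported: tuple[int, int] = SUPPORTED_VERSION) -> bool:
    """Accept intents with the same major version and a minor version Delta already knows."""
    major, minor = parse_version(intent["schema_version"])
    return major == supported[0] and minor <= supported[1]
```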

Phase 2: Fine-Tuning (Tinker)

Echo Fine-Tuning Corpus:

Messy user intent → structured intent examples
Multi-turn clarification dialogues
Ambiguity resolution patterns

Delta Fine-Tuning Corpus:

Structured intent → AGX plan examples
Tool registry-driven planning
Schema compliance examples

Phase 3: Optimization

Caching:

Cache Echo outputs for common intents
Cache Delta plans for common structured intents
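
A Delta plan cache needs a key that ignores irrelevant differences in how the Structured Intent is serialized. The sketch below hashes a canonical JSON form. `intent_key` and `PlanCache` are hypothetical names, and the in-memory dictionary stands in for whatever store AGX ends up using.

```python
import hashlib
import json
from typing import Callable


def intent_key(intent: dict) -> str:
    """Hash a canonical serialization so key order and whitespace do not matter."""
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PlanCache:
    """Memoizes a compile function (for example, a call to Delta) per Structured Intent."""

    def __init__(self, compile_fn: Callable[[dict], dict]) -> None:
        self._compile = compile_fn
        self._plans: dict[str, dict] = {}

    def get(self, intent: dict) -> dict:
        key = intent_key(intent)
        if key not in self._plans:
            self._plans[key] = self._compile(intent)  # cache miss: compile once
        return self._plans[key]
```

A cached plan is returned as-is, including its `plan_id`, so a caller that needs a unique id per execution would have to assign a fresh one. Entries would also need to be invalidated when the tool registry changes.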

Fast Path:

Simple intents bypass Echo (direct to Delta)
Deterministic mode skips Echo entirely
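
The fast path can be sketched as a small router in front of the two models. This ADR does not define what counts as a "simple" intent, so the sketch assumes one possible rule: input that already parses as a Structured Intent goes straight to Delta. `try_parse_intent` and `plan_request` are hypothetical names, and `echo` and `delta` are passed in as plain callables.

```python
import json
from typing import Callable, Optional


def try_parse_intent(text: str) -> Optional[dict]:
    """Return the input as a Structured Intent if it already is one, else None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and isinstance(data.get("steps"), list):
        return data
    return None


def plan_request(
    text: str,
    echo: Callable[[str], dict],
    delta: Callable[[dict], dict],
    deterministic: bool = False,
) -> dict:
    """Route a request: skip Echo when the input is already structured."""
    intent = try_parse_intent(text)
    if intent is None:
        if deterministic:
            # Deterministic mode never calls Echo, so free-form text is rejected.
            raise ValueError("deterministic mode requires a Structured Intent as input")
        intent = echo(text)
    return delta(intent)
```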

Tool Registry Integration

Delta queries tool registry via --describe contracts:

$ agx-ocr --describe
{
  "name": "agx-ocr",
  "version": "0.1.0",
  "capabilities": ["document-ocr", "image-to-text"],
  "inputs": ["image/png", "image/jpeg", "application/pdf"],
  "outputs": ["text/plain", "application/json"]
}
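
A registry built from these contracts gives Delta's output a hard check against hallucinated tools. The sketch below assumes each tool prints its contract as JSON on stdout and that a plan's `command` field matches the contract's `name`. `describe_tool`, `build_registry`, and `unknown_commands` are hypothetical helpers.

```python
import json
import subprocess


def describe_tool(executable: str) -> dict:
    """Run `<executable> --describe` and parse the JSON contract it prints."""
    result = subprocess.run(
        [executable, "--describe"],
        capture_output=True,
        text=True,
        check=True,   # a non-zero exit raises CalledProcessError
        timeout=10,
    )
    return json.loads(result.stdout)


def build_registry(contracts: list[dict]) -> dict[str, dict]:
    """Index contracts by tool name for constant-time lookup."""
    return {contract["name"]: contract for contract in contracts}


def unknown_commands(plan: dict, registry: dict[str, dict]) -> list[str]:
    """List every command in the plan that has no contract in the registry."""
    return [task["command"] for task in plan["tasks"] if task["command"] not in registry]
```

A plan would be rejected whenever `unknown_commands` returns a non-empty list, regardless of what the model produced.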

Related Decisions

Related to execution architecture: Plans generated here are executed by AGW
Related to AU contracts: Delta uses --describe to discover capabilities
Future ADR: Structured Intent schema versioning

References

AGX Dual-Model Planning Documentation
Meta AI: Toolformer Pattern
Microsoft: Planner-Executor Architecture
Stanford ACE: Agent-Computer Environment
Job Schema - Delta output format
Agentic Unit Spec - Tool contract spec
