Agent Model Experimentation
ops · 15 min
# Agent Model Experimentation

Swap which model powers a given agent and measure the result.

## When to run

- Costs spiked. Suspect a cheaper model still meets quality.
- Quality dropped. Suspect advisor/full mode fixes it.
- New model released (e.g. `claude-haiku-4-5`).

## Steps

1. **Snapshot baseline**: `/automation/agents` → select agent → scroll to Recent Runs. Note last-10 success rate, avg cost, and avg latency.
2. **Change mode**: `/automation/settings` → Agents tab → Edit the agent → Mode dropdown: eco / standard / advisor / full.
3. **Flush cache**: no action needed. Provider-resolver reads AgentConfig on every run.
4. **Run 10 trials**: trigger the agent 10 times with representative inputs.
   - For Lead Scorer: `bun run scripts/test-lead-scorer.ts` (10-run harness).
5. **Compare**: check Recent Runs; compute the new success rate, avg cost, and avg latency against the baseline.
6. **Decide**:
   - Success rate dropped >5%: revert.
   - Cost cut ≥30% and rate within 2%: keep.
   - Mixed: run another 10.

## Mode cheat sheet

| Mode | Model | Use when |
|------|-------|----------|
| eco | claude-haiku-4-5 | High volume, simple classification, deterministic tasks |
| standard | claude-sonnet-4-6 | Default. Drafts, emails, routing logic |
| advisor | claude-sonnet-4-6 | Strategic suggestions (orchestrator tick) |
| full | claude-opus-4-7 | Complex reasoning, multi-skill orchestration |
| custom | Provider override | Bring-your-own config |

## Rollback

Switch mode back in `/automation/settings`. Takes effect on next run.
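
## Decision rule sketch

The thresholds in step 6 can be written down as a small function, which makes the keep/revert call repeatable across experiments. This is a minimal sketch, not part of the platform: `RunStats`, `summarize`, and `decide` are hypothetical names, and the numbers are assumed to be copied by hand from Recent Runs. It also assumes the success-rate thresholds (5%, 2%) are percentage points and the cost threshold (30%) is relative to the baseline cost; adjust if your team reads them differently.

```typescript
// Hypothetical types and helpers for illustration; not a platform API.
interface RunStats {
  successRate: number; // fraction of successful runs, 0..1
  avgCost: number; // average cost per run, in any consistent unit
}

type Decision = "revert" | "keep" | "rerun";

// Summarize one batch of trials (e.g. the 10 runs copied from Recent Runs).
function summarize(runs: { ok: boolean; cost: number }[]): RunStats {
  if (runs.length === 0) throw new Error("no runs to summarize");
  const successes = runs.filter((r) => r.ok).length;
  const totalCost = runs.reduce((sum, r) => sum + r.cost, 0);
  return {
    successRate: successes / runs.length,
    avgCost: totalCost / runs.length,
  };
}

// Apply the step 6 thresholds.
// Assumption: rate thresholds are percentage points, cost cut is relative.
function decide(baseline: RunStats, trial: RunStats): Decision {
  // Positive rateDrop means the new mode is worse than the baseline.
  const rateDrop = baseline.successRate - trial.successRate;
  const costCut =
    baseline.avgCost > 0
      ? (baseline.avgCost - trial.avgCost) / baseline.avgCost
      : 0;

  if (rateDrop > 0.05) return "revert";
  if (costCut >= 0.3 && rateDrop <= 0.02) return "keep";
  return "rerun"; // mixed result: run another 10 trials
}

// Usage: same success rate, cost halved.
const baseline: RunStats = { successRate: 0.9, avgCost: 0.1 };
const trial: RunStats = { successRate: 0.9, avgCost: 0.05 };
console.log(decide(baseline, trial)); // prints "keep"
```

With only 10 trials, one extra failure moves the success rate by 10 percentage points, so a single bad run already triggers `revert` under this reading. Treat the output as a prompt to look at the failing run, not as a final verdict.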