# Models & Providers
Dynamo is provider-agnostic. Use any LLM from any provider.
Model capabilities (context window, tool support, reasoning, vision) are auto-detected at runtime via live probes against each provider — no hardcoded model→capability tables. Newly-released models work immediately, and context windows are re-checked on every model switch since providers expand them over time.
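As an illustration of what such a live probe can look like (a hypothetical sketch, not Dynamo's actual implementation), a probe can send a minimal tool-call request and classify the provider's reply; here the reply is a canned example string:

```shell
# Hypothetical sketch: classify a canned provider reply to a minimal
# tool-call probe. A real probe would issue a live request instead.
reply='{"error":{"message":"tools are not supported for this model"}}'
case "$reply" in
  *"not supported"*) echo "tools: unsupported" ;;
  *)                 echo "tools: supported" ;;
esac
```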
## Switching Models
Type /model to open the model picker, or switch directly:
```
/model        # opens a picker with all available models
/model opus   # switch to Claude Opus
/model mini   # switch to GPT-5.4 Mini
```

Your choice is saved and persists across sessions.
You can also use the full provider/model format: /model anthropic/claude-sonnet-4-6, /model google/gemini-2.5-flash, /model openrouter/anthropic/claude-opus-4-6, or /model ollama/qwen3.5:35b.
## Built-in Aliases
| Alias | Model |
|---|---|
| opus | Claude Opus 4.7 |
| sonnet | Claude Sonnet 4.6 |
| haiku | Claude Haiku 4.5 |
| nano | GPT-5.4 Nano |
| mini | GPT-5.4 Mini |
| or-opus | OpenRouter Claude Opus |
| or-sonnet | OpenRouter Claude Sonnet |
| or-mini | OpenRouter GPT-5.4 Mini |
| or-llama | OpenRouter Llama 4 Scout |
| g-pro | Gemini 2.5 Pro |
| g-flash | Gemini 2.5 Flash |
| g-3 | Gemini 3 Pro Preview |
## Anthropic (Claude)
Set ANTHROPIC_API_KEY to use Claude models directly. Anthropic is the recommended provider for Dynamo — best tool use, streaming, and extended thinking support.
```
/model opus     # Claude Opus 4.7 — best reasoning
/model sonnet   # Claude Sonnet 4.6 — fast + capable
/model haiku    # Claude Haiku 4.5 — fastest, cheapest
```

All Claude models support streaming, tool calling, and extended thinking. Opus 4.7 has a 1M token context window.
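Setting the key is an ordinary environment-variable export; the value below is a placeholder, not a real key:

```shell
# Placeholder key — substitute your real Anthropic API key
export ANTHROPIC_API_KEY="sk-ant-..."
# Dynamo reads the key from the environment when it starts
```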
## OpenAI (GPT)
Set OPENAI_API_KEY to use GPT models directly. Supports the Chat Completions and Responses APIs, web search, and reasoning models.
```
/model mini   # GPT-5.4 Mini — strong all-rounder
/model nano   # GPT-5.4 Nano — fast and cheap
```

GPT-5.x models support streaming, tool calling, web search (via the Responses API), and extended thinking.
## OpenRouter (200+ Models)
OpenRouter gives you access to 200+ models through a single API key. Set OPENROUTER_API_KEY and use the tabbed model picker to browse and pin models:
```
/model        # Tab 1: direct providers + pinned models
              # Tab 2: searchable OpenRouter catalog
/model openrouter/anthropic/claude-opus-4-6   # switch directly
/model or-opus                                # use an alias
```

In the OpenRouter tab, type to search and press Space to pin models you use often. Pinned models appear in the Direct tab for quick access.
## Google Gemini
Set GOOGLE_API_KEY to use Gemini models directly:
```
/model g-flash   # Gemini 2.5 Flash
/model g-pro     # Gemini 2.5 Pro
/model g-3       # Gemini 3 Pro Preview
/model google/gemini-3-pro-preview   # full model ID
```

Gemini supports streaming, function calling, and extended thinking (2.5+ models), with a 1M token context window.
## Local Models with Ollama
Dynamo works with Ollama for fully local, offline AI:
```
# Install Ollama and pull a model
ollama pull qwen3.5:35b

# Dynamo auto-detects Ollama
dynamo
/model ollama/qwen3.5:35b
```

Local models support tool calling, extended thinking (Qwen3, DeepSeek-R1), and streaming. No API key needed — everything runs on your machine.
## Custom Providers
Add any OpenAI-compatible endpoint in dynamo.yaml:
```yaml
ai:
  providers:
    deepseek:
      type: "openai-compatible"
      base_url: "https://api.deepseek.com/v1"
      api_key_env: "DEEPSEEK_API_KEY"
```

Then use /model deepseek/deepseek-chat to switch.
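The same pattern extends to any OpenAI-compatible server — for example, a self-hosted endpoint. The provider name, URL, and env var below are illustrative, not fixed values:

```yaml
ai:
  providers:
    local-vllm:
      type: "openai-compatible"
      base_url: "http://localhost:8000/v1"
      api_key_env: "LOCAL_VLLM_API_KEY"
```

After that, /model local-vllm/&lt;model-name&gt; selects a model served from that endpoint.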
## Reasoning Levels
Use /effort to control how much the model thinks before responding:
| Level | What it does |
|---|---|
| low | Quick responses, minimal reasoning |
| medium | Balanced (default) |
| high | Extended thinking — model shows its reasoning |
| max | Maximum reasoning budget |
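Assuming /effort takes the level as an argument in the same style as /model (the exact syntax is an assumption, not stated above):

```
/effort high   # turn on extended thinking
/effort low    # back to quick responses
```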
Extended thinking is supported by Claude, GPT (via Responses API), Gemini (2.5+), OpenRouter models with reasoning support, and Ollama models like Qwen3 and DeepSeek-R1. The thinking output appears in a dimmed block above the response.
## Token Display
After each response, Dynamo shows token usage:
```
claude-opus-4-6 · ↑ 348 · ↓ 187 · ◇ 412 · ✶ 12,247
```

| Glyph | Meaning |
|---|---|
| ↑ | Input tokens — fresh tokens not served from cache |
| ↓ | Output tokens |
| ◇ | Reasoning tokens (reasoning models only — subtracted from output count) |
| ✶ | Cached input tokens — prompt content served from cache instead of reprocessed |
In terminals without Nerd Font support, Dynamo falls back to plain Unicode: ↑ ↓ ◇ ✶.
Total prompt size is input + cached. With aggressive prompt caching, most of your conversation history is served from cache on every turn — only the new message and tool results count as fresh input tokens. In the example above, 12,247 of 12,595 total tokens (97%) were cache hits.
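The arithmetic behind that figure, spelled out with the numbers from the sample line:

```shell
fresh=348      # fresh input tokens
cached=12247   # cached input tokens
total=$((fresh + cached))
echo "total prompt: $total tokens"              # → total prompt: 12595 tokens
echo "cache hits: $((100 * cached / total))%"   # → cache hits: 97%
```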