
# Models & Providers

Dynamo is provider-agnostic. Use any LLM from any provider.

Model capabilities (context window, tool support, reasoning, vision) are auto-detected at runtime via live probes against each provider — no hardcoded model→capability tables. Newly-released models work immediately, and context windows are re-checked on every model switch since providers expand them over time.
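The probe-and-cache idea can be sketched as follows. This is a minimal illustration, not Dynamo's actual internals — the function and cache names are hypothetical, and the `probe` stub stands in for the live requests a real implementation would send:

```python
from dataclasses import dataclass

@dataclass
class Capabilities:
    context_window: int
    tools: bool
    reasoning: bool
    vision: bool

def probe(provider: str, model: str) -> Capabilities:
    """Stand-in for a live probe. The real thing would issue small
    requests against the provider's API to detect each capability."""
    return Capabilities(context_window=200_000, tools=True, reasoning=True, vision=True)

_cache: dict[str, Capabilities] = {}

def capabilities_for(provider: str, model: str, switching: bool = False) -> Capabilities:
    """Return capabilities for a model, re-probing on every model switch
    so context windows expanded by the provider are picked up."""
    key = f"{provider}/{model}"
    if switching or key not in _cache:
        _cache[key] = probe(provider, model)
    return _cache[key]
```

Because there is no static model table, an unknown model simply gets probed the first time it is used.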

## Switching Models

Type `/model` to open the model picker, or switch directly:

```bash
/model              # opens a picker with all available models
/model opus         # switch to Claude Opus
/model mini         # switch to GPT-5.4 Mini
```

Your choice is saved and persists across sessions.

You can also use the full provider/model format: `/model anthropic/claude-sonnet-4-6`, `/model google/gemini-2.5-flash`, `/model openrouter/anthropic/claude-opus-4-6`, or `/model ollama/qwen3.5:35b`.

## Built-in Aliases

| Alias | Model |
|-------|-------|
| `opus` | Claude Opus 4.7 |
| `sonnet` | Claude Sonnet 4.6 |
| `haiku` | Claude Haiku 4.5 |
| `nano` | GPT-5.4 Nano |
| `mini` | GPT-5.4 Mini |
| `or-opus` | OpenRouter Claude Opus |
| `or-sonnet` | OpenRouter Claude Sonnet |
| `or-mini` | OpenRouter GPT-5.4 Mini |
| `or-llama` | OpenRouter Llama 4 Scout |
| `g-pro` | Gemini 2.5 Pro |
| `g-flash` | Gemini 2.5 Flash |
| `g-3` | Gemini 3 Pro Preview |

## Anthropic (Claude)

Set `ANTHROPIC_API_KEY` to use Claude models directly. Anthropic is the recommended provider for Dynamo — best tool use, streaming, and extended thinking support.

```bash
/model opus      # Claude Opus 4.7 — best reasoning
/model sonnet    # Claude Sonnet 4.6 — fast + capable
/model haiku     # Claude Haiku 4.5 — fastest, cheapest
```

All Claude models support streaming, tool calling, and extended thinking. Opus 4.7 has a 1M token context window.

## OpenAI (GPT)

Set `OPENAI_API_KEY` to use GPT models directly. Supports the Chat Completions and Responses APIs, web search, and reasoning models.

```bash
/model mini    # GPT-5.4 Mini — strong all-rounder
/model nano    # GPT-5.4 Nano — fast and cheap
```

GPT-5.x models support streaming, tool calling, web search (via Responses API), and extended thinking.

## OpenRouter (200+ Models)

OpenRouter gives you access to 200+ models through a single API key. Set `OPENROUTER_API_KEY` and use the tabbed model picker to browse and pin models:

```bash
/model                                       # Tab 1: direct providers + pinned models
                                             # Tab 2: searchable OpenRouter catalog
/model openrouter/anthropic/claude-opus-4-6  # switch directly
/model or-opus                               # use an alias
```

In the OpenRouter tab, type to search and press Space to pin models you use often. Pinned models appear in the Direct tab for quick access.

## Google Gemini

Set `GOOGLE_API_KEY` to use Gemini models directly:

```bash
/model g-flash                      # Gemini 2.5 Flash
/model g-pro                        # Gemini 2.5 Pro
/model g-3                          # Gemini 3 Pro Preview
/model google/gemini-3-pro-preview  # full model ID
```

Gemini supports streaming, function calling, and extended thinking (on 2.5 and later models), with a 1M token context window.

## Local Models with Ollama

Dynamo works with Ollama for fully local, offline AI:

```bash
# Install Ollama and pull a model
ollama pull qwen3.5:35b

# Dynamo auto-detects Ollama
dynamo
/model ollama/qwen3.5:35b
```

Local models support tool calling, extended thinking (Qwen3, DeepSeek-R1), and streaming. No API key needed — everything runs on your machine.

## Custom Providers

Add any OpenAI-compatible endpoint in `dynamo.yaml`:

```yaml
ai:
  providers:
    deepseek:
      type: "openai-compatible"
      base_url: "https://api.deepseek.com/v1"
      api_key_env: "DEEPSEEK_API_KEY"
```

Then use `/model deepseek/deepseek-chat` to switch.
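Multiple custom providers can coexist under `ai.providers`. As another (hypothetical) example, a local vLLM server exposing an OpenAI-compatible API could be added the same way — the provider name, port, and env var here are illustrative, not defaults:

```yaml
ai:
  providers:
    vllm-local:
      type: "openai-compatible"
      base_url: "http://localhost:8000/v1"
      api_key_env: "VLLM_API_KEY"
```

You would then switch with `/model vllm-local/` followed by whatever model name the server serves.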

## Reasoning Levels

Use `/effort` to control how much the model thinks before responding:

| Level | What it does |
|-------|--------------|
| `low` | Quick responses, minimal reasoning |
| `medium` | Balanced (default) |
| `high` | Extended thinking — model shows its reasoning |
| `max` | Maximum reasoning budget |

Extended thinking is supported by Claude, GPT (via Responses API), Gemini (2.5+), OpenRouter models with reasoning support, and Ollama models like Qwen3 and DeepSeek-R1. The thinking output appears in a dimmed block above the response.
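For example, using the levels from the table above (the inline comments are illustrative):

```bash
/effort high    # extended thinking for a tricky refactor
/effort low     # quick answers while iterating
```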

## Token Display

After each response, Dynamo shows token usage:

```
claude-opus-4-6 · 348 󰄝 · 187 󰄠 · 412 󰧑 · 12,247 
```

| Glyph | Meaning |
|-------|---------|
| 󰄝 | Input tokens — fresh tokens not served from cache |
| 󰄠 | Output tokens |
| 󰧑 | Reasoning tokens (reasoning models only — subtracted from output count) |
|  | Cached input tokens — prompt content served from cache instead of reprocessed |

In terminals without Nerd Font support, Dynamo falls back to plain Unicode: .

Total prompt size is input + cached. With aggressive prompt caching, most of your conversation history is served from cache on every turn — only the new message and tool results count as fresh input tokens. In the example above, 12,247 of 12,595 total tokens (97%) were cache hits.
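The arithmetic behind that 97% figure can be checked directly from the status-line numbers:

```python
fresh_input = 348       # fresh input tokens from the example status line
cached_input = 12_247   # cached input tokens from the example status line

total_prompt = fresh_input + cached_input
cache_hit_rate = cached_input / total_prompt

print(total_prompt)                  # 12595
print(round(cache_hit_rate * 100))   # 97
```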