Inference Providers
How Artemis routes requests across eight LLM backends, with automatic fallback.
The @artemis/inference package is the routing layer that sits between the agent and the actual models. You configure which providers are enabled and supply your API keys; the package handles dispatch, streaming, and automatic failover when a provider returns an error.
The Router
InferenceRouter is the single entry point. It accepts an InferenceRequest, dispatches to the correct InferenceProvider, and streams InferenceEvent chunks back to the caller via an AsyncGenerator.
If the primary provider fails, the router walks the fallbackOrder list you define in RoutingConfig and retries with the next available provider. You never have to write that retry logic yourself.
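The fallback walk can be pictured with a minimal TypeScript sketch. The InferenceRequest, InferenceProvider, InferenceEvent, and RoutingConfig shapes below are simplified stand-ins, and isAvailable(), stream(), and routeWithFallback are hypothetical names, not the package's actual API. The sketch also assumes two policies the text does not specify: a provider counts as failed when its generator throws, and fallback happens only before the first event reaches the caller, since retrying mid-stream would duplicate output.

```typescript
// Hypothetical, simplified shapes; the real @artemis/inference types are richer.
type InferenceEvent =
  | { type: "token"; text: string }
  | { type: "error"; message: string }
  | { type: "done" };

interface InferenceRequest {
  prompt: string;
}

interface InferenceProvider {
  readonly id: string;
  isAvailable(): boolean; // e.g. false when no API key is configured
  stream(req: InferenceRequest): AsyncGenerator<InferenceEvent>;
}

interface RoutingConfig {
  fallbackOrder: string[];
}

// Walks fallbackOrder, skipping unavailable providers, and moves on to the
// next provider if the current one throws before emitting any event.
async function* routeWithFallback(
  providers: Map<string, InferenceProvider>,
  config: RoutingConfig,
  req: InferenceRequest,
): AsyncGenerator<InferenceEvent> {
  const failures: string[] = [];
  for (const id of config.fallbackOrder) {
    const provider = providers.get(id);
    if (!provider || !provider.isAvailable()) continue;
    let emitted = false;
    try {
      for await (const event of provider.stream(req)) {
        emitted = true;
        yield event;
      }
      return; // this provider finished normally
    } catch (err) {
      // Once output has reached the caller, retrying elsewhere would duplicate it.
      if (emitted) throw err;
      failures.push(`${id}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  yield { type: "error", message: `All providers failed: ${failures.join("; ")}` };
}
```

Giving up once output has started is a design choice of this sketch; a real router could instead buffer or restart, depending on how it wants to treat partial responses.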
Providers
Eight provider adapters ship out of the box. Each implements the same InferenceProvider interface, so they're all interchangeable from the router's perspective.
| Provider | Class | Notes |
|---|---|---|
| Ollama | OllamaProvider | Local; always enabled; no API key needed |
| OpenAI | OpenAIProvider | GPT-4o and variants via the official SDK |
| Anthropic | AnthropicProvider | Claude via the official SDK |
| Google Gemini | GoogleProvider | Gemini via @google/generative-ai |
| Vertex AI | VertexAIProvider | Gemini via GCP; see Vertex AI below |
| OpenRouter | OpenRouterProvider | OpenAI-compatible proxy to 100+ models |
| Gemini CLI | GeminiCliProvider | Shells out to the gemini CLI binary |
| LM Studio | LMStudioProvider | Local LM Studio via OpenAI-compatible API |
Configuration
getDefaultConfig() reads your environment at startup and produces a RoutingConfig. You only need to set the keys for providers you want active.
```bash
# Google Gemini
GOOGLE_API_KEY=your-gemini-key
# OpenAI
OPENAI_API_KEY=your-openai-key
# Anthropic
ANTHROPIC_API_KEY=your-anthropic-key
# OpenRouter
OPENROUTER_API_KEY=your-openrouter-key
# Ollama (local)
OLLAMA_BASE_URL=http://localhost:11434
# LM Studio (local)
LMSTUDIO_BASE_URL=http://localhost:1234
```

Any provider without a key or base URL is automatically disabled and skipped during fallback resolution.
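The env-to-config step can be sketched roughly as follows, assuming a simplified per-provider settings shape. sketchDefaultConfig, resolveFallback, and ProviderSettings are hypothetical names; the real getDefaultConfig() returns a full RoutingConfig. Vertex AI and the Gemini CLI are left out because they become available by other means, described in their own sections. The Ollama entry stays enabled and defaults to http://localhost:11434 (Ollama's standard local port), which is an assumption about how the always-enabled behavior works.

```typescript
// Hypothetical settings shape; the real RoutingConfig carries more than this.
interface ProviderSettings {
  enabled: boolean;
  apiKey?: string;
  baseUrl?: string;
}

type Env = Record<string, string | undefined>;

// Keyed providers are enabled only when their environment variable is set.
function sketchDefaultConfig(env: Env = process.env): Record<string, ProviderSettings> {
  const keyed = (name: string): ProviderSettings => ({ enabled: Boolean(env[name]), apiKey: env[name] });
  return {
    google: keyed("GOOGLE_API_KEY"),
    openai: keyed("OPENAI_API_KEY"),
    anthropic: keyed("ANTHROPIC_API_KEY"),
    openrouter: keyed("OPENROUTER_API_KEY"),
    // Assumption: Ollama is always enabled and falls back to its standard local port.
    ollama: { enabled: true, baseUrl: env.OLLAMA_BASE_URL ?? "http://localhost:11434" },
    lmstudio: { enabled: Boolean(env.LMSTUDIO_BASE_URL), baseUrl: env.LMSTUDIO_BASE_URL },
  };
}

// Disabled (or unknown) providers are dropped from the fallback order before routing.
function resolveFallback(config: Record<string, ProviderSettings>, order: string[]): string[] {
  return order.filter((id) => config[id]?.enabled === true);
}
```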
Vertex AI
Vertex AI is the GCP-hosted variant of Gemini. It supports three authentication methods; the router tries them in order and uses the first one that is available:
- API key: set VERTEX_AI_API_KEY
- Service account / project: set VERTEX_AI_PROJECT (and optionally VERTEX_AI_LOCATION, which defaults to us-central1)
- Application Default Credentials (ADC): no env var needed; run gcloud auth application-default login or attach a service account to your environment
```bash
# Option A: API key
VERTEX_AI_API_KEY=your-vertex-key
# Option B: project-based
VERTEX_AI_PROJECT=my-gcp-project
VERTEX_AI_LOCATION=us-central1
```

Vertex AI models are always listed in the /api/models endpoint, regardless of which auth method you use. If none of the three methods is configured, the provider falls through to the next entry in your fallbackOrder.
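The documented precedence can be modeled as a small resolver. This is a sketch, not VertexAIProvider's code: resolveVertexAuth and the VertexAuth shape are hypothetical, and detecting ADC (for example through google-auth-library) is out of scope, so the caller passes it in as a boolean. A null result stands for "nothing configured", the case in which the router moves on to the next fallbackOrder entry.

```typescript
// Hypothetical model of the documented auth precedence, not VertexAIProvider's real code.
type VertexAuth =
  | { kind: "apiKey"; apiKey: string }
  | { kind: "project"; project: string; location: string }
  | { kind: "adc" };

function resolveVertexAuth(
  env: Record<string, string | undefined>,
  hasAdc: boolean, // caller-supplied; ADC detection is out of scope for this sketch
): VertexAuth | null {
  const { VERTEX_AI_API_KEY: apiKey, VERTEX_AI_PROJECT: project, VERTEX_AI_LOCATION: location } = env;
  // 1. An explicit API key wins.
  if (apiKey) return { kind: "apiKey", apiKey };
  // 2. Project-based auth; the location defaults to us-central1 as documented.
  if (project) return { kind: "project", project, location: location ?? "us-central1" };
  // 3. Application Default Credentials need no env vars.
  if (hasAdc) return { kind: "adc" };
  // Nothing configured: the router falls through to the next fallbackOrder entry.
  return null;
}
```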
Available Vertex AI Models
- gemini-2.5-pro-preview-05-06
- gemini-2.5-flash-preview-04-17
- gemini-2.5-flash-thinking
- gemini-2.5-pro-exp
- gemini-2.0-flash
- gemini-2.0-flash-thinking-exp-01-21
- gemini-2.0-pro-exp-02-05
- gemini-3.0-flash-preview
- gemini-3.0-pro-preview
- gemini-1.5-pro
- gemini-1.5-flash
- gemini-1.5-flash-8b
- gemini-1.0-pro
- gemini-1.0-ultra

Gemini CLI Provider
If you have the gemini CLI installed and authenticated but no API key, GeminiCliProvider offers a zero-configuration option by shelling out to the binary. It needs no env vars, only the gemini binary on your PATH.
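Shelling out and streaming a binary's output can be sketched with Node's child_process.spawn. The arguments GeminiCliProvider actually passes to gemini, and how it parses the output, are not documented here, so this hypothetical streamCommand helper takes any command and argument list and yields raw stdout text.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: runs a command and yields its stdout as text chunks.
async function* streamCommand(command: string, args: string[]): AsyncGenerator<string> {
  const child = spawn(command, args);
  child.stdin.end(); // send no input so the child never waits on stdin

  let stderr = "";
  child.stderr.setEncoding("utf8");
  child.stderr.on("data", (chunk: string) => {
    stderr += chunk;
  });

  // Register listeners up front so 'error' (e.g. ENOENT when the binary is not
  // on PATH) and 'close' are not missed while stdout is being consumed.
  const exited = new Promise<number | null>((resolve, reject) => {
    child.once("error", reject);
    child.once("close", (code) => resolve(code));
  });
  exited.catch(() => undefined); // avoid an unhandled rejection before the await below

  try {
    child.stdout.setEncoding("utf8");
    for await (const chunk of child.stdout) {
      yield chunk as string;
    }
    const code = await exited;
    if (code !== 0) {
      throw new Error(`${command} exited with code ${code}: ${stderr.trim()}`);
    }
  } finally {
    // If the consumer stops early, don't leave the child process running.
    if (child.exitCode === null) child.kill();
  }
}
```

Passing "gemini" as the command only works when the binary is on your PATH, the same requirement stated above. If it is missing, spawn emits an 'error' event (ENOENT), which this sketch turns into a thrown error; that is the kind of failure a router could treat as a cue to try the next provider.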
```bash
# Install the Gemini CLI
npm i -g @google/gemini-cli
gemini auth
```

Streaming Events
Every provider emits the same InferenceEvent union over the generator:
| Event type | Payload | Description |
|---|---|---|
| token | { text } | A streamed output token |
| reasoning | { text } | Chain-of-thought reasoning text |
| monologue | { text } | Internal reflection from MonologueGenerator |
| tool_call | { id, name, arguments } | Model requested a tool |
| tool_result | { id, result } | Tool execution result |
| done | { usage } | Stream complete with token usage |
| error | { message } | Provider error |
| context_info | { text } | Contextual metadata |
| paused | (none) | Generation paused (e.g. waiting for a tool) |
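A consumer typically switches on the event type. The union below is one way to type the table, assuming each payload's fields sit next to a type discriminator; arguments, result, and usage are typed as unknown because the table does not specify their shapes. applyEvent, summarize, and StreamSummary are hypothetical helpers, not part of the package.

```typescript
// Hypothetical typing of the event table; field placement and types are assumptions.
type InferenceEvent =
  | { type: "token"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "monologue"; text: string }
  | { type: "tool_call"; id: string; name: string; arguments: unknown }
  | { type: "tool_result"; id: string; result: unknown }
  | { type: "done"; usage: unknown }
  | { type: "error"; message: string }
  | { type: "context_info"; text: string }
  | { type: "paused" };

interface StreamSummary {
  output: string;
  reasoning: string;
  toolCalls: string[];
  finished: boolean;
  error?: string;
}

// Folds one event into the running summary.
function applyEvent(s: StreamSummary, e: InferenceEvent): StreamSummary {
  switch (e.type) {
    case "token":
      return { ...s, output: s.output + e.text };
    case "reasoning":
    case "monologue":
      return { ...s, reasoning: s.reasoning + e.text };
    case "tool_call":
      return { ...s, toolCalls: [...s.toolCalls, e.name] };
    case "done":
      return { ...s, finished: true };
    case "error":
      return { ...s, finished: true, error: e.message };
    case "tool_result":
    case "context_info":
    case "paused":
      return s; // not tracked by this summary
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a case is missing.
      const unreachable: never = e;
      return unreachable;
    }
  }
}

// Drains an event stream (such as the router's AsyncGenerator) into one summary.
async function summarize(stream: AsyncIterable<InferenceEvent>): Promise<StreamSummary> {
  let summary: StreamSummary = { output: "", reasoning: "", toolCalls: [], finished: false };
  for await (const event of stream) summary = applyEvent(summary, event);
  return summary;
}
```

Because the switch ends in a never check, adding a new event kind to the union makes the compiler flag every consumer that does not handle it yet.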