Inference Providers

The @artemis/inference package is the routing layer that sits between the agent and the actual models. You configure which providers are enabled and supply your API keys; the package handles dispatch and streaming, including automatic failover when a provider returns an error.

The Router

InferenceRouter is the single entry point. It accepts an InferenceRequest, dispatches to the correct InferenceProvider, and streams InferenceEvent chunks back to the caller via an AsyncGenerator.

If the primary provider fails, the router walks the fallbackOrder list you define in RoutingConfig and retries with the next available provider. You never have to write that retry logic yourself.
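The dispatch-and-failover flow described above can be sketched as a small async generator. This is a minimal sketch under stated assumptions, not the package's implementation: the `InferenceProvider` shape (a `name` plus a `stream()` method), the request fields, and the `routeWithFallback` function are all hypothetical. It only fails over when a provider throws, and it does not retract events already yielded by a provider that fails mid-stream, which a real router would need to handle.

```typescript
// HYPOTHETICAL types for illustration; the real @artemis/inference types may differ.
interface InferenceRequest {
  model: string;
  prompt: string;
}

type InferenceEvent =
  | { type: "token"; text: string }
  | { type: "done"; usage: unknown }
  | { type: "error"; message: string };

interface InferenceProvider {
  name: string;
  stream(req: InferenceRequest): AsyncGenerator<InferenceEvent>;
}

interface RoutingConfig {
  fallbackOrder: string[]; // provider names, tried first to last
}

// Tries each provider in fallbackOrder; if one throws, moves on to the next.
// Note: events already yielded by a provider that fails mid-stream are NOT undone.
async function* routeWithFallback(
  req: InferenceRequest,
  providers: Map<string, InferenceProvider>,
  config: RoutingConfig,
): AsyncGenerator<InferenceEvent> {
  const failures: string[] = [];
  for (const name of config.fallbackOrder) {
    const provider = providers.get(name);
    if (!provider) continue; // not registered (e.g. disabled): skip it
    try {
      yield* provider.stream(req);
      return; // this provider finished without throwing, so stop here
    } catch (err) {
      failures.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  yield { type: "error", message: `all providers failed (${failures.join("; ")})` };
}

// Usage: a provider that always fails, followed by one that echoes the prompt.
const flaky: InferenceProvider = {
  name: "openai",
  async *stream(): AsyncGenerator<InferenceEvent> {
    throw new Error("503 Service Unavailable");
  },
};

const echo: InferenceProvider = {
  name: "ollama",
  async *stream(req: InferenceRequest): AsyncGenerator<InferenceEvent> {
    yield { type: "token", text: `echo: ${req.prompt}` };
    yield { type: "done", usage: null };
  },
};

const providers = new Map<string, InferenceProvider>([
  [flaky.name, flaky],
  [echo.name, echo],
]);
const config: RoutingConfig = { fallbackOrder: ["openai", "ollama"] };
```

Iterating `routeWithFallback({ model: "llama3", prompt: "hi" }, providers, config)` with `for await` yields only the `echo` provider's `token` and `done` events, because `flaky` throws before producing anything. The model name here is a placeholder.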

Providers

Eight provider adapters ship out of the box. Each implements the same InferenceProvider interface, so they're all interchangeable from the router's perspective.

Provider	Class	Notes
Ollama	`OllamaProvider`	Local; always enabled; no API key needed
OpenAI	`OpenAIProvider`	GPT-4o and variants via the official SDK
Anthropic	`AnthropicProvider`	Claude via the official SDK
Google Gemini	`GoogleProvider`	Gemini via `@google/generative-ai`
Vertex AI	`VertexAIProvider`	Gemini via GCP; see Vertex AI below
OpenRouter	`OpenRouterProvider`	OpenAI-compatible proxy to 100+ models
Gemini CLI	`GeminiCliProvider`	Shells out to the `gemini` CLI binary
LM Studio	`LMStudioProvider`	Local LM Studio via OpenAI-compatible API
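OpenRouter and LM Studio are both described as OpenAI-compatible. As a rough illustration of what that means (not how these adapters are actually written), both accept the standard Chat Completions request shape. The hypothetical `buildChatCompletionRequest` helper below assembles such a streaming request without sending it. It takes the full versioned API root, because whether the adapter appends `/v1` to LMSTUDIO_BASE_URL isn't documented here; `local-model` is a placeholder model id.

```typescript
// HYPOTHETICAL helper; not part of @artemis/inference.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a streaming Chat Completions request for an OpenAI-compatible API.
// `apiRoot` is the versioned base URL, e.g. "https://openrouter.ai/api/v1"
// or "http://localhost:1234/v1" for a local LM Studio server.
function buildChatCompletionRequest(
  apiRoot: string,
  model: string,
  prompt: string,
  apiKey?: string,
): HttpRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Hosted services expect a bearer token; local servers usually need none.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${apiRoot.replace(/\/+$/, "")}/chat/completions`, // strip trailing slashes first
    method: "POST",
    headers,
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  };
}

const lmStudio = buildChatCompletionRequest("http://localhost:1234/v1", "local-model", "Hello");
console.log(lmStudio.url); // → http://localhost:1234/v1/chat/completions
```

On Node 18 or later, the result could be passed to the global `fetch` as `fetch(req.url, req)`; only the API root and key change between OpenAI-compatible backends.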

Configuration

getDefaultConfig() reads your environment at startup and produces a RoutingConfig. You only need to set the variables (API keys or base URLs) for the providers you want active.

# Google Gemini
GOOGLE_API_KEY=your-gemini-key

# OpenAI
OPENAI_API_KEY=your-openai-key

# Anthropic
ANTHROPIC_API_KEY=your-anthropic-key

# OpenRouter
OPENROUTER_API_KEY=your-openrouter-key

# Ollama (local)
OLLAMA_BASE_URL=http://localhost:11434

# LM Studio (local)
LMSTUDIO_BASE_URL=http://localhost:1234

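Any provider that needs an API key or base URL is automatically disabled, and skipped during fallback resolution, when that variable isn't set. Ollama is always enabled, and the Vertex AI and Gemini CLI providers can run without these variables, as described below.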
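The gating rule above can be sketched as a filter over a preferred provider order. This is a hypothetical illustration, not the real getDefaultConfig(): the provider ids (`openai`, `lmstudio`, and so on), the `buildRoutingConfig` function, and the idea that the caller passes a preferred order are all assumptions. Vertex AI and Gemini CLI pass through untouched because their own auth checks aren't modelled here.

```typescript
// HYPOTHETICAL sketch of env-based gating; not the real getDefaultConfig().
// Provider ids are made up for illustration.
interface RoutingConfig {
  fallbackOrder: string[];
}

type Env = Record<string, string | undefined>;

// Env var that gates each provider in this sketch. Ollama is absent because it
// is always enabled; Vertex AI and Gemini CLI are absent because they can
// authenticate without these variables (their own checks are not modelled).
const REQUIRED_ENV = new Map<string, string>([
  ["google", "GOOGLE_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["openrouter", "OPENROUTER_API_KEY"],
  ["lmstudio", "LMSTUDIO_BASE_URL"],
]);

function isEnabled(provider: string, env: Env): boolean {
  const varName = REQUIRED_ENV.get(provider);
  if (varName === undefined) return true; // not gated on an env var here
  return (env[varName] ?? "").trim() !== ""; // blank values count as unset
}

// Keeps the caller's preferred order but drops providers whose variable is unset.
function buildRoutingConfig(env: Env, preferredOrder: string[]): RoutingConfig {
  return { fallbackOrder: preferredOrder.filter((p) => isEnabled(p, env)) };
}

const example = buildRoutingConfig(
  { OPENAI_API_KEY: "sk-test" },
  ["anthropic", "openai", "lmstudio", "ollama"],
);
console.log(example.fallbackOrder); // → [ 'openai', 'ollama' ]
```

In a Node process you would pass `process.env` as the first argument instead of a literal object.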

Vertex AI

Vertex AI is the GCP-hosted variant of Gemini. It supports three authentication methods; the router tries them in the order listed and uses the first one that is available:

API key — set VERTEX_AI_API_KEY
Service account / project — set VERTEX_AI_PROJECT (and optionally VERTEX_AI_LOCATION, defaulting to us-central1)
Application Default Credentials (ADC) — no env var needed; just run gcloud auth application-default login or attach a service account to your environment

# Option A — API key
VERTEX_AI_API_KEY=your-vertex-key

# Option B — project-based
VERTEX_AI_PROJECT=my-gcp-project
VERTEX_AI_LOCATION=us-central1

Vertex AI models are always listed in the /api/models endpoint regardless of which auth method you're using. If none of the three methods is configured, the provider falls through to the next entry in your fallbackOrder.
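The resolution order above (API key, then project, then ADC) can be sketched as a single function that returns the first usable method or `null`. The `resolveVertexAuth` name, the `VertexAuth` union, and the `adcAvailable` flag are hypothetical: how the provider actually detects ADC isn't documented, so the sketch takes it as an input rather than guessing.

```typescript
// HYPOTHETICAL sketch of the documented Vertex AI auth order.
type VertexAuth =
  | { method: "apiKey"; apiKey: string }
  | { method: "project"; project: string; location: string }
  | { method: "adc" };

// Returns the first available method, or null when none is configured
// (in which case the router would move on to the next provider).
function resolveVertexAuth(
  env: Record<string, string | undefined>,
  adcAvailable: boolean, // assumption: ADC detection is supplied by the caller
): VertexAuth | null {
  const apiKey = env.VERTEX_AI_API_KEY?.trim();
  if (apiKey) return { method: "apiKey", apiKey };

  const project = env.VERTEX_AI_PROJECT?.trim();
  if (project) {
    // VERTEX_AI_LOCATION is optional and defaults to us-central1.
    const location = env.VERTEX_AI_LOCATION?.trim() || "us-central1";
    return { method: "project", project, location };
  }

  if (adcAvailable) return { method: "adc" };
  return null;
}

const auth = resolveVertexAuth({ VERTEX_AI_PROJECT: "my-gcp-project" }, false);
console.log(auth?.method); // → project
```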

Available Vertex AI Models

gemini-2.5-pro-preview-05-06
gemini-2.5-flash-preview-04-17
gemini-2.5-flash-thinking
gemini-2.5-pro-exp
gemini-2.0-flash
gemini-2.0-flash-thinking-exp-01-21
gemini-2.0-pro-exp-02-05
gemini-3.0-flash-preview
gemini-3.0-pro-preview
gemini-1.5-pro
gemini-1.5-flash
gemini-1.5-flash-8b
gemini-1.0-pro
gemini-1.0-ultra

Gemini CLI Provider

If you have the gemini CLI installed and authenticated but no API key, GeminiCliProvider offers a zero-configuration option by shelling out to the binary. It requires no env vars, only the gemini binary on your PATH.

# Install the Gemini CLI
npm i -g @google/gemini-cli
gemini auth
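How GeminiCliProvider locates the binary isn't documented. As a rough sketch of the kind of check involved, the hypothetical `findOnPath` helper below uses Node's `fs` and `path` modules to walk the PATH entries and return the first regular file with execute permission. It ignores Windows `PATHEXT` extensions such as `.cmd`, so treat it as POSIX-only.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// HYPOTHETICAL helper: returns the full path of the first executable named
// `binary` on the given PATH string, or null if there isn't one.
// Windows PATHEXT handling (gemini.cmd etc.) is deliberately left out.
function findOnPath(binary: string, pathEnv: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathEnv.split(delimiter)) {
    if (!dir) continue; // skip empty PATH segments
    const candidate = join(dir, binary);
    try {
      if (!statSync(candidate).isFile()) continue; // ignore directories
      accessSync(candidate, constants.X_OK); // throws if not executable
      return candidate;
    } catch {
      // Missing or not executable: keep searching.
    }
  }
  return null;
}

const geminiPath = findOnPath("gemini");
console.log(geminiPath ? `gemini found at ${geminiPath}` : "gemini not found on PATH");
```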

Streaming Events

Every provider emits the same InferenceEvent union over the generator:

Event type	Payload	Description
`token`	`{ text }`	A streamed output token
`reasoning`	`{ text }`	Chain-of-thought reasoning text
`monologue`	`{ text }`	Internal reflection from MonologueGenerator
`tool_call`	`{ id, name, arguments }`	Model requested a tool
`tool_result`	`{ id, result }`	Tool execution result
`done`	`{ usage }`	Stream complete with token usage
`error`	`{ message }`	Provider error
`context_info`	`{ text }`	Contextual metadata
`paused`	—	Generation paused (e.g. waiting for tool)
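Consumers typically switch on the event type. The sketch below models the table as a TypeScript discriminated union. It assumes each event is a flat object with a `type` field holding the names above, and it types the undocumented payloads (`arguments`, `result`, `usage`) as `unknown`. The `consume` helper and the `fakeStream` generator are hypothetical.

```typescript
// ASSUMED shape: one flat object per event, discriminated by `type`.
type InferenceEvent =
  | { type: "token"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "monologue"; text: string }
  | { type: "tool_call"; id: string; name: string; arguments: unknown }
  | { type: "tool_result"; id: string; result: unknown }
  | { type: "done"; usage: unknown }
  | { type: "error"; message: string }
  | { type: "context_info"; text: string }
  | { type: "paused" };

interface StreamSummary {
  output: string; // concatenated `token` text
  reasoning: string; // concatenated `reasoning` text
  toolCalls: { id: string; name: string }[];
  usage?: unknown; // payload of the `done` event
  error?: string; // message of the last `error` event, if any
}

// Hypothetical helper: drains a stream and collects the parts most callers need.
async function consume(events: AsyncIterable<InferenceEvent>): Promise<StreamSummary> {
  const summary: StreamSummary = { output: "", reasoning: "", toolCalls: [] };
  for await (const ev of events) {
    switch (ev.type) {
      case "token":
        summary.output += ev.text;
        break;
      case "reasoning":
        summary.reasoning += ev.text;
        break;
      case "tool_call":
        summary.toolCalls.push({ id: ev.id, name: ev.name });
        break;
      case "done":
        summary.usage = ev.usage;
        break;
      case "error":
        summary.error = ev.message;
        break;
      default:
        // monologue, tool_result, context_info and paused are ignored in this sketch
        break;
    }
  }
  return summary;
}

// Usage with a fake stream standing in for a provider; the usage payload is made up.
async function* fakeStream(): AsyncGenerator<InferenceEvent> {
  yield { type: "reasoning", text: "User wants a greeting." };
  yield { type: "token", text: "Hel" };
  yield { type: "token", text: "lo" };
  yield { type: "paused" };
  yield { type: "done", usage: { outputTokens: 2 } };
}

consume(fakeStream()).then((s) => console.log(s.output)); // → Hello
```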