Artemis

Inference Providers

How Artemis routes requests across eight LLM backends, with automatic fallback.

The @artemis/inference package is the routing layer that sits between the agent and the actual models. You configure which providers are enabled, give it your API keys, and it handles the rest—including automatic failover if a provider returns an error.

The Router

InferenceRouter is the single entry point. It accepts an InferenceRequest, dispatches to the correct InferenceProvider, and streams InferenceEvent chunks back to the caller via an AsyncGenerator.
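To make the streaming contract concrete, here is a minimal sketch of a caller draining an AsyncGenerator of events. StubRouter, its infer method, and the reduced event and request shapes are all assumptions for illustration; the real InferenceRouter API may look different.

```typescript
// Hypothetical, reduced shapes; the real types live in @artemis/inference.
type InferenceEvent =
  | { type: "token"; text: string }
  | { type: "done"; usage: { totalTokens: number } }
  | { type: "error"; message: string };

interface InferenceRequest {
  model: string;
  prompt: string;
}

// Stand-in for InferenceRouter: echoes the prompt one word at a time,
// then emits a done event. The method name `infer` is an assumption.
class StubRouter {
  async *infer(request: InferenceRequest): AsyncGenerator<InferenceEvent> {
    const words = request.prompt.split(" ");
    for (const word of words) {
      yield { type: "token", text: word + " " };
    }
    yield { type: "done", usage: { totalTokens: words.length } };
  }
}

// Drain the generator, concatenating token text and surfacing errors.
async function collectText(
  events: AsyncGenerator<InferenceEvent>,
): Promise<string> {
  let output = "";
  for await (const event of events) {
    if (event.type === "token") output += event.text;
    if (event.type === "error") throw new Error(event.message);
  }
  return output.trim();
}
```

A caller that wants incremental output would act on each token inside the loop instead of collecting the full string first.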

If the primary provider fails, the router walks the fallbackOrder list you define in RoutingConfig and retries with the next available provider. You never have to write that retry logic yourself.
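The fallback walk can be pictured roughly as follows. This is a sketch under stated assumptions, not the router's actual code: the Provider shape, the routeWithFallback name, and the way errors are collected are all hypothetical.

```typescript
// Hypothetical, reduced event and provider shapes for illustration.
type RouterEvent = { type: "token"; text: string } | { type: "done" };

interface Provider {
  name: string;
  enabled: boolean;
  stream(prompt: string): AsyncGenerator<RouterEvent>;
}

// Try the primary provider first, then each fallbackOrder entry in turn.
// Disabled or unknown providers are skipped; a thrown error moves on.
async function* routeWithFallback(
  providers: Map<string, Provider>,
  primary: string,
  fallbackOrder: string[],
  prompt: string,
): AsyncGenerator<RouterEvent> {
  const errors: string[] = [];
  const order = [primary, ...fallbackOrder.filter((n) => n !== primary)];
  for (const name of order) {
    const provider = providers.get(name);
    if (!provider || !provider.enabled) continue;
    try {
      yield* provider.stream(prompt);
      return; // First provider that completes wins.
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Note that this sketch also retries when a provider fails after it has already emitted some tokens, so the caller could see partial output followed by a second attempt. How the real router treats mid-stream failures is not described here.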

Providers

Eight provider adapters ship out of the box. Each implements the same InferenceProvider interface, so they're all interchangeable from the router's perspective.

| Provider | Class | Notes |
| --- | --- | --- |
| Ollama | OllamaProvider | Local; always enabled; no API key needed |
| OpenAI | OpenAIProvider | GPT-4o and variants via the official SDK |
| Anthropic | AnthropicProvider | Claude via the official SDK |
| Google Gemini | GoogleProvider | Gemini via @google/generative-ai |
| Vertex AI | VertexAIProvider | Gemini via GCP; see Vertex AI below |
| OpenRouter | OpenRouterProvider | OpenAI-compatible proxy to 100+ models |
| Gemini CLI | GeminiCliProvider | Shells out to the gemini CLI binary |
| LM Studio | LMStudioProvider | Local LM Studio via OpenAI-compatible API |

Configuration

getDefaultConfig() reads your environment at startup and produces a RoutingConfig. You only need to set the keys for providers you want active.

# Google Gemini
GOOGLE_API_KEY=your-gemini-key

# OpenAI
OPENAI_API_KEY=your-openai-key

# Anthropic
ANTHROPIC_API_KEY=your-anthropic-key

# OpenRouter
OPENROUTER_API_KEY=your-openrouter-key

# Ollama (local)
OLLAMA_BASE_URL=http://localhost:11434

# LM Studio (local)
LMSTUDIO_BASE_URL=http://localhost:1234

Any provider that needs a key or base URL and has none set is automatically disabled and skipped during fallback resolution.
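The enable/disable rule can be sketched as a pure function over the environment. Everything here is illustrative: buildProviderSettings, ProviderSettings, and the default Ollama URL are assumptions, and the real getDefaultConfig() may produce a differently shaped RoutingConfig.

```typescript
// Illustrative only: names and shapes are assumptions, not the package's API.
type Env = Record<string, string | undefined>;

type ProviderName =
  | "ollama"
  | "openai"
  | "anthropic"
  | "google"
  | "openrouter"
  | "lmstudio";

interface ProviderSettings {
  enabled: boolean;
  // API key or base URL, depending on the provider.
  credential?: string;
}

// Enable a provider only when its variable is set and non-empty.
function fromEnv(value: string | undefined): ProviderSettings {
  return value ? { enabled: true, credential: value } : { enabled: false };
}

function buildProviderSettings(
  env: Env,
): Record<ProviderName, ProviderSettings> {
  return {
    // Ollama is documented as always enabled; the default URL is assumed.
    ollama: {
      enabled: true,
      credential: env["OLLAMA_BASE_URL"] || "http://localhost:11434",
    },
    openai: fromEnv(env["OPENAI_API_KEY"]),
    anthropic: fromEnv(env["ANTHROPIC_API_KEY"]),
    google: fromEnv(env["GOOGLE_API_KEY"]),
    openrouter: fromEnv(env["OPENROUTER_API_KEY"]),
    lmstudio: fromEnv(env["LMSTUDIO_BASE_URL"]),
  };
}
```

In a Node process you would pass process.env as the env argument. Taking the environment as a parameter keeps the function easy to test.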

Vertex AI

Vertex AI is the GCP-hosted variant of Gemini. It supports three authentication methods—the router tries them in order and uses whichever one is available:

  1. API key: set VERTEX_AI_API_KEY
  2. Service account / project: set VERTEX_AI_PROJECT (and optionally VERTEX_AI_LOCATION, which defaults to us-central1)
  3. Application Default Credentials (ADC): no env var needed; run gcloud auth application-default login or attach a service account to your environment

# Option A: API key
VERTEX_AI_API_KEY=your-vertex-key

# Option B: project-based
VERTEX_AI_PROJECT=my-gcp-project
VERTEX_AI_LOCATION=us-central1

Vertex AI models are always listed by the /api/models endpoint, regardless of which auth method you use. If none of the three methods is configured, the router skips Vertex AI and moves to the next entry in your fallbackOrder.
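The three-step auth resolution might look something like the sketch below. The resolveVertexAuth function and VertexAuth type are hypothetical, and ADC availability is passed in as a boolean because how the provider detects it is not documented here.

```typescript
// Hypothetical result type: one variant per documented auth method.
type VertexAuth =
  | { method: "api-key"; apiKey: string }
  | { method: "project"; project: string; location: string }
  | { method: "adc" };

// Check the methods in the documented order and return the first match.
// Returns null when nothing is configured, so the caller can fall back.
function resolveVertexAuth(
  env: Record<string, string | undefined>,
  adcAvailable: boolean, // Assumption: detection happens elsewhere.
): VertexAuth | null {
  const apiKey = env["VERTEX_AI_API_KEY"];
  if (apiKey) return { method: "api-key", apiKey };

  const project = env["VERTEX_AI_PROJECT"];
  if (project) {
    // VERTEX_AI_LOCATION is optional and defaults to us-central1.
    const location = env["VERTEX_AI_LOCATION"] || "us-central1";
    return { method: "project", project, location };
  }

  return adcAvailable ? { method: "adc" } : null;
}
```

Because the API key is checked first, setting both VERTEX_AI_API_KEY and VERTEX_AI_PROJECT selects the API key in this sketch.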

Available Vertex AI Models

gemini-2.5-pro-preview-05-06
gemini-2.5-flash-preview-04-17
gemini-2.5-flash-thinking
gemini-2.5-pro-exp
gemini-2.0-flash
gemini-2.0-flash-thinking-exp-01-21
gemini-2.0-pro-exp-02-05
gemini-3.0-flash-preview
gemini-3.0-pro-preview
gemini-1.5-pro
gemini-1.5-flash
gemini-1.5-flash-8b
gemini-1.0-pro
gemini-1.0-ultra

Gemini CLI Provider

If you have the gemini CLI installed and authenticated but no API key, GeminiCliProvider provides a zero-configuration option by shelling out to the binary. It doesn't require any env vars—just the binary in your PATH.

# Install the Gemini CLI
npm i -g @google/gemini-cli
gemini auth
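Since this provider depends only on the binary being on your PATH, an availability check could be as simple as the sketch below. findOnPath is a hypothetical helper, not something GeminiCliProvider is known to use; it only tests for existence, ignores execute permissions, and assumes POSIX-style paths.

```typescript
// Hypothetical helper: look for `binary` in each directory of a PATH string.
// `exists` is injected so the lookup stays pure and easy to test.
function findOnPath(
  binary: string,
  pathVar: string,
  exists: (candidate: string) => boolean,
  delimiter: string = ":",
): string | null {
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue; // Skip empty PATH segments.
    const candidate = `${dir.replace(/\/+$/, "")}/${binary}`;
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

In Node you could call it with process.env.PATH ?? "", fs.existsSync as the exists callback, and path.delimiter as the delimiter.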

Streaming Events

Every provider emits the same InferenceEvent union over the generator:

| Event type | Payload | Description |
| --- | --- | --- |
| token | { text } | A streamed output token |
| reasoning | { text } | Chain-of-thought reasoning text |
| monologue | { text } | Internal reflection from MonologueGenerator |
| tool_call | { id, name, arguments } | Model requested a tool |
| tool_result | { id, result } | Tool execution result |
| done | { usage } | Stream complete with token usage |
| error | { message } | Provider error |
| context_info | { text } | Contextual metadata |
| paused | (none) | Generation paused (e.g. waiting for tool) |
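One way to consume these events is a discriminated union with an exhaustive switch, sketched below. The event names and payload keys come from the table above; the field types (for example unknown for arguments, result, and usage) and the describeEvent helper are assumptions, and the package's real InferenceEvent definition may differ.

```typescript
// Assumed field types; only the event names and payload keys are documented.
type InferenceEvent =
  | { type: "token"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "monologue"; text: string }
  | { type: "tool_call"; id: string; name: string; arguments: unknown }
  | { type: "tool_result"; id: string; result: unknown }
  | { type: "done"; usage: unknown }
  | { type: "error"; message: string }
  | { type: "context_info"; text: string }
  | { type: "paused" };

// Produce a one-line label per event. The `never` assignment in the default
// branch makes the compiler flag any event type the switch forgets.
function describeEvent(event: InferenceEvent): string {
  switch (event.type) {
    case "token":
      return event.text;
    case "reasoning":
    case "monologue":
    case "context_info":
      return `[${event.type}] ${event.text}`;
    case "tool_call":
      return `[tool_call ${event.id}] ${event.name}`;
    case "tool_result":
      return `[tool_result ${event.id}]`;
    case "done":
      return "[done]";
    case "error":
      return `[error] ${event.message}`;
    case "paused":
      return "[paused]";
    default: {
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

If a new event type is added to the union later, the default branch stops compiling until the switch handles it.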