Ollama

Configure agentgateway to route LLM traffic to Ollama for local model inference

Route requests through agentgateway to models that Ollama serves locally. Agentgateway 1.3 includes a first-class ollama provider, which uses http://localhost:11434/v1 unless you override it.

Before you begin

  1. Install the agentgateway binary.
  2. Install Ollama.

  3. Make sure that you have at least one model pulled locally.

    ollama list

    If the list is empty, pull a model.

    ollama pull llama3.2
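
The check-then-pull steps above can be scripted. The following is a minimal sketch, assuming that ollama list prints a header row followed by one row per model with the name and tag (for example llama3.2:latest) in the first column, and that a name without a tag refers to the latest tag. The has_model helper is hypothetical and is not part of Ollama or agentgateway.

```shell
# has_model MODEL: reads `ollama list` output on stdin and succeeds
# if MODEL appears in the first column. Hypothetical helper.
has_model() (
  want=$1
  # Assumption: a name without a tag means the "latest" tag.
  case $want in
    *:*) ;;
    *) want="$want:latest" ;;
  esac
  # Skip the header row (NR > 1) and compare the NAME column exactly.
  awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
)

# Usage: pull the model only when it is missing.
#   ollama list | has_model llama3.2 || ollama pull llama3.2
```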

Configure agentgateway

Create a configuration file that routes requests to your local Ollama instance.

# yaml-language-server: $schema=https://agentgateway.dev/schema/config
llm:
  port: 3000
  models:
  - name: "*"
    provider: ollama
    params:
      model: llama3.2

Review the following table to understand this configuration.

| Setting | Description |
| --- | --- |
| provider: ollama | Uses the built-in Ollama provider shortcut instead of the older OpenAI compatibility path. |
| params.model | Sets the default Ollama model. The model must already exist in your local Ollama instance. |
| params.baseUrl | Optional override for non-default Ollama endpoints. If omitted, agentgateway uses http://localhost:11434/v1. |
| name: "*" | Matches any requested model name, so clients can request any model that Ollama has pulled. |

If Ollama is running somewhere other than http://localhost:11434/v1, override the base URL instead of using host overrides.

# yaml-language-server: $schema=https://agentgateway.dev/schema/config
llm:
  port: 3000
  models:
  - name: "*"
    provider: ollama
    params:
      model: llama3.2
      baseUrl: http://192.168.1.20:11434/v1

Start agentgateway:

agentgateway -f config.yaml

Test the configuration

Send a request to verify that agentgateway routes to Ollama. The model name in the request must match a model you already pulled with ollama pull.

curl http://localhost:3000/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {
        "role": "user",
        "content": "Explain Ollama in one sentence."
      }
    ]
  }' | jq
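
To print only the model's reply rather than the full JSON body, the jq filter can be narrowed. This is a minimal sketch, assuming the response follows the OpenAI-style chat completions shape, with the reply text at .choices[0].message.content. The reply_text helper is hypothetical.

```shell
# reply_text: reads a chat completions response on stdin and prints
# the assistant's reply as plain text. Hypothetical helper.
# Assumption: the body has the OpenAI-style shape
#   {"choices":[{"message":{"content":"..."}}]}
reply_text() {
  jq -r '.choices[0].message.content'
}

# Usage: replace the trailing "| jq" in the request above.
#   curl -s http://localhost:3000/v1/chat/completions \
#     -H "content-type: application/json" \
#     -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hi"}]}' \
#     | reply_text
```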

Troubleshooting

Connection refused

What’s happening:

Requests to agentgateway return a 503 response or a connection refused error.

Why it’s happening:

Ollama is not running, is listening on a different address, or params.baseUrl points to the wrong endpoint.

How to fix it:

  1. Verify Ollama is reachable directly.

    curl http://localhost:11434/api/version

  2. If Ollama is not running, start it.

    ollama serve

  3. If you set params.baseUrl, make sure it includes Ollama’s /v1 prefix.
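
When Ollama runs on another host, the same checks apply to the overridden base URL. The following is a minimal sketch, assuming the base URL should end with /v1 exactly as in the example configuration, and that /api/version is served on the same host and port. Both helper names are hypothetical.

```shell
# check_base_url URL: succeeds only if URL ends with /v1,
# as in the example config. Hypothetical helper.
check_base_url() {
  case $1 in
    */v1) return 0 ;;
    *) echo "baseUrl does not end with /v1: $1" >&2; return 1 ;;
  esac
}

# version_url URL: prints the /api/version URL on the same host
# by stripping the trailing /v1 from the base URL.
version_url() {
  printf '%s/api/version\n' "${1%/v1}"
}

# Usage, with the remote address from the example config:
#   base=http://192.168.1.20:11434/v1
#   check_base_url "$base" && curl "$(version_url "$base")"
```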

Model not found

What’s happening:

The response returns a model not found error.

Why it’s happening:

The requested model has not been pulled into your local Ollama instance.

How to fix it:

  1. List available models.

    ollama list

  2. Pull the missing model.

    ollama pull llama3.2