For the complete documentation index, see llms.txt. Markdown versions of all docs pages are available by appending .md to any docs URL.
API types
Supported LLM API endpoint types and route configurations
Agentgateway natively supports multiple LLM API endpoint types. These are automatically exposed on the gateway, and translated as appropriate based on the provider.
The following API types have dedicated guides:
- Chat completions: The OpenAI
/v1/chat/completionsendpoint. This is the most widely used API type for text generation and chat applications. - Responses: The OpenAI
/v1/responsesendpoint for stateful, multi-step model interactions. - Messages: The Anthropic
/v1/messagesendpoint for Claude models. - Embeddings: The OpenAI-compatible
/v1/embeddingsendpoint for creating vector representations of text. - Realtime: The OpenAI Realtime API for low-latency, streaming voice and text interactions over WebSockets.
- Rerank: The Cohere-compatible
/v2/rerankendpoint for ranking documents by relevance to a query. - Models: The OpenAI-compatible
/v1/modelsendpoint for listing available models. - Token count: The Anthropic
/v1/messages/count_tokensendpoint for estimating input tokens. - Passthrough: Forwards requests directly to the backend provider without transformation.
Chat completions
Send chat completion requests through agentgateway using the OpenAI Chat Completions API.
Responses
Send requests through agentgateway using the OpenAI Responses API.
Messages
Send requests through agentgateway using the Anthropic Messages API.
Embeddings
Send embedding requests through agentgateway using the OpenAI-compatible Embeddings API.
OpenAI Realtime
Proxy OpenAI Realtime API WebSocket traffic and track token usage.
Rerank
Send rerank requests through agentgateway using the Cohere-compatible Rerank API.
Passthrough
Forward requests to the upstream provider without transformation.
Models
List available models through agentgateway using the OpenAI-compatible Models API.
Token count
Count tokens through agentgateway using the Anthropic Messages token-count API.