Release notes

Review the release notes for agentgateway standalone.

For more details, check out the release blog, or review the GitHub release notes in the agentgateway repository.

🔥 Breaking changes

`agctl` commands reorganized under `proxy` and `controller`

The experimental agctl CLI now groups its inspection and tracing commands under the proxy parent command, and adds new commands for log-level management and version information. Update any scripts or automation that call the previous top-level commands.

Before:

agctl config all --file /tmp/agw-dump.json -o yaml
agctl trace --local --port 3000 -- http://example.com/headers

Now:

agctl proxy config all --file /tmp/agw-dump.json -o yaml
agctl proxy trace --local --port 3000 -- http://example.com/headers
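Scripts that build agctl commands as strings can be updated with a small rewrite helper that inserts the new proxy parent. This is a minimal Python sketch under stated assumptions. migrate_agctl_command and MOVED_TO_PROXY are hypothetical names. The helper only covers config and trace, the two commands in the Before/Now examples above, so check the agctl CLI reference before adding any others.

```python
import shlex

# Top-level agctl subcommands that these release notes show moving under
# `agctl proxy`. Only `config` and `trace` appear in the Before/Now examples;
# extend this set only after checking the agctl CLI reference.
MOVED_TO_PROXY = {"config", "trace"}


def migrate_agctl_command(command: str) -> str:
    """Rewrite a pre-release agctl invocation to the new `agctl proxy ...` form.

    Non-agctl commands and commands that do not use a moved subcommand keep
    the same argument list (shell quoting may be normalized by shlex.join).
    """
    argv = shlex.split(command)
    if len(argv) >= 2 and argv[0] == "agctl" and argv[1] in MOVED_TO_PROXY:
        argv.insert(1, "proxy")  # `agctl config ...` -> `agctl proxy config ...`
    return shlex.join(argv)
```

shlex.join re-quotes arguments, so a rewritten command can differ cosmetically in quoting from the original. It still parses to the same argument list, apart from the inserted proxy.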

The reorganization also introduces the following new capabilities:

agctl version prints version information for the agctl CLI.
agctl proxy log and agctl controller log get or set log levels at runtime for agentgateway running in Kubernetes.

For more information, see Install agctl, Inspect agentgateway configuration, Trace requests with agctl, and the agctl CLI reference.

🌟 New features

New UI

A refreshed UI exposes the new LLM capabilities through LLM, MCP, and traffic-native views, aligned with the new model-based routing approach. Configure providers, models, costs, and guardrails, and inspect MCP and traffic configuration, from one place.

Standalone LLM enhancements

This release brings a large set of improvements to the standalone LLM experience, expanding first-class support for model-based routing, where a single endpoint such as /v1/chat/completions is exposed and the model is specified in the request body.

Virtual models: Define a public model name that routes to one or more concrete models using weighted, failover, or conditional (CEL-based) routing, and mark concrete models as public or internal.
Shared providers: Define a named provider once with shared defaults and reference it from multiple models, with per-model overrides.
Model-cost catalog: Supply a model-cost catalog so the gateway computes real per-request cost, surfaced through CEL, logs, traces, and metrics. Catalogs load from files or inline, and agctl costs import can generate one.
13 new first-class providers: Mistral, Hugging Face, Cohere, Groq, Fireworks, DeepSeek, xAI, Together AI, OpenRouter, Cerebras, DeepInfra, Baseten, and Ollama can now be selected by name with sensible defaults. The baseUrl field replaces the older host and path override fields. For more information, see the LLM providers section.
Custom provider: Access providers that lack built-in support directly, instead of approximating them with the OpenAI provider and a custom base_url. A providerOverride tags a custom backend with a known provider name so that cost and telemetry are attributed correctly.
Serve LLM over TLS: The standalone LLM listener can now serve HTTPS directly.
CORS for the local LLM listener: Configure CORS on the LLM listener, including correct handling of non-matching requests and 404s.
Share a port for MCP and LLM: Serve MCP and LLM traffic on a single shared listener port.
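The model-based routing, virtual model, and cost catalog features above combine into a single request path. The gateway reads the model from the request body, resolves a virtual model to a concrete model, and prices the request. The following is a minimal Python sketch of that flow, not agentgateway's implementation. Every model name, price, and helper is hypothetical, list order standing in for failover priority is an assumption, and conditional (CEL-based) routing is not modeled.

```python
import json
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical catalog entry, in USD per one million tokens."""
    input_per_mtok: float
    output_per_mtok: float


# Hypothetical model-cost catalog. The real catalog format and prices differ.
COST_CATALOG = {
    "provider-a/large": Price(input_per_mtok=3.00, output_per_mtok=15.00),
    "provider-b/small": Price(input_per_mtok=0.20, output_per_mtok=0.60),
}

# Hypothetical virtual model: a public name mapped to weighted concrete models.
# List order doubles as failover priority in this sketch.
VIRTUAL_MODELS = {
    "chat-default": [("provider-a/large", 1), ("provider-b/small", 3)],
}


def requested_model(body: bytes) -> str:
    """Model-based routing: one shared endpoint, so the model comes from the body."""
    return json.loads(body)["model"]


def pick_weighted(targets, rng=random):
    """Weighted routing: choose a concrete model in proportion to its weight."""
    names = [name for name, _ in targets]
    weights = [weight for _, weight in targets]
    return rng.choices(names, weights=weights, k=1)[0]


def pick_failover(targets, healthy):
    """Failover routing: the first target, in list order, that is healthy."""
    for name, _ in targets:
        if name in healthy:
            return name
    raise RuntimeError("no healthy target for this virtual model")


def request_cost(model, input_tokens, output_tokens):
    """Per-request cost from token usage and the catalog price for `model`."""
    price = COST_CATALOG[model]
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000
```

With these placeholder prices, request_cost("provider-a/large", 1_000, 500) works out to (1,000 × 3.00 + 500 × 15.00) / 1,000,000 = 0.0105 USD.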

Guardrails

Shared guardrails: Define guardrails as a shared top-level resource that applies to all models, merged with per-model guardrails.
Streaming guardrails: Optionally run guardrails on streaming (SSE) and realtime responses.
Webhook failureMode: Webhook guardrails support fail-open or fail-closed behavior.
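The interaction between shared guardrails, per-model guardrails, and webhook failure handling can be sketched as follows. This is a minimal Python illustration under stated assumptions. It assumes that merging means applying shared guardrails first and then per-model ones, and that a request passes only when every guardrail allows it. The "open" and "closed" strings are hypothetical stand-ins for the real failureMode values.

```python
from __future__ import annotations

from typing import Callable, Iterable

# In this sketch, a guardrail is any callable that returns True when text is allowed.
Guardrail = Callable[[str], bool]


def merged_guardrails(shared: Iterable[Guardrail],
                      per_model: Iterable[Guardrail]) -> list[Guardrail]:
    """Shared guardrails apply to every model; per-model ones are appended (assumed order)."""
    return [*shared, *per_model]


def webhook_guardrail(call_webhook: Callable[[str], bool],
                      failure_mode: str) -> Guardrail:
    """Wrap a webhook check with fail-open or fail-closed error handling.

    failure_mode is "open" (allow when the webhook errors) or "closed" (deny).
    """
    if failure_mode not in ("open", "closed"):
        raise ValueError(f"unknown failure mode: {failure_mode}")

    def check(text: str) -> bool:
        try:
            return call_webhook(text)
        except Exception:
            # The webhook gave no answer (timeout, connection error, bad reply).
            return failure_mode == "open"

    return check


def allowed(text: str, guardrails: Iterable[Guardrail]) -> bool:
    """A request passes only if every guardrail allows it."""
    return all(g(text) for g in guardrails)
```

Fail-open favors availability when the webhook is down. Fail-closed favors safety at the cost of rejecting traffic during webhook outages.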

ExtMCP: MCP-aware external auth and processing

External authorization and external processing integrations can now make decisions using MCP request context, such as the tool being called and its arguments, rather than only generic HTTP metadata. For more information, see the MCP guardrail docs.
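To show what an MCP-aware decision can look like, the sketch below inspects an MCP tools/call JSON-RPC message, whose method, params.name, and params.arguments fields come from the MCP specification. It then allows or denies the call based on the tool and its arguments. The release notes do not describe how agentgateway hands MCP context to the external service, so this sketch assumes the decoded JSON-RPC message is available. The tool names, the root directory, and authorize_mcp are hypothetical.

```python
from __future__ import annotations

import json
import posixpath

# Hypothetical policy data; the tool names and root directory are placeholders.
DENIED_TOOLS = {"delete_repository"}
ALLOWED_ROOT = "/srv/public/"


def authorize_mcp(body: bytes) -> tuple[bool, str]:
    """Allow or deny an MCP JSON-RPC message using its tool name and arguments.

    Returns (allowed, reason). Messages other than tools/call are allowed.
    """
    msg = json.loads(body)
    if msg.get("method") != "tools/call":
        return True, "not a tool call"
    params = msg.get("params", {})
    tool = params.get("name")
    args = params.get("arguments", {})
    if tool in DENIED_TOOLS:
        return False, f"tool {tool!r} is denied"
    if tool == "read_file":
        # Normalize first so "/srv/public/../secret" cannot escape the root.
        path = posixpath.normpath(str(args.get("path", "")))
        if not path.startswith(ALLOWED_ROOT):
            return False, "path outside allowed directory"
    return True, "ok"
```

A policy like this cannot be expressed with generic HTTP metadata alone, because every MCP tool call shares the same HTTP method and path.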

External processing enhancements

Per-phase processing modes: Control whether headers, body, and trailers are sent to the ext_proc server for each phase, choose how the body is delivered, and optionally allow the ext_proc server to override the mode.
ImmediateResponse from body phases: An ext_proc server can now return an ImmediateResponse from the request body and response body phases, and the gateway returns it to the client.
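The per-phase modes and body-phase ImmediateResponse can be pictured with the plain-Python model below. It is loosely based on the Envoy ext_proc protocol's ProcessingMode (header SEND/SKIP and a subset of body modes). This is an assumption for illustration only: agentgateway's actual field names, defaults, and full set of body modes may differ. ProcessingMode, effective_mode, and on_request_body are hypothetical names, and no gRPC is involved.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class SendMode(Enum):
    SEND = "SEND"
    SKIP = "SKIP"


class BodyMode(Enum):
    NONE = "NONE"          # do not send the body
    STREAMED = "STREAMED"  # send body chunks as they arrive
    BUFFERED = "BUFFERED"  # buffer the whole body, then send it once


@dataclass(frozen=True)
class ProcessingMode:
    """Which parts of each phase go to the ext_proc server (illustrative defaults)."""
    request_header: SendMode = SendMode.SEND
    request_body: BodyMode = BodyMode.NONE
    request_trailer: SendMode = SendMode.SKIP
    response_header: SendMode = SendMode.SEND
    response_body: BodyMode = BodyMode.NONE
    response_trailer: SendMode = SendMode.SKIP


def effective_mode(configured: ProcessingMode,
                   override: ProcessingMode | None,
                   allow_override: bool) -> ProcessingMode:
    """Use the server's requested mode only when the gateway permits overrides."""
    return override if (allow_override and override is not None) else configured


@dataclass
class ImmediateResponse:
    status: int
    body: str


def on_request_body(body: bytes) -> ImmediateResponse | None:
    """Body-phase handler: short-circuit with an ImmediateResponse, or continue."""
    if b"DROP TABLE" in body.upper():
        return ImmediateResponse(status=403, body="request rejected by ext_proc")
    return None  # None means: let the request continue upstream
```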

Request buffering

A new buffering policy can accumulate request and response bodies in memory before forwarding, with configurable size limits. For more information, see Body buffering.
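In-memory buffering with a size limit comes down to accumulating chunks and stopping once the limit would be exceeded. This is a minimal Python sketch. buffer_body and BodyTooLarge are hypothetical names. The release notes do not say how agentgateway reports an oversized body, so the sketch raises an exception, which a proxy might map to an HTTP 413 response.

```python
from typing import Iterable


class BodyTooLarge(Exception):
    """Raised when the buffered body would exceed the configured limit."""


def buffer_body(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Accumulate a streamed body in memory before forwarding it.

    The whole body is held before anything is sent on, so max_bytes bounds
    per-request memory use.
    """
    buf = bytearray()
    for chunk in chunks:
        if len(buf) + len(chunk) > max_bytes:
            raise BodyTooLarge(f"body exceeds {max_bytes} bytes")
        buf.extend(chunk)
    return bytes(buf)
```

Buffering is what allows a policy to inspect the complete body, at the cost of added latency and memory proportional to the limit.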

Authentication and authorization

Per-model authorization: LLM authorization can be configured per model, and /v1/models listings are gated by authorization.
API key permissive mode: A new permissive API key mode never rejects requests; valid keys add claims while missing or invalid keys pass through.
Pre-routing authorization: Authorization, including CORS, can run in the pre-routing phase.
External auth caching: External auth supports caching, with a configurable cache key and a TTL that can be a duration or a CEL expression.
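The permissive API key mode and external auth caching can both be expressed in a few lines. This is a minimal Python sketch under stated assumptions. The x-api-key header name, the key store, and the AuthCache class are hypothetical. The TTL is a fixed number of seconds, while the release notes also allow a CEL expression, which this sketch does not model.

```python
from __future__ import annotations

import time
from typing import Callable

# Hypothetical key store: API key -> claims attached to the request.
API_KEYS = {"sk-team-a": {"team": "a"}}


def permissive_api_key(headers: dict[str, str]) -> dict:
    """Permissive mode never rejects: valid keys add claims, others get none."""
    key = headers.get("x-api-key", "")
    return dict(API_KEYS.get(key, {}))


class AuthCache:
    """Cache external-auth decisions per cache key for a fixed TTL in seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: dict[str, tuple[float, bool]] = {}

    def check(self, cache_key: str, call_ext_auth: Callable[[], bool]) -> bool:
        now = self.clock()
        hit = self._entries.get(cache_key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh cached decision: skip the ext auth call
        decision = call_ext_auth()
        self._entries[cache_key] = (now + self.ttl, decision)
        return decision
```

The choice of cache key matters. Keying on something too coarse can let one client reuse another's decision, and keying on something too fine reduces the hit rate.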

MCP improvements

Added Okta as a first-class MCP authentication provider.
Added resource subscribe and unsubscribe support, and improved resource multiplexing.
Advertised prompt, resource, and tool list-change capabilities.
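For reference, the MCP specification defines the messages and capability flags behind these changes. The method names resources/subscribe and resources/unsubscribe, the uri parameter, and the listChanged and subscribe capability fields come from the specification. The following Python sketch builds them. It assumes the gateway advertises all of these flags, which is an illustration of the release note rather than a dump of agentgateway's actual capabilities. The helper names are hypothetical.

```python
import json
from itertools import count

_ids = count(1)


def jsonrpc_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request with an auto-incrementing id."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_ids),
                       "method": method, "params": params})


def subscribe(uri: str) -> str:
    return jsonrpc_request("resources/subscribe", {"uri": uri})


def unsubscribe(uri: str) -> str:
    return jsonrpc_request("resources/unsubscribe", {"uri": uri})


# Server capabilities in the shape the MCP specification defines: `listChanged`
# announces list-change notifications, and `subscribe` announces per-resource
# subscriptions.
GATEWAY_CAPABILITIES = {
    "prompts": {"listChanged": True},
    "resources": {"subscribe": True, "listChanged": True},
    "tools": {"listChanged": True},
}


def supports_list_changed(capabilities: dict, kind: str) -> bool:
    """Client-side check before relying on list-change notifications for `kind`."""
    return bool(capabilities.get(kind, {}).get("listChanged"))
```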

CEL enhancements

Added route metadata to the CEL context and gRPC status to the response context.
Added raw JWT bearer token access via jwt.rawToken.
Added URL encode/decode functions, timestamp conversion helpers, and bit operations on bytes.
Added support for CEL expressions in direct responses and retry conditions.
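The difference between a raw bearer token and its decoded claims is easiest to see outside CEL. The Python sketch below assumes that jwt.rawToken corresponds to the compact header.payload.signature string from the Authorization header, which is an assumption for illustration. It does not reproduce the CEL functions themselves. raw_bearer_token and unverified_claims are hypothetical helpers.

```python
import base64
import json


def raw_bearer_token(authorization: str) -> str:
    """Return the compact JWT string from an `Authorization: Bearer ...` header."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise ValueError("not a bearer token")
    return token.strip()


def unverified_claims(token: str) -> dict:
    """Decode the payload segment for display only; the signature is NOT verified."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Having the raw token is useful when it must be forwarded verbatim, for example to a downstream service that performs its own validation.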

Operations and observability

Added proxy timing measurements, a configuration synchronization metric, and request and connection IDs for troubleshooting.
Improved distributed trace output, including JSON mode, body snapshots, and effective gateway and route policies.
