AI Gateway (drop-in proxy)

Govern every model call across your org by changing one base URL. No application code changes.

The AI Gateway is a drop-in, OpenAI-compatible proxy. Point your existing OpenAI or Azure OpenAI client at it and add one header. Every model call your org routes through it is then spend-metered and, once model-I/O governance is enabled, scored, policy-checked, and audited, with no per-team code changes. It is the fastest way to put governance in front of model traffic that is already running in production.

Two changes, nothing else:

Set base_url to https://app.axiorank.com/api/proxy/v1.
Add the header X-AxioRank-Key: axr_live_… (a gateway key from Settings → API keys).

Your provider key keeps riding in Authorization exactly as before. The proxy forwards it to the provider for that one request and never stores it.

Spend capture is automatic on every call, on any plan, so pointing a base_url here gives you cost visibility immediately. Inline enforcement (deny, hold, and redact) plus audit logging activate once you turn on model-I/O governance in Settings → Governance.
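Outside an SDK, the same two changes apply to a plain HTTP request. The sketch below uses only the standard library to build (but not send) such a request. It assumes the proxy serves chat completions at /chat/completions under the base URL, which is the path the OpenAI SDK appends. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Base URL from this page; the OpenAI SDK appends /chat/completions to it.
BASE_URL = "https://app.axiorank.com/api/proxy/v1"


def build_chat_request(provider_key: str, gateway_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request routed through the gateway."""
    body = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Unchanged: the provider key stays in Authorization and is forwarded upstream.
            "Authorization": f"Bearer {provider_key}",
            # The one addition: a gateway key from Settings → API keys.
            "X-AxioRank-Key": gateway_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("sk-...", "axr_live_...", "Summarize Q3 results")
```

Passing `req` to `urllib.request.urlopen` would send it. Compared with calling OpenAI directly, only the URL and the extra header change.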

OpenAI

from openai import OpenAI

client = OpenAI(
    base_url="https://app.axiorank.com/api/proxy/v1",
    api_key="sk-...",  # your OpenAI key, forwarded upstream, never stored
    default_headers={"X-AxioRank-Key": "axr_live_..."},
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize Q3 results"}],
)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://app.axiorank.com/api/proxy/v1",
  apiKey: process.env.OPENAI_API_KEY, // forwarded upstream, never stored
  defaultHeaders: { "X-AxioRank-Key": process.env.AXIORANK_KEY! },
});

const resp = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize Q3 results" }],
});

The same client also covers the Responses API. Once base_url points at the proxy, client.responses.create(...) is governed exactly like chat completions:

resp = client.responses.create(
    model="gpt-4o",
    instructions="You are a helpful assistant.",
    input="Summarize Q3 results",
)

Azure OpenAI

Use the standard OpenAI client and tell the proxy to route to Azure with a few headers. The proxy builds the deployment-scoped Azure URL and authenticates with the api-key header that Azure expects.

from openai import OpenAI

client = OpenAI(
    base_url="https://app.axiorank.com/api/proxy/v1",
    api_key="<your-azure-api-key>",
    default_headers={
        "X-AxioRank-Key": "axr_live_...",
        "X-AxioRank-Upstream": "azure",
        "X-AxioRank-Azure-Resource": "my-resource",      # my-resource.openai.azure.com
        "X-AxioRank-Azure-Deployment": "gpt-4o",
        "X-AxioRank-Azure-Api-Version": "2024-10-21",
    },
)
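The resource, deployment, and API-version headers map onto Azure OpenAI's publicly documented URL shape. The sketch below shows that mapping as an illustration only. It is not the gateway's code, and `azure_upstream` is a hypothetical name.

```python
from urllib.parse import urlencode


def azure_upstream(headers: dict[str, str], provider_key: str) -> tuple[str, dict[str, str]]:
    """Map the X-AxioRank-Azure-* headers onto an Azure OpenAI chat completions URL.

    Illustrative only: mirrors Azure's public URL format, not the gateway's internals.
    """
    resource = headers["X-AxioRank-Azure-Resource"]
    deployment = headers["X-AxioRank-Azure-Deployment"]
    api_version = headers["X-AxioRank-Azure-Api-Version"]
    url = (
        f"https://{resource}.openai.azure.com/openai/deployments/{deployment}"
        f"/chat/completions?{urlencode({'api-version': api_version})}"
    )
    # Azure authenticates with an api-key header rather than a Bearer token.
    return url, {"api-key": provider_key}


url, auth = azure_upstream(
    {
        "X-AxioRank-Azure-Resource": "my-resource",
        "X-AxioRank-Azure-Deployment": "gpt-4o",
        "X-AxioRank-Azure-Api-Version": "2024-10-21",
    },
    "<your-azure-api-key>",
)
```

Because the deployment name is part of the URL, Azure routes on the deployment rather than on the model field in the request body.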

Other OpenAI-compatible endpoints

Set X-AxioRank-Upstream to openrouter, or to any full base URL of an OpenAI-compatible endpoint (vLLM, LiteLLM, Together, Fireworks, and similar). The default is OpenAI.

default_headers={
    "X-AxioRank-Key": "axr_live_...",
    "X-AxioRank-Upstream": "https://api.together.xyz/v1",
}
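When the upstream varies by environment, it can help to validate the header value before a request reaches the proxy. The sketch below is a hypothetical client-side helper. It accepts only the values named on this page (openrouter, azure, vertex, or a full base URL) and omits the header entirely for the OpenAI default.

```python
from typing import Optional
from urllib.parse import urlparse

# Named upstreams mentioned on this page; anything else must be a full base URL.
NAMED_UPSTREAMS = {"openrouter", "azure", "vertex"}


def gateway_headers(gateway_key: str, upstream: Optional[str] = None) -> dict[str, str]:
    """Build default_headers for an OpenAI client routed through the gateway.

    Hypothetical helper: omitting `upstream` leaves the proxy on its OpenAI default.
    """
    headers = {"X-AxioRank-Key": gateway_key}
    if upstream is None:
        return headers
    if upstream not in NAMED_UPSTREAMS:
        parsed = urlparse(upstream)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"upstream must be a named upstream or a full base URL: {upstream!r}")
    headers["X-AxioRank-Upstream"] = upstream
    return headers
```

For example, `gateway_headers("axr_live_...", "https://api.together.xyz/v1")` produces the header dict shown above. A typo such as "together" fails fast instead of reaching the proxy.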

Amazon Bedrock

Bedrock uses AWS SigV4, which binds each signature to the request host, so it cannot be proxied transparently. Instead, send the Converse request body to the gateway and pass your AWS credentials as per-request headers. The gateway re-signs the request for the Bedrock host and never stores the credentials.

curl https://app.axiorank.com/api/proxy/bedrock/v1/converse \
  -H "content-type: application/json" \
  -H "X-AxioRank-Key: axr_live_..." \
  -H "X-AxioRank-AWS-Region: us-east-1" \
  -H "X-AxioRank-AWS-Access-Key-Id: $AWS_ACCESS_KEY_ID" \
  -H "X-AxioRank-AWS-Secret-Access-Key: $AWS_SECRET_ACCESS_KEY" \
  -H "X-AxioRank-Bedrock-Model: anthropic.claude-3-5-sonnet-20240620-v1:0" \
  -d '{"messages":[{"role":"user","content":[{"text":"Summarize Q3 results"}]}]}'

Add X-AxioRank-AWS-Session-Token when using temporary credentials. Only the non-streaming Converse operation is supported today; ConverseStream, which uses the AWS event-stream format, is planned.
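Here is the same Converse call from Python, using only the standard library. The sketch builds the request without sending it. `bedrock_converse_request` is a hypothetical helper. Credentials are read from the standard AWS environment variables, and the session-token header is added only when `AWS_SESSION_TOKEN` is set.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional

BEDROCK_URL = "https://app.axiorank.com/api/proxy/bedrock/v1/converse"


def bedrock_converse_request(
    gateway_key: str,
    model_id: str,
    prompt: str,
    region: str = "us-east-1",
    env: Optional[Mapping[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a governed Converse request."""
    env = os.environ if env is None else env
    headers = {
        "Content-Type": "application/json",
        "X-AxioRank-Key": gateway_key,
        "X-AxioRank-AWS-Region": region,
        "X-AxioRank-AWS-Access-Key-Id": env["AWS_ACCESS_KEY_ID"],
        "X-AxioRank-AWS-Secret-Access-Key": env["AWS_SECRET_ACCESS_KEY"],
        "X-AxioRank-Bedrock-Model": model_id,
    }
    # Temporary (STS) credentials also need the session token.
    if env.get("AWS_SESSION_TOKEN"):
        headers["X-AxioRank-AWS-Session-Token"] = env["AWS_SESSION_TOKEN"]
    # Converse messages carry a list of content blocks, each with a "text" field.
    body = {"messages": [{"role": "user", "content": [{"text": prompt}]}]}
    return urllib.request.Request(
        BEDROCK_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


# req = bedrock_converse_request(
#     "axr_live_...", "anthropic.claude-3-5-sonnet-20240620-v1:0", "Summarize Q3 results"
# )
```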

Google Vertex AI

Vertex exposes an OpenAI-compatible endpoint, so it rides the same chat completions proxy. Set the upstream to vertex and pass your project and location; the GCP access token rides in the API key field.

from openai import OpenAI

client = OpenAI(
    base_url="https://app.axiorank.com/api/proxy/v1",
    api_key="<output of: gcloud auth print-access-token>",
    default_headers={
        "X-AxioRank-Key": "axr_live_...",
        "X-AxioRank-Upstream": "vertex",
        "X-AxioRank-GCP-Project": "my-project",
        "X-AxioRank-GCP-Location": "us-central1",
    },
)

resp = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Summarize Q3 results"}],
)
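The access token printed by gcloud is short-lived. Long-running processes therefore usually fetch it at startup (and rebuild the client when it expires) rather than pasting it in. The sketch below shells out to the same gcloud auth print-access-token command shown above. `vertex_client_kwargs` is a hypothetical helper, and its `run` parameter exists only so the command can be stubbed in tests.

```python
import subprocess


def gcloud_access_token(run=subprocess.run) -> str:
    """Return a GCP access token from the gcloud CLI (must be installed and logged in)."""
    result = run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def vertex_client_kwargs(gateway_key: str, project: str, location: str, run=subprocess.run) -> dict:
    """Keyword arguments for openai.OpenAI(...) routed to Vertex through the gateway."""
    return {
        "base_url": "https://app.axiorank.com/api/proxy/v1",
        # The GCP access token rides in the API key field.
        "api_key": gcloud_access_token(run),
        "default_headers": {
            "X-AxioRank-Key": gateway_key,
            "X-AxioRank-Upstream": "vertex",
            "X-AxioRank-GCP-Project": project,
            "X-AxioRank-GCP-Location": location,
        },
    }


# client = OpenAI(**vertex_client_kwargs("axr_live_...", "my-project", "us-central1"))
```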

Anthropic

Point the Anthropic SDK's base_url at the proxy and add the governance header. Your Anthropic key keeps riding in x-api-key (the SDK sets it), forwarded upstream and never stored.

from anthropic import Anthropic

client = Anthropic(
    api_key="sk-ant-...",  # forwarded upstream, never stored
    base_url="https://app.axiorank.com/api/proxy/anthropic",
    default_headers={"X-AxioRank-Key": "axr_live_..."},
)

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Summarize Q3 results"}],
)
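Without the SDK, the request the Anthropic client sends can be built by hand. The sketch below uses only the standard library and builds the request without sending it. It assumes the proxy accepts the path the SDK appends to base_url (/v1/messages) along with the standard anthropic-version: 2023-06-01 header. `build_messages_request` is a hypothetical name.

```python
import json
import urllib.request

ANTHROPIC_BASE = "https://app.axiorank.com/api/proxy/anthropic"


def build_messages_request(anthropic_key: str, gateway_key: str, prompt: str,
                           model: str = "claude-opus-4-8") -> urllib.request.Request:
    """Build (but do not send) a Messages API request routed through the gateway."""
    body = {
        "model": model,
        "max_tokens": 1024,
        "system": "You are a helpful assistant.",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ANTHROPIC_BASE + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": anthropic_key,         # forwarded upstream, never stored
            "anthropic-version": "2023-06-01",  # required by the Messages API
            "X-AxioRank-Key": gateway_key,      # the governance header
            "content-type": "application/json",
        },
        method="POST",
    )


req = build_messages_request("sk-ant-...", "axr_live_...", "Summarize Q3 results")
```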

Drop it into your existing stack

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://app.axiorank.com/api/proxy/v1",
    api_key="sk-...",
    default_headers={"X-AxioRank-Key": "axr_live_..."},
)

import litellm

resp = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
    api_base="https://app.axiorank.com/api/proxy/v1",
    api_key="sk-...",
    extra_headers={"X-AxioRank-Key": "axr_live_..."},
)
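Every framework takes the same three values, so one option is to read them once from the environment and pass them to each client. The sketch below is a hypothetical helper. The variable names OPENAI_API_KEY and AXIORANK_KEY match the TypeScript example above and are otherwise an assumption.

```python
import os
from typing import Mapping, Optional

GATEWAY_BASE_URL = "https://app.axiorank.com/api/proxy/v1"


def gateway_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect base URL, provider key, and gateway header from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in ("OPENAI_API_KEY", "AXIORANK_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "base_url": GATEWAY_BASE_URL,
        "api_key": env["OPENAI_API_KEY"],
        "headers": {"X-AxioRank-Key": env["AXIORANK_KEY"]},
    }


# s = gateway_settings()
# LangChain: ChatOpenAI(model="gpt-4o", base_url=s["base_url"], api_key=s["api_key"], default_headers=s["headers"])
# LiteLLM:   litellm.completion(..., api_base=s["base_url"], api_key=s["api_key"], extra_headers=s["headers"])
```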

What the proxy does to a call

Decision	Result
allow	The provider response is returned unchanged.
deny (prompt)	The model is never called. The proxy returns HTTP 403 with an OpenAI-shaped error.
hold (prompt)	The model is never called. The proxy returns HTTP 409 with an `X-AxioRank-Approval-Id` header to poll.
redact (completion)	The flagged spans in the response content are masked. Every other field is preserved.
deny (completion)	The response content is replaced with a blocked notice and `finish_reason` becomes `content_filter`.

Token usage from the provider response is rolled up into the spend dashboard and counts against workspace budgets.
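These outcomes surface as ordinary HTTP statuses, headers, and response fields, so callers can branch on them without an AxioRank-specific SDK. The sketch below works on a plain status code, a headers mapping, and a parsed JSON body. `classify_gateway_result` and its outcome labels are hypothetical, and polling the held approval is left out.

```python
from typing import Any, Mapping


def classify_gateway_result(status: int, headers: Mapping[str, str],
                            body: Mapping[str, Any]) -> tuple[str, Any]:
    """Map a proxied chat completions response onto the decision table.

    Returns (outcome, detail). Redacted completions look like normal responses
    (only the flagged spans are masked), so they classify as "ok".
    """
    if status == 403:
        # Prompt denied: the model was never called; body is an OpenAI-shaped error.
        return "denied", body.get("error")
    if status == 409:
        # Prompt held for approval: keep the ID to poll (header names are case-insensitive).
        lowered = {k.lower(): v for k, v in headers.items()}
        return "held", lowered.get("x-axiorank-approval-id")
    if status != 200:
        return "error", body
    choice = body["choices"][0]
    if choice.get("finish_reason") == "content_filter":
        # Completion blocked (note: a provider-side filter can also set this value).
        return "blocked", choice["message"]["content"]
    return "ok", choice["message"]["content"]
```

With the OpenAI Python SDK, the two prompt-phase statuses are raised as `openai.PermissionDeniedError` (403) and `openai.ConflictError` (409). The exception's `.response` attribute carries the headers, including the approval ID.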

Streaming

Streaming (stream: true) is supported. When model-I/O governance is off, the provider's server-sent events stream straight through and the proxy reads the final usage chunk to meter spend. When governance is on, the proxy buffers the response so it can apply completion-phase redaction or blocks, then re-emits it as server-sent events. That trades incremental delivery for inline enforcement.
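In both modes the client receives chat completions server-sent events, so the same consumer works whether governance is on or off. The sketch below uses only the standard library. It takes an iterable of decoded SSE lines, accumulates the content deltas, and keeps the final usage chunk if one is present. `read_chat_stream` is a hypothetical name.

```python
import json
from typing import Iterable, Optional


def read_chat_stream(lines: Iterable[str]) -> tuple[str, Optional[dict]]:
    """Concatenate content deltas from chat completions SSE lines; return (text, usage)."""
    text_parts: list[str] = []
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
        # The usage chunk, when sent, arrives last with an empty choices list.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage
```

When calling OpenAI directly, the usage chunk is only sent if the request sets `stream_options={"include_usage": True}`.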

Notes and limits

The provider key is used for one request and is never persisted. A managed credential mode, where the AxioRank key is the only credential, is on the roadmap for Enterprise.
OpenAI Chat Completions, OpenAI Responses, the Anthropic Messages API, Amazon Bedrock (Converse), and Google Vertex (via its OpenAI-compatible endpoint) are supported today. Native Gemini generateContent and Bedrock ConverseStream are planned.
Under model-I/O governance, a streaming Responses request is buffered and re-emitted as typed events (response.created, response.output_text.delta, response.completed), so redaction and blocks still apply.
The proxy sits inline in your request path. Governance failures fail open by default, so availability is never gated on the control plane.
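For the Responses API, the stream consists of typed events rather than chat chunks. The sketch below uses only the standard library. It collects response.output_text.delta text until response.completed and reads each event's type from the type field of its data: payload. It assumes the buffered re-emission uses the same event shapes as OpenAI's own Responses stream. `read_responses_stream` is a hypothetical name.

```python
import json
from typing import Iterable


def read_responses_stream(lines: Iterable[str]) -> str:
    """Concatenate output text from Responses API SSE lines."""
    parts: list[str] = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # "event:" lines repeat the type, which is also in the payload
        event = json.loads(line[len("data:"):])
        if event.get("type") == "response.output_text.delta":
            parts.append(event.get("delta", ""))
        elif event.get("type") == "response.completed":
            break
    return "".join(parts)
```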
