AI Gateway (drop-in proxy)
Govern every model call across your org by changing one base URL. No application code changes.
The AI Gateway is a drop-in, OpenAI-compatible proxy. Point your existing OpenAI or Azure OpenAI client at it and add one header. Every model call your org makes is then scored, policy-checked, spend-metered, and audited, with no per-team code changes. It is the fastest way to put governance in front of model traffic that is already running in production.
Two changes, nothing else:
- Set base_url to https://app.axiorank.com/api/proxy/v1.
- Add the header X-AxioRank-Key: axr_live_… (a gateway key from Settings → API keys).
Your provider key keeps riding in Authorization exactly as before. The proxy
forwards it to the provider for that one request and never stores it.
Spend capture is automatic on every call, on any plan, so pointing a base_url
here gives you cost visibility immediately. Inline enforcement (deny, hold, and
redact) plus audit logging activate once you turn on model-I/O governance in
Settings → Governance.
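To make the two changes concrete, here is a minimal sketch of the request an OpenAI client sends once base_url points at the gateway. It only builds the request and does not send it. The /chat/completions suffix is the path OpenAI clients append to base_url; the helper name build_gateway_request is illustrative, and both key values are placeholders.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://app.axiorank.com/api/proxy/v1"


def build_gateway_request(provider_key: str, gateway_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a chat completions request routed through the gateway."""
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Provider key: unchanged, still sent as a bearer token.
            "Authorization": f"Bearer {provider_key}",
            # Gateway key: the one new header.
            "X-AxioRank-Key": gateway_key,
        },
        method="POST",
    )


req = build_gateway_request(
    provider_key="sk-...",        # placeholder
    gateway_key="axr_live_...",   # placeholder
    payload={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Summarize Q3 results"}],
    },
)
print(req.get_method(), req.full_url)
```

Everything except the URL and the extra header matches what the client would send to the provider directly, which is why the SDK examples that follow need no other changes.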
OpenAI
from openai import OpenAI

client = OpenAI(
    base_url="https://app.axiorank.com/api/proxy/v1",
    api_key="sk-...",  # your OpenAI key, forwarded upstream, never stored
    default_headers={"X-AxioRank-Key": "axr_live_..."},
)
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize Q3 results"}],
)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://app.axiorank.com/api/proxy/v1",
  apiKey: process.env.OPENAI_API_KEY, // forwarded upstream, never stored
  defaultHeaders: { "X-AxioRank-Key": process.env.AXIORANK_KEY! },
});
const resp = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize Q3 results" }],
});

The same client also covers the Responses API. Once base_url points at the
proxy, client.responses.create(...) is governed exactly like chat completions:
resp = client.responses.create(
    model="gpt-4o",
    instructions="You are a helpful assistant.",
    input="Summarize Q3 results",
)

Azure OpenAI
Use the standard OpenAI client and tell the proxy to route to Azure with a few
headers. The proxy builds the deployment-scoped Azure URL and authenticates with
the api-key Azure expects.
from openai import OpenAI

client = OpenAI(
    base_url="https://app.axiorank.com/api/proxy/v1",
    api_key="<your-azure-api-key>",
    default_headers={
        "X-AxioRank-Key": "axr_live_...",
        "X-AxioRank-Upstream": "azure",
        "X-AxioRank-Azure-Resource": "my-resource",  # my-resource.openai.azure.com
        "X-AxioRank-Azure-Deployment": "gpt-4o",
        "X-AxioRank-Azure-Api-Version": "2024-10-21",
    },
)

Other OpenAI-compatible endpoints
Set X-AxioRank-Upstream to openrouter, or to any full base URL of an
OpenAI-compatible endpoint (vLLM, LiteLLM, Together, Fireworks, and similar). The
default is OpenAI.
default_headers={
    "X-AxioRank-Key": "axr_live_...",
    "X-AxioRank-Upstream": "https://api.together.xyz/v1",
}

Amazon Bedrock
Bedrock uses AWS SigV4, which binds a signature to the request host, so it cannot be proxied transparently. Instead, send the Converse request body to the gateway and pass your AWS credentials per request. The gateway re-signs for the Bedrock host and never stores them.
curl https://app.axiorank.com/api/proxy/bedrock/v1/converse \
  -H "content-type: application/json" \
  -H "X-AxioRank-Key: axr_live_..." \
  -H "X-AxioRank-AWS-Region: us-east-1" \
  -H "X-AxioRank-AWS-Access-Key-Id: $AWS_ACCESS_KEY_ID" \
  -H "X-AxioRank-AWS-Secret-Access-Key: $AWS_SECRET_ACCESS_KEY" \
  -H "X-AxioRank-Bedrock-Model: anthropic.claude-3-5-sonnet-20240620-v1:0" \
  -d '{"messages":[{"role":"user","content":[{"text":"Summarize Q3 results"}]}]}'

Add X-AxioRank-AWS-Session-Token for temporary credentials. The Converse
operation is non-streaming (ConverseStream, which uses the AWS event-stream
format, is planned).
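The same request can be assembled from Python. The sketch below mirrors the curl call using only the standard library; it builds the request without sending it, and the helper name build_converse_request and all credential values are placeholders, not part of the gateway's API.

```python
import json
import urllib.request
from typing import Optional

CONVERSE_URL = "https://app.axiorank.com/api/proxy/bedrock/v1/converse"


def build_converse_request(
    gateway_key: str,
    region: str,
    access_key_id: str,
    secret_access_key: str,
    model_id: str,
    messages: list,
    session_token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a Converse request addressed to the gateway."""
    headers = {
        "content-type": "application/json",
        "X-AxioRank-Key": gateway_key,
        "X-AxioRank-AWS-Region": region,
        "X-AxioRank-AWS-Access-Key-Id": access_key_id,
        "X-AxioRank-AWS-Secret-Access-Key": secret_access_key,
        "X-AxioRank-Bedrock-Model": model_id,
    }
    if session_token:
        # Only needed for temporary credentials.
        headers["X-AxioRank-AWS-Session-Token"] = session_token
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(CONVERSE_URL, data=body, headers=headers, method="POST")


req = build_converse_request(
    gateway_key="axr_live_...",                  # placeholder
    region="us-east-1",
    access_key_id="<aws-access-key-id>",         # placeholder
    secret_access_key="<aws-secret-access-key>", # placeholder
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize Q3 results"}]}],
    session_token="<aws-session-token>",         # placeholder, optional
)
# To send it: urllib.request.urlopen(req) returns the non-streaming Converse response.
```

In real use the credentials would come from the environment or your AWS credential chain rather than string literals.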
Google Vertex AI
Vertex exposes an OpenAI-compatible endpoint, so it rides the same chat
completions proxy. Set the upstream to vertex and pass your project and
location; the GCP access token rides in the API key field.
from openai import OpenAI

client = OpenAI(
    base_url="https://app.axiorank.com/api/proxy/v1",
    api_key="<output of: gcloud auth print-access-token>",
    default_headers={
        "X-AxioRank-Key": "axr_live_...",
        "X-AxioRank-Upstream": "vertex",
        "X-AxioRank-GCP-Project": "my-project",
        "X-AxioRank-GCP-Location": "us-central1",
    },
)
resp = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Summarize Q3 results"}],
)

Anthropic
Point the Anthropic SDK's base_url at the proxy and add the governance header.
Your Anthropic key keeps riding in x-api-key (the SDK sets it), forwarded
upstream and never stored.
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-ant-...",  # forwarded upstream, never stored
    base_url="https://app.axiorank.com/api/proxy/anthropic",
    default_headers={"X-AxioRank-Key": "axr_live_..."},
)
resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Summarize Q3 results"}],
)

Drop it into your existing stack
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://app.axiorank.com/api/proxy/v1",
    api_key="sk-...",
    default_headers={"X-AxioRank-Key": "axr_live_..."},
)

import litellm

resp = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
    api_base="https://app.axiorank.com/api/proxy/v1",
    api_key="sk-...",
    extra_headers={"X-AxioRank-Key": "axr_live_..."},
)

What the proxy does to a call
| Decision | Result |
|---|---|
| allow | The provider response is returned unchanged. |
| deny (prompt) | The model is never called. The proxy returns HTTP 403 with an OpenAI-shaped error. |
| hold (prompt) | The model is never called. The proxy returns HTTP 409 with an X-AxioRank-Approval-Id header to poll. |
| redact (completion) | The flagged spans in the response content are masked. Every other field is preserved. |
| deny (completion) | The response content is replaced with a blocked notice and finish_reason becomes content_filter. |
Token usage from the provider response is rolled up into the spend dashboard and counts against workspace budgets.
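On the client side, the decisions in the table can be told apart from the HTTP status, the approval header, and finish_reason. The sketch below is one way to do that, not a gateway SDK: GatewayOutcome, classify_response, and the outcome labels are illustrative names. It also assumes that every 403 or 409 comes from the gateway, which may not hold if the provider returns those codes itself, and a provider's own content filter can also set finish_reason to content_filter.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GatewayOutcome:
    decision: str                      # illustrative label, see classify_response
    approval_id: Optional[str] = None  # set only for held prompts


def classify_response(
    status: int,
    headers: Mapping[str, str],
    body: Optional[dict] = None,
) -> GatewayOutcome:
    """Map an HTTP response from the gateway to one of the documented decisions."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if status == 403:
        return GatewayOutcome("deny_prompt")
    if status == 409:
        return GatewayOutcome("hold_prompt", lowered.get("x-axiorank-approval-id"))
    if 200 <= status < 300:
        choices = (body or {}).get("choices", [])
        if any(c.get("finish_reason") == "content_filter" for c in choices):
            return GatewayOutcome("deny_completion")
        # Covers both allow and redact: a redacted response is still a normal 2xx.
        return GatewayOutcome("delivered")
    return GatewayOutcome("other_error")


held = classify_response(409, {"X-AxioRank-Approval-Id": "example-approval-id"})
blocked = classify_response(200, {}, {"choices": [{"finish_reason": "content_filter"}]})
print(held.decision, held.approval_id)
print(blocked.decision)
```

A held prompt carries the approval ID the caller needs for polling; redaction is invisible at this level because only the content is masked.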
Streaming
Streaming (stream: true) is supported. When model-I/O governance is off, the
provider's server-sent events stream straight through and the proxy reads the
final usage chunk to meter spend. When governance is on, the proxy buffers the
response so it can apply completion-phase redaction or blocks, then re-emits it as
server-sent events. That trades incremental delivery for inline enforcement.
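The reason buffering is needed can be shown with a small sketch: a sensitive span can straddle two streamed chunks, so it is only detectable once the full text is assembled. This illustrates the trade-off under simplifying assumptions and is not the gateway's implementation. The function name buffer_then_reemit, the single-chunk output, and the toy redaction rule are all hypothetical; the chunk shape is a reduced form of an OpenAI chat completion chunk.

```python
import json
from typing import Callable, Iterable, Iterator


def buffer_then_reemit(deltas: Iterable[str], redact: Callable[[str], str]) -> Iterator[str]:
    """Collect every content delta, redact the full text, then emit it as SSE frames."""
    full_text = "".join(deltas)    # nothing is forwarded until the upstream stream ends
    safe_text = redact(full_text)  # enforcement sees the whole completion at once
    chunk = {"choices": [{"index": 0, "delta": {"content": safe_text}}]}
    yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"       # OpenAI-style end-of-stream marker


# The card-like number is split across two upstream chunks.
upstream_deltas = ["Card 4111", " 1111 ok"]
frames = list(
    buffer_then_reemit(upstream_deltas, lambda text: text.replace("4111 1111", "[REDACTED]"))
)
for frame in frames:
    print(frame, end="")
```

Checking each chunk on its own would miss the split span, which is the cost of incremental delivery that buffering avoids.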
Notes and limits
- The provider key is used for one request and is never persisted. A managed credential mode, where the AxioRank key is the only credential, is on the roadmap for Enterprise.
- OpenAI Chat Completions, OpenAI Responses, the Anthropic Messages API, Amazon Bedrock (Converse), and Google Vertex (via its OpenAI-compatible endpoint) are supported today. Native Gemini generateContent and Bedrock ConverseStream are planned.
- Under model-I/O governance, a streaming Responses request is buffered and re-emitted as typed events (response.created, response.output_text.delta, response.completed), so redaction and blocks still apply.
- The proxy sits inline in your request path. Governance failures fail open by default so availability is never gated on the control plane.
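Fail-open behavior can be pictured with a small sketch: if the governance check itself errors, the call proceeds as if it had been allowed. This is a conceptual illustration only. The names decide_fail_open, unreachable_check, and strict_check are hypothetical, and the gateway's actual failure handling is not described beyond the default stated above.

```python
from typing import Callable


def decide_fail_open(check: Callable[[str], str], prompt: str) -> str:
    """Return the governance decision, falling back to "allow" if the check fails."""
    try:
        return check(prompt)
    except Exception:
        # Fail open: a broken control plane must not block model traffic.
        return "allow"


def unreachable_check(prompt: str) -> str:
    # Stands in for a governance service that cannot be reached.
    raise TimeoutError("control plane unreachable")


def strict_check(prompt: str) -> str:
    # Stands in for a healthy governance service that denies this prompt.
    return "deny"


print(decide_fail_open(unreachable_check, "Summarize Q3 results"))
print(decide_fail_open(strict_check, "Summarize Q3 results"))
```

The consequence worth planning for is that calls made during a governance outage pass through without enforcement.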