Red-team testing
Attack your agents before someone else does.
Run a corpus of real agent attacks against your live detectors and policies, not a sandbox. See exactly what your posture catches and what slips through, then apply the suggested fix for every miss in one click.
real corpus · scored against your posture · one-click remediation
What's in the corpus
Single calls, multi-step chains, and the tricks attackers actually use.
The corpus is versioned, so a posture score stays interpretable as it grows. Each scenario carries the protective outcome a sound posture must produce, and the benign controls keep an over-eager posture honest.
Single-call attacks
Live secrets in arguments, destructive commands, PII egress, and prompt injection, each with the outcome it must trigger.
Multi-step kill chains
Stateful sequences where one step reads untrusted data and a later step exfiltrates it, caught by value-level provenance.
Evasion techniques
Homoglyphs, zero-width characters, and nested encodings that normalized detection must see through.
Benign controls
Legitimate calls that must not be blocked, so a high catch rate cannot hide a wall of false positives.
Run it
Pick a family and watch your posture handle it.
Choose an attack family and run it. Each scenario shows the protective outcome the default posture produces. On your own workspace, the runner scores these against your actual rules.
0 benign controls allowed, no false positive.
corpus 2026.06.13. On your workspace the runner scores these against your live rules and offers a one-click fix for any miss.
Every miss has a fix
A failure is a suggestion, not just a red mark.
When a scenario is missed, the report carries a concrete remediation in the same shape as a policy rule, so closing the gap is a click, not a research project.
{
"tool_pattern": "*",
"action": "deny",
"signalCategory": "secret",
"rationale": "Deny any tool call whose arguments contain a live secret."
}Free gets a teaser
Shift it left
Fail the build when your posture regresses.
Run the corpus in CI against the policies you are about to ship. If a change quietly weakens a rule, the gate catches it before it reaches production.
# score the workspace posture against the corpus npx @axiorank/cli red-team run --fail-under 100
Keep exploring
Continue across the control plane.
Know what your posture actually catches.
Run the corpus against your live rules, fix every miss, and keep the score from slipping with a gate in CI.