Rabit — The control plane for AI agent actions

§ 01 / prior art

Five incidents from 2025. Each one is what Rabit is designed to prevent.

These are not hypothetical threat models. Each entry below is a named CVE or disclosed incident from a production AI product in the last ten months. Next to each, the Rabit mechanism that closes it.

2025.06

CVE-2025-32711

EchoLeak — Microsoft 365 Copilot

A zero-click prompt-injection class where an attacker-controlled email causes Copilot to exfiltrate tenant data back through its own response content.

Closed by

Output re-governance — a second-pass injection scan on the tool's returned output catches the exfiltration payload before it re-enters the agent context.

2025.07

CVE-2025-8217

Amazon Q wiper

An injected system prompt instructed the assistant to wipe the developer's filesystem and AWS resources. Shipped to ~1M installs in v1.84.0; only a syntax error in the payload prevented execution.

Closed by

Policy engine destructive-verb block — L1 regex denies any call matching *.delete_* or rm -rf class patterns before dispatch.

2025.08

CVE-2025-54136

MCPoison — Cursor

An MCP server's advertised tool schema could be mutated between approval and invocation, letting a benign-looking tool later execute something the user never consented to.

Closed by

MCP adapter re-validates the tool schema and capability hash on every call, not just at server registration time.

2025.08

CVE-2025-54135

CurXecute — Cursor

A crafted MCP response triggered code execution inside the editor's agent loop via the tool-output path.

Closed by

Output re-governance and per-tool egress allowlist — the injected content is flagged before re-entering context; the follow-on call cannot reach its C2 endpoint.

2025.09

Supply chain · no CVE

Postmark-MCP backdoor

A compromised MCP package silently BCC'd all outbound email to an attacker-controlled address. No prompt injection — just a malicious dependency.

Closed by

Per-tool egress allowlist — the BCC destination is not on the allowlist; the SMTP call fails at the socket level and the exfiltration never leaves the host.

§ 02 / properties

Four contracts. Enforced in production, not documented in a wiki.

Rabit is implemented as four property-based tests that fail CI if any one breaks. Three of them in motion below; the fourth — tenant isolation — is enforced silently at every database query.

Pane A · policy

Policy-violating actions, blocked before execution. contract p1

L1 regex L2 LLM-judge L3 drift

$

Pane B · egress

Per-tool egress, deny by default. NIST SP 800-53 SC-7(5)

$

Pane C · audit

Every action in a hash chain anchored to an RFC 3161 timestamp. contract p4

$ rabit-verify --honest

§ 03 / pipeline

Every agent action passes through five stages. Each one can stop it.

01 / intercept

The agent's intended tool call is captured at the adapter layer before any side effect. The call is serialized, identity-bound (per-agent mTLS), and enqueued to GOVERN. Nothing has left the control plane yet.

02 / govern

Three layers evaluate the call. L1 regex over structural invariants. L2 Groq-hosted llama-3.1-8b-instant judge against your policy corpus. L3 MiniLM semantic-drift check. Any layer can deny.

03 / execute

On allow, the call is dispatched via a typed adapter. Each adapter carries a per-tool egress allowlist; connections to anywhere not on the list fail at the socket level.

04 / output re-gov

The tool's return value is scanned for injected content before it reaches the agent's context window. This is the stage that closes EchoLeak and the CurXecute class.

05 / audit

Verdict, payload hash, and evidence append to a tamper-evident chain. previous_hash is SHA-256 of the prior record; daily leaves hash to a Merkle root timestamped by FreeTSA per RFC 3161.

§ 04 / standards

Maps to the standards your auditor already has in their checklist.

OWASP LLM TOP 10 · 2025

LLM01 · prompt injection

Closed by GOVERN L2 semantic judge + output re-governance.

LLM06 · excessive agency

Closed by policy corpus and approval gates on risk-tier sensitive / high_risk.

LLM07 · system prompt leakage

Covered in the 200-case eval suite with xfail-tracked regressions.

NIST SP 800-53 · Rev 5

SC-7(5) · deny by default

Implemented by the per-tool egress allowlist; failures at the socket level.

AU-10 · non-repudiation

Implemented by the SHA-256 hash chain + RFC 3161 timestamp.

AU-12 · audit generation

Every allow/deny verdict is appended; SIEM webhook fires on the same event.

MITRE ATLAS · v5.4.0

AML.T0051 · LLM prompt injection

Evaluated in the adversarial safety suite.

AML.T0053 · LLM plugin compromise

Evaluated in the MCP adapter test matrix.

AML.T0040 · privilege escalation

Evaluated via destructive-verb and IAM-scope cases.

EU AI ACT · ART 12 + 14

Art 12 · logging

Compliance bundle exports a signed ZIP with Merkle anchors.

Art 14 · human oversight

Approval gates + direct-to-operator escalation on high-risk.

Offline verification

Regulator runs rabit-verify with no Rabit servers in the loop.

200 adversarial cases·188 pass·12 xfail·zero warnings

§ 05 / positioning

Rabit is not a prompt-injection filter. Not an LLM gateway. Not an observability tool.

Rabit is not a guardrail classifier.

Guardrail classifiers score prompts. Rabit evaluates actions. A classifier that scores the user's message benign will still let the agent's resulting aws.iam.put_user_policy call through. Rabit doesn't care about the prompt; it cares about the tool call.

Rabit is not an LLM gateway.

LLM gateways proxy inference requests and enforce rate limits and PII redaction on the text passing to and from the model. Rabit sits one layer down: between the agent's decision to act and the action itself. You can run Rabit behind an LLM gateway; the two don't overlap.

Rabit is not an observability tool.

Observability tells you, with good latency, that an agent did something unexpected yesterday. Rabit tells the agent, at sub-second latency, that it cannot do the thing now. Observability is post-hoc; Rabit is pre-commit.

Rabit is not a GRC platform.

Vanta and Drata manage the evidence that your controls exist. Rabit is a control. Its compliance bundle is designed to be imported into your GRC workflow, not to replace it.

§ 06 / alternatives

Three ways to put an AI agent in production. Two of them take a quarter.

Build it yourself.

~90 days · ~$400k loaded

You write the control plane. Everything below is on your roadmap instead of your product.

You inherit

The policy engine
The hash-chain audit store
Merkle timestamping against a TSA
The MCP adapter with egress allowlists
The 200-case adversarial eval suite
Maintaining all of the above — while shipping your product

Observe and hope.

$0 upfront · uncapped incident cost

You instrument the agent, ship it, and read the dashboards the morning after.

You inherit

A dashboard that tells you about EchoLeak the next morning
A post-incident Slack channel
An auditor asking for controls you don't have
A compliance bundle you can't produce

Rabit.

Design-partner · deploys in your VPC in under a day

Four contracts enforced in production. Offline-verifiable audit. A compliance bundle your auditor already knows how to read.

You inherit

Four property-based contracts in production
A standalone verifier CLI your regulator runs offline
Compliance bundle mapped to EU AI Act Art 12/14
The 200-case eval suite against OWASP / NIST / MITRE
Direct Slack with the founder

§ 07 / proof

Read the source. Run the verifier. Check the tests.

Fig 02 / architecture as deployed in staging

Rabit policy · 14 lines of YAMLpolicy.yaml

policy: staging-agents-v3
identity:
  require: mtls
allow:
  - tool: github.*
    when: agent in ["pr-review-bot"]
  - tool: slack.post_message
    when: channel in ["#agent-notify"]
deny:
  - tool: aws.iam.*
  - tool: "*.delete_*"
  - network: "*.internal.corp"
audit: all
approval_required:
  - risk_tier: [sensitive, high_risk]

rabit-verify · offline run435 lines of python

$ rabit-verify ./bundle-2026-04-15.zip
[OK]  bundle signature verified (Ed25519)
[OK]  1,247 entries — hash chain intact
[OK]  merkle root matches leaf set (sha256:7f3a…b21c)
[OK]  RFC 3161 timestamp valid — 2026-04-15T23:59:59Z
[OK]  OWASP LLM01, LLM06, LLM07 evidence present
[OK]  NIST SP 800-53 SC-7(5) evidence present
[OK]  EU AI Act Art 12 log completeness verified
verification complete — 0 errors
trust anchor: offline

#1243

7f3a…b21c

prev 0000…root

#1244

a92e…0f44

prev 7f3a…b21c

#1245

c81d…9a17

prev a92e…0f44

#1246

5b20…ee38

prev c81d…9a17

#1247

0e77…41cb

prev 5b20…ee38

Tap or click a block to simulate tamper. Rabit-verify is 435 lines of Python — your regulator runs it without trusting our servers.

See the rabit-verify source →

§ 08 / threat model

If you don't see your threat here, we'd like to know.

Rabit's policy boundary is built around four canonical taxonomies. Below, the threat catalogue Rabit is designed to close — mapped paragraph-by-paragraph to OWASP LLM Top 10 2025, MITRE ATLAS v5.4.0, NIST AI 100-2e2025, and our own STRIDE adaptation for AI agent action surfaces.

A · extended catalogue

Five from 2025 already in §01. Five more below — and the catalogue grows weekly.

2024.08

No CVE · PromptArmor

Slack AI prompt-injection exfil

Crafted public channel message coerced Slack AI to leak data from private channels via Markdown link rendering.

Closed by

Output re-governance flags imperative instructions in tool returns; per-tool egress allowlist denies the rendered link's destination.

2025.05

No CVE · Legit Security

GitLab Duo prompt injection

Hidden text in a merge request description coerced Duo into executing attacker-supplied prompts when a developer asked Duo to summarise.

Closed by

Output re-governance scans tool returns for instruction-shaped content before they reach the agent context window.

2025.07

Public incident · no CVE

Replit DB deletion

An autonomous code agent executed a destructive database command without an approval gate, wiping a production dataset.

Closed by

L1 destructive-verb regex blocks any *.delete_*, drop_*, truncate_* patterns; risk-tier high_risk requires explicit approval.

2026.01

CVE-2026-21520

Copilot Studio ShareLeak indirect injection

Capsule Security found Copilot Studio agents triggered by SharePoint forms concatenated untrusted form input directly into the agent's prompt, allowing a fake system-role injection that exfiltrated SharePoint List data via Outlook.

Closed by

Output re-governance scans tool returns for instruction-shaped content; per-tool egress allowlist denies the Outlook destination when not on the approved sender list.

2026.03

Research · Gray Swan IPI Arena Q1 2026

Indirect prompt injection ASR across 13 frontier models

Gray Swan IPI Arena (with UK AISI / US CAISI as judges; Anthropic, OpenAI, Google DeepMind, Meta, Amazon as sponsors) tested 13 frontier models with 272K attack attempts across 41 agentic scenarios. ASR ranged from 0.5% (Claude Opus 4.5) to 8.5% (Gemini 2.5 Pro). Tool-use was the most vulnerable setting at 4.82%.

Closed by

Defense-in-depth pipeline: even if L2 LLM-judge is bypassed, L3 drift, output re-governance, and per-tool egress provide three further chokepoints.

B · STRIDE for AI agent action surfaces

STRIDE category

Threat against an AI agent

Rabit mitigation

S Spoofing

An adversary impersonates an agent's identity to issue tool calls in its name.

Per-agent mTLS at the intercept layer; identity is bound to the call envelope before GOVERN evaluates.

T Tampering

The audit log is modified after the fact to hide an unauthorised action.

SHA-256 hash chain linking every record; daily Merkle root anchored to RFC 3161 timestamp. Any tampering breaks chain integrity, detectable offline by rabit-verify.

R Repudiation

An operator denies that an agent under their control performed a high-risk action.

Every approval is recorded with the operator's identity, timestamp, and the specific risk-tier escalation that triggered it. AU-10 non-repudiation.

I Information Disclosure

An agent exfiltrates sensitive data to an attacker-controlled destination via a legitimate tool capability.

Per-tool egress allowlist (NIST SP 800-53 SC-7(5)) denies sockets to non-approved hosts at the connect() layer, before any payload is sent.

D Denial of Service

An adversary exhausts the agent's quota or recursion depth to deny service to legitimate users.

Per-agent rate budgets at the GOVERN layer; recursion-depth caps; all outside the trust boundary of the agent itself.

E Elevation of Privilege

An agent originally scoped to read-only escalates to write or admin via a chained tool sequence.

Risk-tier transitions trigger approval gates; tool-graph reachability is enforced ahead of dispatch; IAM-scope expansion is denied at L1.

C · standards alignment

OWASP LLM TOP 10 · 2025

LLM01

prompt injection

LLM02

sensitive disclosure

LLM03

supply chain

LLM05

improper output

LLM06

excessive agency

LLM07

system prompt leak

MITRE ATLAS · v5.4.0

AML.T0051

prompt injection

AML.T0052

retrieval crafting

AML.T0053

plugin compromise

AML.T0040

privilege escalation

AML.T0054

jailbreak

AML.T0056

meta prompt

NIST AI 100-2e2025

§3.1

direct prompt injection

§3.2

indirect prompt injection

§3.4

model evasion

§3.7

supply chain

§3.8 [*]

data poisoning

—

EU AI ACT · ART 12 + 14

Art 12

logging

Art 14

human oversight

—

[*] Data poisoning is upstream of Rabit's trust boundary — Rabit assumes the model itself may be adversarial and validates every action regardless. Detection of poisoning is in scope for v2.

If you find a threat Rabit doesn't yet close, email adam.shibli2001@gmail.com with subject "RABIT-DISCLOSURE". We'll respond within 48 hours, credit you on the changelog (when it exists), and add the threat to the suite. Rabit's adversarial eval suite has 200 cases today; we add new cases every time a threat is reported or published.

§ 09 / fit

Two kinds of teams should be on this page. If you're not one of them, come back in six months.

Persona A

Security / platform engineer.

You run the infrastructure AI agents run on. You've already had one "oh shit" moment this quarter. You know what SC-7(5) is without looking it up. You will click view-source on the verifier CLI. You want the compliance bundle before you want the feature.

Persona B

AI platform lead.

You're the person rolling agents into production. Security and legal are blocking you. You need a single artifact you can hand to both teams that answers their actual questions — policy, audit, egress, evidence.

§ 10 / intake

We're taking five design partners in Q2 2026.

Design partners get direct Slack with the founder, a 24-hour response on every incident, deployment in your VPC within one business day, and a seat at the roadmap table. In exchange, we ask for one 30-minute call a week and the right to cite you once you're comfortable. We'll only take the meeting if you're running agents in production today.

Design-partner intake — 30 min with Adam Shibli Cal.com · 30 min

April 2026→

MTWTFSS

Available slotsApr 28

Pick a time slot above so we can pre-fill it for you, then fill out the form.

Prefer email? adam.shibli2001@gmail.com

The control plane for AI agent actions.

Five incidents from 2025. Each one is what Rabit is designed to prevent.

Four contracts. Enforced in production, not documented in a wiki.

Policy-violating actions, blocked before execution. contract p1

Per-tool egress, deny by default. NIST SP 800-53 SC-7(5)

Every action in a hash chain anchored to an RFC 3161 timestamp. contract p4

Every agent action passes through five stages. Each one can stop it.

Maps to the standards your auditor already has in their checklist.

Rabit is not a prompt-injection filter. Not an LLM gateway. Not an observability tool.

Rabit is not a guardrail classifier.

Rabit is not an LLM gateway.

Rabit is not an observability tool.

Rabit is not a GRC platform.

Three ways to put an AI agent in production. Two of them take a quarter.

Read the source. Run the verifier. Check the tests.

If you don't see your threat here, we'd like to know.

Two kinds of teams should be on this page. If you're not one of them, come back in six months.

Security / platform engineer.

AI platform lead.

We're taking five design partners in Q2 2026.