Rabit · v0 · design-partner intake open

The control plane for AI agent actions.

Intercept, policy-check, and cryptographically audit every action your agents take — with proof every regulator can verify offline.

429 tests passing · OWASP LLM Top 10 2025 · NIST SP 800-53 SC-7(5) · EU AI Act Art 12/14 · RFC 3161
§ 01 / prior art

Five incidents from 2025. Each one is what Rabit is designed to prevent.

These are not hypothetical threat models. Each entry below is a named CVE or disclosed incident from a production AI product in the last ten months. Next to each, the Rabit mechanism that closes it.

2025.06
CVE-2025-32711
EchoLeak — Microsoft 365 Copilot
A zero-click prompt-injection class where an attacker-controlled email causes Copilot to exfiltrate tenant data back through its own response content.
Closed by
Output re-governance — a second-pass injection scan on the tool's returned output catches the exfiltration payload before it re-enters the agent context.
2025.07
CVE-2025-8217
Amazon Q wiper
An injected system prompt instructed the assistant to wipe the developer's filesystem and AWS resources. Shipped to ~1M installs in v1.84.0; only a syntax error in the payload prevented execution.
Closed by
Policy engine destructive-verb block — L1 regex denies any call matching *.delete_* or rm -rf class patterns before dispatch.
2025.08
CVE-2025-54136
MCPoison — Cursor
An MCP server's advertised tool schema could be mutated between approval and invocation, letting a benign-looking tool later execute something the user never consented to.
Closed by
MCP adapter re-validates the tool schema and capability hash on every call, not just at server registration time.
2025.08
CVE-2025-54135
CurXecute — Cursor
A crafted MCP response triggered code execution inside the editor's agent loop via the tool-output path.
Closed by
Output re-governance and per-tool egress allowlist — the injected content is flagged before re-entering context; the follow-on call cannot reach its C2 endpoint.
2025.09
Supply chain · no CVE
Postmark-MCP backdoor
A compromised MCP package silently BCC'd all outbound email to an attacker-controlled address. No prompt injection — just a malicious dependency.
Closed by
Per-tool egress allowlist — the BCC destination is not on the allowlist; the SMTP call fails at the socket level and the exfiltration never leaves the host.
§ 02 / properties

Four contracts. Enforced in production, not documented in a wiki.

Rabit's four contracts are encoded as property-based tests that fail CI if any one of them breaks. Three of them are shown below; the fourth, tenant isolation, is enforced silently at every database query.

Pane A · policy

Policy-violating actions, blocked before execution. (contract p1)

L1 regex · L2 LLM-judge · L3 drift

Pane B · egress

Per-tool egress, deny by default. (NIST SP 800-53 SC-7(5))

Pane C · audit

Every action in a hash chain anchored to an RFC 3161 timestamp. (contract p4)

$ rabit-verify --honest
§ 03 / pipeline

Every agent action passes through five stages. Each one can stop it.

01 / intercept

The agent's intended tool call is captured at the adapter layer before any side effect. The call is serialized, identity-bound (per-agent mTLS), and enqueued to GOVERN. Nothing has left the control plane yet.
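A minimal sketch of the intercept step, in Python to match rabit-verify. It assumes the mTLS handshake has already verified the agent's client certificate and passes the resulting identity in as a string. The Envelope type, its fields, and the in-process queue standing in for GOVERN's queue are hypothetical, not Rabit's interface.

```python
import hashlib
import json
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical call envelope: the intercepted tool call plus its bound identity."""
    agent_id: str      # assumed to come from the verified mTLS client certificate
    tool: str          # e.g. "slack.post_message"
    args_json: str     # canonical JSON serialization of the arguments
    payload_hash: str  # SHA-256 over tool name + canonical arguments


def intercept(agent_id: str, tool: str, args: dict, govern_queue: queue.Queue) -> Envelope:
    # Canonical JSON (sorted keys, fixed separators) so the same call always hashes the same.
    args_json = json.dumps(args, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{tool}\n{args_json}".encode("utf-8")).hexdigest()
    env = Envelope(agent_id=agent_id, tool=tool, args_json=args_json, payload_hash=digest)
    # Hand off to GOVERN; no side effect has happened yet.
    govern_queue.put(env)
    return env


if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    e = intercept("pr-review-bot", "github.create_review", {"pr": 42, "body": "LGTM"}, q)
    print(e.agent_id, e.tool, e.payload_hash[:12], q.qsize())
```

Canonicalizing before hashing matters because the payload hash later lands in the audit record; two serializations of the same call should not yield two different hashes.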

02 / govern

Three layers evaluate the call. L1: regex over structural invariants. L2: a Groq-hosted llama-3.1-8b-instant judge that checks the call against your policy corpus. L3: a MiniLM semantic-drift check. Any layer can deny.
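The short-circuit behaviour of the three layers can be sketched as follows. The L2 judge and L3 drift scorer are passed in as plain callables because the text does not describe their interfaces; the regex patterns and the 0.35 drift threshold are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    allow: bool
    layer: Optional[str]  # layer that denied, or None when every layer allowed
    reason: str


# L1: illustrative structural invariants, checked against the tool name and raw arguments.
L1_DENY = [
    re.compile(r"(^|\.)delete_"),   # *.delete_* style tool names
    re.compile(r"^aws\.iam\."),     # IAM mutations
    re.compile(r"\brm\s+-rf\b"),    # rm -rf class shell payloads
]

Judge = Callable[[str, str], bool]   # stand-in for the L2 LLM judge
Drift = Callable[[str, str], float]  # stand-in for the L3 MiniLM drift score


def govern(tool: str, args_text: str, l2_judge: Judge, l3_drift: Drift,
           drift_threshold: float = 0.35) -> Verdict:
    """Evaluate L1, then L2, then L3; the first layer that denies short-circuits the rest."""
    for pattern in L1_DENY:
        if pattern.search(tool) or pattern.search(args_text):
            return Verdict(False, "L1", f"matched {pattern.pattern}")
    if not l2_judge(tool, args_text):
        return Verdict(False, "L2", "judge denied the call")
    score = l3_drift(tool, args_text)
    if score > drift_threshold:
        return Verdict(False, "L3", f"drift {score:.2f} exceeds {drift_threshold}")
    return Verdict(True, None, "all three layers allowed")
```

Running the regex layer first is one plausible ordering: it rejects obvious cases before any LLM call is paid for.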

03 / execute

On allow, the call is dispatched via a typed adapter. Each adapter carries a per-tool egress allowlist; connections to anywhere not on the list fail at the socket level.
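A sketch of deny-by-default egress, assuming a hypothetical per-tool table of hostname patterns. The check sits in front of socket.create_connection, so a denied destination is refused before connect() is ever called.

```python
import socket
from fnmatch import fnmatchcase

# Hypothetical per-tool allowlists: tool name -> host patterns it may connect to.
EGRESS_ALLOWLIST = {
    "slack.post_message": ["slack.com", "*.slack.com"],
    "github.create_review": ["api.github.com"],
}


class EgressDenied(PermissionError):
    """Raised when a tool tries to reach a host outside its allowlist."""


def check_egress(tool: str, host: str) -> bool:
    # Deny by default: an unknown tool, or a host matching no pattern, is refused.
    patterns = EGRESS_ALLOWLIST.get(tool, [])
    return any(fnmatchcase(host.lower(), p) for p in patterns)


def guarded_connect(tool: str, host: str, port: int, timeout: float = 5.0) -> socket.socket:
    # The check runs before any socket is opened, so a denied host never sees a byte.
    if not check_egress(tool, host):
        raise EgressDenied(f"{tool} may not connect to {host}:{port}")
    return socket.create_connection((host, port), timeout=timeout)
```

Matching on hostnames keeps the sketch short. A real socket-level control also has to pin the resolved address, since DNS can change between the check and the connect.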

04 / output re-gov

The tool's return value is scanned for injected content before it reaches the agent's context window. This is the stage that closes EchoLeak and the CurXecute class.
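The text does not say how the re-governance scan works, so this sketch swaps in a plain regex heuristic for instruction-shaped content. The patterns, including the Markdown-image-with-query-string rule aimed at link-rendering exfiltration, are assumptions for illustration; a regex list on its own would be easy to evade.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns for instruction-shaped text inside a tool's return value.
INJECTION_PATTERNS = [
    re.compile(r"\bignore (all |any )?(previous|prior|above) instructions\b", re.I),
    re.compile(r"^\s*(system|assistant)\s*:", re.I | re.M),   # fake role headers
    re.compile(r"!\[[^\]]*\]\(https?://[^)]+\?[^)]*\)"),      # image link carrying a query string
    re.compile(r"\byou (must|should) now\b", re.I),
]


@dataclass
class ScanResult:
    flagged: bool
    hits: list = field(default_factory=list)


def rescan_output(tool_output: str) -> ScanResult:
    """Second-pass scan of a tool's return value before it enters the agent context."""
    hits = [p.pattern for p in INJECTION_PATTERNS if p.search(tool_output)]
    return ScanResult(flagged=bool(hits), hits=hits)
```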

05 / audit

Verdict, payload hash, and evidence are appended to a tamper-evident chain. Each record's previous_hash is the SHA-256 of the prior record; each day's records are hashed as leaves into a Merkle root, which FreeTSA timestamps per RFC 3161.
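The previous_hash linkage can be sketched with hashlib. The field names follow the text; the all-zero genesis value and the canonical-JSON record encoding are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed previous_hash for the first record in a chain


def record_hash(record: dict) -> str:
    # Canonical JSON so the hash does not depend on key order.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def append(chain: list, verdict: str, payload_hash: str, evidence: str) -> dict:
    """Append a record whose previous_hash is the SHA-256 of the prior record."""
    prev = record_hash(chain[-1]) if chain else GENESIS
    rec = {"verdict": verdict, "payload_hash": payload_hash,
           "evidence": evidence, "previous_hash": prev}
    chain.append(rec)
    return rec


def verify_chain(chain: list) -> int:
    """Return the index of the first broken link, or -1 if the chain is intact."""
    expected = GENESIS
    for i, rec in enumerate(chain):
        if rec["previous_hash"] != expected:
            return i
        expected = record_hash(rec)
    return -1
```

Editing any record breaks the link held by its successor, so verify_chain reports the next index. An edit to the newest record has no successor to catch it; that gap is what the daily Merkle root and its RFC 3161 timestamp close.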

§ 04 / standards

Maps to the standards your auditor already has in their checklist.

OWASP LLM TOP 10 · 2025
LLM01 · prompt injection
Closed by GOVERN L2 semantic judge + output re-governance.
LLM06 · excessive agency
Closed by the policy corpus and by approval gates on the sensitive and high_risk risk tiers.
LLM07 · system prompt leakage
Covered in the 200-case eval suite with xfail-tracked regressions.
NIST SP 800-53 · Rev 5
SC-7(5) · deny by default
Implemented by the per-tool egress allowlist; connections to non-allowlisted hosts fail at the socket level.
AU-10 · non-repudiation
Implemented by the SHA-256 hash chain + RFC 3161 timestamp.
AU-12 · audit generation
Every allow/deny verdict is appended; SIEM webhook fires on the same event.
MITRE ATLAS · v5.4.0
AML.T0051 · LLM prompt injection
Evaluated in the adversarial safety suite.
AML.T0053 · LLM plugin compromise
Evaluated in the MCP adapter test matrix.
AML.T0040 · privilege escalation
Evaluated via destructive-verb and IAM-scope cases.
EU AI ACT · ART 12 + 14
Art 12 · logging
Compliance bundle exports a signed ZIP with Merkle anchors.
Art 14 · human oversight
Approval gates + direct-to-operator escalation on high_risk actions.
Offline verification
Regulator runs rabit-verify with no Rabit servers in the loop.
200 adversarial cases · 188 pass · 12 xfail · zero warnings
§ 05 / positioning

Rabit is not a prompt-injection filter. Not an LLM gateway. Not an observability tool.

Rabit is not a guardrail classifier.

Guardrail classifiers score prompts. Rabit evaluates actions. A classifier that scores the user's message benign will still let the agent's resulting aws.iam.put_user_policy call through. Rabit doesn't care about the prompt; it cares about the tool call.

Rabit is not an LLM gateway.

LLM gateways proxy inference requests and enforce rate limits and PII redaction on the text passing to and from the model. Rabit sits one layer down: between the agent's decision to act and the action itself. You can run Rabit behind an LLM gateway; the two don't overlap.

Rabit is not an observability tool.

Observability tells you, after the fact, that an agent did something unexpected yesterday. Rabit tells the agent, at sub-second latency, that it cannot do the thing now. Observability is post-hoc; Rabit is pre-commit.

Rabit is not a GRC platform.

Vanta and Drata manage the evidence that your controls exist. Rabit is a control. Its compliance bundle is designed to be imported into your GRC workflow, not to replace it.

§ 06 / alternatives

Three ways to put an AI agent in production. Two of them take a quarter.

Build it yourself.
~90 days · ~$400k loaded

You write the control plane. Everything below is on your roadmap instead of your product.

You inherit
  • The policy engine
  • The hash-chain audit store
  • Merkle timestamping against a TSA
  • The MCP adapter with egress allowlists
  • The 200-case adversarial eval suite
  • Maintaining all of the above — while shipping your product
Observe and hope.
$0 upfront · uncapped incident cost

You instrument the agent, ship it, and read the dashboards the morning after.

You inherit
  • A dashboard that tells you about EchoLeak the next morning
  • A post-incident Slack channel
  • An auditor asking for controls you don't have
  • A compliance bundle you can't produce
Rabit.
Design-partner · deploys in your VPC in under a day

Four contracts enforced in production. Offline-verifiable audit. A compliance bundle your auditor already knows how to read.

You inherit
  • Four property-based contracts in production
  • A standalone verifier CLI your regulator runs offline
  • Compliance bundle mapped to EU AI Act Art 12/14
  • The 200-case eval suite against OWASP / NIST / MITRE
  • Direct Slack with the founder
§ 07 / proof

Read the source. Run the verifier. Check the tests.

Fig 02 / architecture as deployed in staging
Rabit policy · 15 lines of YAML · policy.yaml
policy: staging-agents-v3
identity:
  require: mtls
allow:
  - tool: github.*
    when: agent in ["pr-review-bot"]
  - tool: slack.post_message
    when: channel in ["#agent-notify"]
deny:
  - tool: aws.iam.*
  - tool: "*.delete_*"
  - network: "*.internal.corp"
audit: all
approval_required:
  - risk_tier: [sensitive, high_risk]
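A sketch of how the policy above might be evaluated, hand-translated into a Python dict so it runs without a YAML parser. The semantics are assumptions: deny rules win, a call no allow rule matches is denied, and an allowed call on the sensitive or high_risk tier is routed to approval. The identity and audit keys are not modelled.

```python
from fnmatch import fnmatchcase

# Hand translation of policy.yaml; each `when` clause becomes a (field, allowed-values) pair.
POLICY = {
    "allow": [
        {"tool": "github.*", "when": ("agent", ["pr-review-bot"])},
        {"tool": "slack.post_message", "when": ("channel", ["#agent-notify"])},
    ],
    "deny": [
        {"tool": "aws.iam.*"},
        {"tool": "*.delete_*"},
        {"network": "*.internal.corp"},
    ],
    "approval_required": {"sensitive", "high_risk"},
}


def evaluate(call: dict, policy: dict = POLICY) -> str:
    """Return "deny", "approval", or "allow". Deny rules win; unmatched calls are denied."""
    tool, host = call["tool"], call.get("host", "")
    for rule in policy["deny"]:
        if "tool" in rule and fnmatchcase(tool, rule["tool"]):
            return "deny"
        if "network" in rule and host and fnmatchcase(host, rule["network"]):
            return "deny"
    for rule in policy["allow"]:
        key, allowed = rule["when"]
        if fnmatchcase(tool, rule["tool"]) and call.get(key) in allowed:
            # Allowed, but the risk tier may still require a human approval gate.
            if call.get("risk_tier") in policy["approval_required"]:
                return "approval"
            return "allow"
    return "deny"
```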
rabit-verify · offline run · 435 lines of Python
$ rabit-verify ./bundle-2026-04-15.zip
[OK]  bundle signature verified (Ed25519)
[OK]  1,247 entries — hash chain intact
[OK]  merkle root matches leaf set (sha256:7f3a…b21c)
[OK]  RFC 3161 timestamp valid — 2026-04-15T23:59:59Z
[OK]  OWASP LLM01, LLM06, LLM07 evidence present
[OK]  NIST SP 800-53 SC-7(5) evidence present
[OK]  EU AI Act Art 12 log completeness verified
verification complete — 0 errors
trust anchor: offline
#1243 7f3a…b21c · #1244 a92e…0f44 · #1245 c81d…9a17 · #1246 5b20…ee38 · #1247 0e77…41cb
rabit-verify is 435 lines of Python; your regulator runs it without trusting our servers.
See the rabit-verify source
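The "merkle root matches leaf set" check can be reproduced offline in a few lines. This sketch borrows the 0x00 leaf / 0x01 node prefixes from RFC 6962 and simply duplicates an odd last node; both are assumptions rather than a description of rabit-verify, and validating the RFC 3161 token itself is not shown.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> str:
    """SHA-256 Merkle root over byte-string leaves; an odd last node is paired with itself."""
    if not leaves:
        raise ValueError("no leaves")
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # domain-separate leaves from inner nodes
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


def root_matches(leaves: list, anchored_root_hex: str) -> bool:
    # Offline check: recompute from the bundle's leaves and compare with the timestamped root.
    return merkle_root(leaves) == anchored_root_hex
```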
§ 08 / threat model

If you don't see your threat here, we'd like to know.

Rabit's policy boundary is built around four canonical taxonomies. Below, the threat catalogue Rabit is designed to close — mapped paragraph-by-paragraph to OWASP LLM Top 10 2025, MITRE ATLAS v5.4.0, NIST AI 100-2e2025, and our own STRIDE adaptation for AI agent action surfaces.

A · extended catalogue

Five from 2025 already in §01. Five more below — and the catalogue grows weekly.

2024.08
No CVE · PromptArmor
Slack AI prompt-injection exfil
A crafted message in a public channel coerced Slack AI into leaking data from private channels via Markdown link rendering.
Closed by
Output re-governance flags imperative instructions in tool returns; per-tool egress allowlist denies the rendered link's destination.
2025.05
No CVE · Legit Security
GitLab Duo prompt injection
Hidden text in a merge request description coerced Duo into executing attacker-supplied prompts when a developer asked Duo to summarise.
Closed by
Output re-governance scans tool returns for instruction-shaped content before they reach the agent context window.
2025.07
Public incident · no CVE
Replit DB deletion
An autonomous code agent executed a destructive database command without an approval gate, wiping a production dataset.
Closed by
The L1 destructive-verb regex blocks calls matching *.delete_*, drop_*, and truncate_* patterns; the high_risk risk tier requires explicit approval.
2026.01
CVE-2026-21520
Copilot Studio ShareLeak indirect injection
Capsule Security found that Copilot Studio agents triggered by SharePoint forms concatenated untrusted form input directly into the agent's prompt, allowing a fake system-role injection that exfiltrated SharePoint List data via Outlook.
Closed by
Output re-governance scans tool returns for instruction-shaped content; per-tool egress allowlist denies the Outlook destination when not on the approved sender list.
2026.03
Research · Gray Swan IPI Arena Q1 2026
Indirect prompt injection ASR across 13 frontier models
Gray Swan IPI Arena (with UK AISI / US CAISI as judges; Anthropic, OpenAI, Google DeepMind, Meta, Amazon as sponsors) tested 13 frontier models with 272K attack attempts across 41 agentic scenarios. ASR ranged from 0.5% (Claude Opus 4.5) to 8.5% (Gemini 2.5 Pro). Tool-use was the most vulnerable setting at 4.82%.
Closed by
Defense-in-depth pipeline: even if the L2 LLM judge is bypassed, L3 drift, output re-governance, and per-tool egress provide three further chokepoints.
B · STRIDE for AI agent action surfaces
S · Spoofing
Threat: An adversary impersonates an agent's identity to issue tool calls in its name.
Mitigation: Per-agent mTLS at the intercept layer; identity is bound to the call envelope before GOVERN evaluates.

T · Tampering
Threat: The audit log is modified after the fact to hide an unauthorised action.
Mitigation: SHA-256 hash chain linking every record; daily Merkle root anchored to an RFC 3161 timestamp. Any tampering breaks chain integrity, detectable offline by rabit-verify.

R · Repudiation
Threat: An operator denies that an agent under their control performed a high-risk action.
Mitigation: Every approval is recorded with the operator's identity, timestamp, and the specific risk-tier escalation that triggered it (AU-10 non-repudiation).

I · Information Disclosure
Threat: An agent exfiltrates sensitive data to an attacker-controlled destination via a legitimate tool capability.
Mitigation: Per-tool egress allowlist (NIST SP 800-53 SC-7(5)) denies sockets to non-approved hosts at the connect() layer, before any payload is sent.

D · Denial of Service
Threat: An adversary exhausts the agent's quota or recursion depth to deny service to legitimate users.
Mitigation: Per-agent rate budgets at the GOVERN layer and recursion-depth caps, all enforced outside the trust boundary of the agent itself.

E · Elevation of Privilege
Threat: An agent originally scoped to read-only escalates to write or admin via a chained tool sequence.
Mitigation: Risk-tier transitions trigger approval gates; tool-graph reachability is enforced ahead of dispatch; IAM-scope expansion is denied at L1.
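The Denial of Service row names per-agent rate budgets and recursion-depth caps without specifying an algorithm. A token bucket is one common choice; the sketch below assumes it, and the AgentBudget class and its parameters are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentBudget:
    """Hypothetical per-agent budget held by GOVERN: token bucket plus recursion-depth cap."""
    rate_per_sec: float   # sustained calls per second
    burst: int            # bucket capacity
    max_depth: int        # deepest nested tool call allowed
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)
        self.last = time.monotonic()

    def admit(self, depth: int, now: Optional[float] = None) -> bool:
        """Return True if a call at this recursion depth fits within the budget."""
        if depth > self.max_depth:
            return False  # over the recursion cap: refuse without spending a token
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Because the budget object lives in GOVERN rather than in the agent process, an agent that loops on itself cannot reset its own counters.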
C · standards alignment
OWASP LLM TOP 10 · 2025
LLM01 · prompt injection
LLM02 · sensitive information disclosure
LLM03 · supply chain
LLM05 · improper output handling
LLM06 · excessive agency
LLM07 · system prompt leakage
MITRE ATLAS · v5.4.0
AML.T0051 · prompt injection
AML.T0052 · retrieval crafting
AML.T0053 · plugin compromise
AML.T0040 · privilege escalation
AML.T0054 · jailbreak
AML.T0056 · meta prompt extraction
NIST AI 100-2e2025
§3.1 · direct prompt injection
§3.2 · indirect prompt injection
§3.4 · model evasion
§3.7 · supply chain
§3.8 [*] · data poisoning
EU AI ACT · ART 12 + 14
Art 12 · logging
Art 14 · human oversight

[*] Data poisoning is upstream of Rabit's trust boundary — Rabit assumes the model itself may be adversarial and validates every action regardless. Detection of poisoning is in scope for v2.

If you find a threat Rabit doesn't yet close, email adam.shibli2001@gmail.com with subject "RABIT-DISCLOSURE". We'll respond within 48 hours, credit you on the changelog (when it exists), and add the threat to the suite. Rabit's adversarial eval suite has 200 cases today; we add new cases every time a threat is reported or published.

§ 09 / fit

Two kinds of teams should be on this page. If you're not one of them, come back in six months.

Persona A

Security / platform engineer.

You run the infrastructure AI agents run on. You've already had one "oh shit" moment this quarter. You know what SC-7(5) is without looking it up. You will click view-source on the verifier CLI. You want the compliance bundle before you want the feature.

Persona B

AI platform lead.

You're the person rolling agents into production. Security and legal are blocking you. You need a single artifact you can hand to both teams that answers their actual questions — policy, audit, egress, evidence.

§ 10 / intake

We're taking five design partners in Q2 2026.

Design partners get direct Slack with the founder, a 24-hour response on every incident, deployment in your VPC within one business day, and a seat at the roadmap table. In exchange, we ask for one 30-minute call a week and the right to cite you once you're comfortable. We'll only take the meeting if you're running agents in production today.

Design-partner intake: 30 min with Adam Shibli via Cal.com.