Every agent action, on the record.
Warden is a control plane for AI agents in production. Every tool call is semantically inspected, policy‑checked, human‑approved when dangerous, and hash‑chained into a forensic ledger your auditor can replay.
Four pathologies we see across design‑partner deployments. None map cleanly to existing web‑application or network controls.
Prompt injection is a confused deputy
An email, a Jira ticket, a scraped page: any input the agent reads is a potential instruction. Your WAF doesn't speak natural language.
API keys, OAuth tokens, DB passwords — pasted into prompts, stored in vector DBs, exfiltrated by a single crafted document.
"What did the agent do, why, and on whose behalf?" If the answer lives in scattered LLM logs, it isn't an answer.
By the time a human notices the agent is wiring funds or dropping tables, the action has already cleared upstream.
An agent with a hallucinated tool call and a real API key is indistinguishable from a malicious insider — except faster.
Every tool call traverses the pipeline before any upstream side effect. Either decision layer (the semantic inspector or the policy engine) can veto. Every veto is signed, hashed and chained.
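A minimal Python sketch of the veto rule, assuming each layer is a function that returns an allow/deny verdict. The `brain` and `policy` functions are toy stand‑ins (a string match and a one‑tool deny list), not Warden's inspector or Rego engine, and the ledger is a plain list rather than a signed hash chain. What it shows is the ordering: the upstream call runs only after every layer allows it, and every verdict is recorded whether or not it vetoes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    layer: str
    allow: bool
    reason: str


# Each layer inspects a tool call and returns a Verdict.
Layer = Callable[[dict], Verdict]


def brain(call: dict) -> Verdict:
    # Toy stand-in for the semantic inspector: flag one injection phrase.
    hit = "ignore previous instructions" in call.get("args", "").lower()
    return Verdict("brain", not hit, "injection marker" if hit else "ok")


def policy(call: dict) -> Verdict:
    # Toy stand-in for a Rego decision: deny one tool outright.
    denied = call["tool"] == "drop_table"
    return Verdict("policy", not denied, "tool denied" if denied else "ok")


def run_pipeline(call: dict, layers: list, upstream: Callable[[dict], str],
                 ledger: list) -> Optional[str]:
    for layer in layers:
        verdict = layer(call)
        ledger.append(verdict)       # record every verdict, allow or veto
        if not verdict.allow:
            return None              # veto: upstream is never called
    return upstream(call)            # side effect only after every layer allows
```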
mTLS ingress · Vault credential injection · coordinator
Agents authenticate with client certs. Real credentials never touch the agent — the proxy injects them into upstream calls only after the verdict resolves.
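A sketch of the injection order, under loud assumptions: `SECRETS` is a hypothetical in‑memory dict standing in for a Vault lookup (no Vault client is used), and requests are plain dicts. The agent's request arrives without credentials; the proxy adds the `Authorization` header to a copy of the request only after the verdict allows it, so a denied call never reads the secret and the agent's own request object never holds it.

```python
from typing import Callable

# Hypothetical stand-in for a Vault lookup; no Vault client is used here.
SECRETS = {"billing-api": "sk-live-example"}


def forward(agent_request: dict, verdict_allows: bool,
            send: Callable[[dict], int]) -> int:
    # Agents authenticate to the proxy, never to the upstream service.
    if "Authorization" in agent_request["headers"]:
        raise ValueError("agent requests must not carry upstream credentials")
    if not verdict_allows:
        return 403                   # denied: the secret is never read
    secret = SECRETS[agent_request["target"]]
    upstream_request = {
        **agent_request,
        # Build a new headers dict so the agent's request object is untouched.
        "headers": {**agent_request["headers"], "Authorization": f"Bearer {secret}"},
    }
    return send(upstream_request)
```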
Three‑signal semantic eval
The eval scores three signals: intent, drift and injection. The inspector is a different model from your agent's primary LLM, so compromising one doesn't automatically compromise the other.
Pure‑Rust Rego · velocity circuit breaker
Your existing Rego policies, evaluated in‑process. A per‑agent velocity tracker (in‑memory, or NATS‑KV for multi‑instance deployments) catches runaway loops and credential‑harvesting bursts.
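One way to keep a per‑agent counter correct across instances is compare‑and‑swap on a revisioned key. The sketch assumes a fixed time window (not a sliding one) and uses `RevisionedKV`, a hypothetical in‑memory class modeled loosely on NATS KV's revision‑checked updates; it is not the NATS client API. Two instances racing on the same key cannot both commit the same count: the loser's write is rejected, and it re‑reads before trying again.

```python
import threading
import time
from typing import Optional


class RevisionedKV:
    """Hypothetical in-memory KV with revision-checked writes (not the NATS API)."""

    def __init__(self) -> None:
        self._data: dict = {}            # key -> (value, revision)
        self._lock = threading.Lock()

    def get(self, key: str) -> tuple:
        with self._lock:
            return self._data.get(key, (0, 0))   # absent key: count 0, revision 0

    def cas(self, key: str, value: int, expected_rev: int) -> bool:
        with self._lock:
            _, rev = self._data.get(key, (0, 0))
            if rev != expected_rev:
                return False             # another writer won; caller must re-read
            self._data[key] = (value, rev + 1)
            return True


def allow_call(kv: RevisionedKV, agent_id: str, limit: int, window_s: float,
               now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    # One key per agent per fixed window; a real store would expire old keys.
    key = f"velocity.{agent_id}.{int(now // window_s)}"
    while True:
        count, rev = kv.get(key)
        if count >= limit:
            return False                 # breaker trips: deny the call
        if kv.cas(key, count + 1, rev):
            return True                  # our increment committed
```

Under contention the loop retries until it either commits an increment or sees the limit reached, so admitted calls per window never exceed `limit`.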
Human approvals for the dangerous bits
Yellow‑tier tool calls (wire transfers, production writes, mass emails) park as Pending. Approvers click Approve in Slack or Teams. Expired requests fail closed. The agent waits.
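The fail‑closed rule can be sketched as a small state machine. The states mirror the text (Pending, approved, expired); the TTL, the `Approval` class and `may_execute` are hypothetical. The property that matters: a request past its deadline resolves to EXPIRED, and only an explicit APPROVED lets the call run, so a late click on Approve cannot revive it.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"
    EXPIRED = "expired"


class Approval:
    def __init__(self, ttl_s: float, now: float) -> None:
        self.state = State.PENDING
        self.deadline = now + ttl_s

    def resolve(self, now: float) -> State:
        # A pending request past its deadline becomes EXPIRED, never APPROVED.
        if self.state is State.PENDING and now >= self.deadline:
            self.state = State.EXPIRED
        return self.state

    def decide(self, approve: bool, now: float) -> State:
        # Decisions only apply while the request is still pending.
        if self.resolve(now) is State.PENDING:
            self.state = State.APPROVED if approve else State.DENIED
        return self.state


def may_execute(request: Approval, now: float) -> bool:
    # Anything other than an explicit approval fails closed.
    return request.resolve(now) is State.APPROVED
```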
SHA‑256 hash‑chained forensic store
Every verdict, every approval, every upstream outcome: written in canonical JSON and chained to the prior entry. Tamper with a row and /verify tells you which one. Cold‑tier export ships signed Parquet manifests to S3 for seven‑year retention.
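A minimal Python sketch of append‑and‑verify using this page's chain formula, entry_hash[n] = sha256(prev_hash[n] || "|" || canonical_json(hashable[n])), with a 64‑zero genesis. Assumptions: prev_hash is concatenated as its lowercase hex string, "canonical JSON" means compact separators in the caller's key order, and rows are numbered from 1. The function names echo warden‑ledger's append_entry and verify_chain, but this is not that code, and the real canonicalization rules may differ. The return value mirrors the `valid` / `entries_checked` / `first_invalid_seq` shape of a /verify response.

```python
import hashlib
import json

GENESIS = "0" * 64   # genesis prev_hash: 64 zeros


def canonical_json(hashable: dict) -> str:
    # Assumption: compact separators, keys kept in the caller's (versioned) order.
    return json.dumps(hashable, separators=(",", ":"), ensure_ascii=False)


def entry_hash(prev_hash: str, hashable: dict) -> str:
    material = prev_hash + "|" + canonical_json(hashable)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def append_entry(chain: list, hashable: dict) -> dict:
    prev = chain[-1]["entry_hash"] if chain else GENESIS
    row = {"seq": len(chain) + 1, "prev_hash": prev,
           "hashable": hashable, "entry_hash": entry_hash(prev, hashable)}
    chain.append(row)
    return row


def verify_chain(chain: list) -> dict:
    expected_prev = GENESIS
    for row in chain:
        broken_link = row["prev_hash"] != expected_prev
        bad_hash = row["entry_hash"] != entry_hash(row["prev_hash"], row["hashable"])
        if broken_link or bad_hash:
            # entries_checked counts rows up to and including the failing one.
            return {"valid": False, "entries_checked": row["seq"],
                    "first_invalid_seq": row["seq"]}
        expected_prev = row["entry_hash"]
    return {"valid": True, "entries_checked": len(chain), "first_invalid_seq": None}
```

Editing row 2's content makes its recomputed hash disagree with the stored one. If an attacker also rewrites row 2's entry_hash, row 3's prev_hash stops matching instead. Either way, verification reports the first bad row.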
The demo runs three scenarios: an indirect injection, a yellow‑tier wire transfer with human approval, and a runaway loop hitting the velocity breaker. It auto‑plays start to finish, or you can step through with explanations.
Agent: finbot‑prod‑7. Your visit gets a scoped session, and the actions you trigger land in the live chain.
These are real, hash‑chained ledger rows produced by warden‑ledger's append_entry against a sentinel correlation prefix. Each entry_hash commits to the previous row's entry_hash and to the row's own canonical JSON. Change a single byte and the chain stops verifying; you can prove that yourself, right now, without leaving this page.
# Live week 3: demo backend goes online with the VPS.
$ curl -s https://demo.vanteguardlabs.com/verify
{ "valid": true, "entries_checked": 5, "first_invalid_seq": null }

# Replay this exact request bundle by correlation prefix:
$ curl -s 'https://demo.vanteguardlabs.com/audit?prefix=demo-sentinel-' \
    | jq '.entries[] | {seq, method, authorized, entry_hash}'
The chain is seeded with a genesis prev_hash of 64 zeros. Row 1 commits to this seed.
entry_hash[n] = sha256(prev_hash[n] || "|" || canonical_json(hashable[n])). The hashable shape and field order are the chain version; see warden‑ledger/src/lib.rs:386.
Row N+1's prev_hash equals row N's entry_hash, so altering any row breaks the chain from that row onward.
The Verify button recomputes each entry_hash with WebCrypto SHA‑256, byte‑identical to what verify_chain does on the server.
AI gateways add caching and retries. Prompt firewalls scan inputs and call it a day. Logging stacks tell you what happened, never whether it should have. Warden does all four jobs (inspect, decide, approve, prove) on the same hot path, signed into the same chain.
| Capability | Warden | AI gateways (Portkey‑class) | Prompt firewalls (input scanners) | DIY logging (roll your own) |
|---|---|---|---|---|
| Semantic inspection (intent + drift + injection) | Three signals, in‑line | none | input string only | none |
| Cryptographic, replayable audit trail | SHA‑256 + /verify | append‑only logs | none | whatever you ship |
| Human approvals on dangerous tool calls | Slack & Teams, fail‑closed | none | none | Slack DMs & hope |
| Credentials never touch the agent | Vault‑injected at proxy | agent holds the keys | agent holds the keys | agent holds the keys |
| Multi‑instance velocity breaker | NATS‑KV, CAS‑correct | per‑instance only | none | none |
| Verdict latency (p95) | <180 ms | Python‑bound, varies | 200–500 ms | untracked |
| Open‑source, wire‑compatible edition | Warden Lite (Apache‑2.0) | SaaS only | SaaS only | it's all yours |
| Native red‑team test suite | 11 attack classes, nightly | none | vendor benchmark | none |
| Stack | Rust end‑to‑end | Python proxy + Node UI | Python | heterogeneous |
Every entry commits to its predecessor's hash and to its own canonical JSON; the field order is the chain version. Auditors don't get a vendor deck. They get a deterministic replay and a single endpoint, /verify, that either returns "valid": true or names the first row that broke.
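Why field order matters: the same logical row serialized with a different key order produces different bytes, and therefore a different SHA‑256. A short sketch, assuming compact JSON that preserves insertion order; `V1_FIELDS` and `V2_FIELDS` are hypothetical orders, not Warden's actual schema. A verifier has to know which order (which chain version) produced a row before it can recompute the hash.

```python
import hashlib
import json

# Hypothetical field orders for two chain versions of the same logical row.
V1_FIELDS = ("seq", "method", "authorized")
V2_FIELDS = ("method", "seq", "authorized")


def hashable_for(row: dict, fields: tuple) -> dict:
    # Python dicts keep insertion order, so this pins the serialized key order.
    return {name: row[name] for name in fields}


def digest(hashable: dict) -> str:
    # Compact JSON without sort_keys: the caller's key order reaches the bytes.
    body = json.dumps(hashable, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```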
We had a faster architecture in early 2026 that raced the verdict against the upstream call, and we ripped it out the moment we found a side‑effect window for Yellow‑tier tools. Competitors will discover this constraint the way we did: in production. We already paid that bill.
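The side‑effect window is easy to reproduce in a toy asyncio model. This is an illustration under stated assumptions, not Warden's former architecture: `verdict` and `upstream` are hypothetical coroutines, and the upstream "side effect" is appending to a list. In the racing version the upstream task runs while the verdict is still being computed, so cancelling after a deny comes too late; the sequential version awaits the verdict first.

```python
import asyncio


async def verdict(call: str) -> bool:
    await asyncio.sleep(0.01)          # inspection takes time
    return call != "wire_funds"        # hypothetical: this call is denied


async def upstream(call: str, effects: list) -> None:
    effects.append(call)               # the irreversible side effect


async def racing(call: str, effects: list) -> bool:
    # Start the upstream call alongside the verdict; cancel it on deny.
    task = asyncio.create_task(upstream(call, effects))
    allowed = await verdict(call)      # yields, so the upstream task runs now
    if not allowed:
        task.cancel()                  # no-op: the task already finished
    await asyncio.gather(task, return_exceptions=True)
    return allowed


async def sequential(call: str, effects: list) -> bool:
    allowed = await verdict(call)
    if allowed:
        await upstream(call, effects)  # side effect only after the verdict
    return allowed
```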
The inspector is deliberately a different LLM from your agent's primary model. A jailbreak that fools the agent doesn't automatically fool the warden.
No GIL contention. No cold‑start CPython. Predictable tail latency on both the verdict and the upstream roundtrip — the kind that survives an SRE's first p99 query.
"But our LLM gateway already logs requests."
A logging stack tells you what happened. Warden tells you what shouldn't happen, blocks it before it does, and then logs it — into a chain you can prove.
A ten‑minute audit, no install
A CLI that audits your GitHub orgs, Slack workspaces, laptops, .env files and configs for unauthorized agents and leaked agent credentials. Most teams find between three and forty before the first coffee.
Open source, single binary
The whole stack as one Rust binary — heuristic brain, Rego policy, hash‑chain ledger, proxy. Drop it in front of a single agent. Wire‑compatible with the full edition.
The five‑layer control plane
mTLS, Vault, Brain, Policy, HIL (human‑in‑the‑loop approvals), Ledger, cold‑tier export. Multi‑instance velocity tracker. Slack and Teams approver cards. Audit replay by correlation ID.
/verify
Every entry in the ledger commits to its predecessor. Change a single byte and /verify tells you exactly which row broke the chain, and which entries came after it.
// hash chain
genesis = 64 × "0"
entry_hash[n] = sha256( prev_hash[n] || "|" || canonical_json(hashable[n]) )

// reconstructing a request
GET /audit/correlation/{uuid}
  → [ proxy verdict, policy verdict, HIL transitions, upstream outcome ]
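Reconstructing a request amounts to filtering the ledger by correlation ID and keeping ledger order. A sketch assuming each row carries hypothetical `correlation_id`, `kind` and `seq` fields; the four stage names follow the list in the snippet but are illustrative, not the /audit response schema.

```python
# Hypothetical stage names, one per element of the reconstructed request.
STAGES = ("proxy_verdict", "policy_verdict", "hil_transition", "upstream_outcome")


def reconstruct(rows: list, correlation_id: str) -> dict:
    timeline = {stage: [] for stage in STAGES}
    # Ledger seq order is append order, so each stage's rows stay chronological.
    for row in sorted(rows, key=lambda r: r["seq"]):
        if row["correlation_id"] == correlation_id:
            timeline[row["kind"]].append(row)
    return timeline
```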
"We caught a prompt‑injection chain on day one that none of our existing tooling would have flagged. The fact that the verdict, the input, and the approver are all on a hashed chain is what closed the security review."
"Warden Lite let us put a real gateway in front of our research agents without standing up another service. Same chain format means we can graduate to the full edition without rewriting our audit tooling."
"The HIL approval flow is the only reason we got sign‑off to give an agent write access to the ledger system. The Slack card with the diff is what sold finance."
* Anonymized while design‑partner agreements are in force.
Book a 20‑minute demo. Bring one agent. We'll show you what it's actually doing.
In the meantime: try Warden Lite or run the Shadow Scanner.