Why SigmaShake

You've shipped an AI agent. It's fast, capable, and completely unguarded. The moment it touches your codebase, your infrastructure, or your CI pipeline — you're trusting it to make the right call every time.

That trust is misplaced.

The problem with AI agents in production

Modern AI agents don't just generate text. They run shell commands. They edit files. They make network calls. They spawn sub-agents that do all of the above. And they do it all autonomously, at machine speed, in response to natural language instructions that were never meant to be airtight specifications.

You need enforcement that operates where the damage happens: at the tool call boundary, before the command runs.

Why the existing tools aren't enough

"We have a linter"

Linters are brilliant. ESLint, Semgrep, Bandit — they catch bad patterns before code ships. But they run on files you've written, not on commands your agent is about to execute.

When an agent runs git push --force or reads your .env file, there's no file to lint. The action is live, the content is ephemeral, and by the time your next CI run completes, the damage is already done.

	Linters	SigmaShake
Catches code patterns	✓	✓
Catches runtime agent actions	✗	✓
Evaluates live tool inputs	✗	✓
Works without a file on disk	✗	✓
Sub-2 ms latency	✓	✓

Linters are pre-flight. SigmaShake is the flight controller.

"We run agents in a sandbox (e.g., NVIDIA OpenShell)"

Infrastructure-level sandboxes (like NVIDIA OpenShell) sound solid until you price them in. OpenShell relies on a heavy stack—running a Kubernetes (K3s) cluster inside a Docker container. This adds massive cold-start overhead, containerization bloat, and networking latency to every single invocation. Your agent just became 100× slower. Your developer just started ignoring the guardrails because the environment is too heavy to run locally.

But the deeper issue is that a sandbox is a cage, not a guide.

Sandboxes and containers don't know anything about intent or context. They rely on passive, complex YAML policies to block network egress or restrict file mounts. When an agent hits a sandbox wall, it simply fails—usually with an opaque "Connection Refused" or "Permission Denied" error. The agent gets stuck in a loop, repeatedly trying the same failing operation because it doesn't understand why it was blocked or what it should do instead.

Furthermore, if the project directory is mounted (as it usually must be for the agent to be useful), the sandbox will still happily let an agent run rm -rf /project/migrations or git push --force inside the container. The damage is still real.

Sandboxes answer the question: can this process see the network or file system?

SigmaShake answers the question: should this specific command run, and if not, how do we correct the agent?

SigmaShake enforces rules instantly at the tool boundary through a zero-dependency binary, installed with the one-line curl installer at install.sigmashake.com. Because it operates at the semantic layer, a DENY or FORCE rule can return a human-readable MESSAGE directly to the agent (e.g., "Use git push --force-with-lease instead"). The agent reads the error, learns the convention, and pivots to the correct behavior without getting stuck. No Kubernetes clusters required.

	Containers / Sandboxes	SigmaShake
Latency per call	80–300 ms	sub-2 ms
Infrastructure to maintain	Dockerfile, registry, orchestration	Single binary
Understands rule intent	✗	✓
Can allow some Bash but block specific patterns	✗	✓
Audit log of every decision	✗	✓
Works on the developer's local machine	With effort	Drop-in, one command

Containers protect the machine. SigmaShake protects the project.

"We use a system prompt"

Asking the model to not do bad things works right up until it doesn't. System prompts are suggestions, not contracts. They're processed through the same probabilistic machinery that sometimes generates subtly wrong code, occasionally misreads ambiguous instructions, and reliably forgets nuances buried six paragraphs back.

Model-layer safety is also invisible. You can't audit what the model decided to allow or deny. You can't see a log entry that says "the agent tried to read .env.production and the rule fired." You find out about failures when something breaks.

LLM-based guardrails also consume tokens — sometimes thousands — on every call just to re-evaluate context that a rule engine resolves in microseconds.

	System prompts / LLM guardrails	SigmaShake
Deterministic	✗	✓
Audit log	✗	✓
Compute cost per evaluation	High	Minimal
Latency	200 ms–2 s	sub-2 ms
Can be jailbroken	Yes	No
Rules readable by non-ML engineers	✗	✓
Version-controlled, diffable policy	✗	✓

System prompts are advice. SigmaShake is a gate.

What SigmaShake actually does

SigmaShake sits between your AI agent and every tool it calls. Before Bash("rm -rf dist/") runs, before Read(".env.production") executes, before WebFetch("https://...") fires — SigmaShake evaluates the call against your ruleset and returns a decision in under 2 milliseconds.

No containers. No external infrastructure.

Agent → Tool Call → ssg eval → Decision → (allow / deny / ask / force / log / shadow)
                        ↑
                    Your rules
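The flow above can be modeled in a few lines of Python. This is a hypothetical sketch, not SigmaShake's implementation. `ToolCall`, `Rule`, and `evaluate` are illustrative names. Two behaviors are assumptions: higher-priority rules are checked first, and a call that matches no rule is allowed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"
    FORCE = "force"
    LOG = "log"
    SHADOW = "shadow"


@dataclass(frozen=True)
class ToolCall:
    tool: str      # e.g. "Bash", "Read", "WebFetch"
    argument: str  # the command, path, or URL the tool is about to act on


@dataclass(frozen=True)
class Rule:
    name: str
    decision: Decision
    matches: Callable[[ToolCall], bool]
    priority: int = 0
    message: str = ""


def evaluate(call: ToolCall, rules: list[Rule]) -> tuple[Decision, str]:
    """Return the decision and message for a tool call, before the tool runs."""
    # Assumption: higher priority is checked first; ties keep list order
    # (sorted() stays stable even with reverse=True).
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.matches(call):
            return rule.decision, rule.message
    # Assumption: a call that matches no rule is allowed.
    return Decision.ALLOW, ""


# Hypothetical example rules.
RULES = [
    Rule("no-rm-rf", Decision.DENY,
         lambda c: c.tool == "Bash" and c.argument.startswith("rm -rf"),
         priority=100, message="Recursive deletes are not allowed."),
    Rule("ask-before-env", Decision.ASK,
         lambda c: c.tool == "Read" and c.argument.startswith(".env"),
         message="Reading a secrets file. Approve?"),
]
```

The decision is computed before the tool executes. The caller can therefore run, block, or park the call without the tool ever touching the system.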

Six decisions: block, guide, observe, escalate

Most enforcement systems give you allow or block. SigmaShake gives you six:

Decision	What happens
`ALLOW`	Proceeds immediately
`DENY`	Blocked with a reason shown to the agent
`ASK`	Paused — requires your approval in the dashboard
`FORCE`	Blocked, with an error that points the agent to a safer substitute
`LOG`	Allowed, but permanently recorded
`SHADOW`	Silently observed — the agent never knows

ASK is the one nobody else has. When an agent tries to do something that might be legitimate but needs a second pair of eyes, SigmaShake parks it, pings your dashboard, and waits. You see the full tool call, decide yes or no, and the agent continues — with the decision recorded in the audit log forever.
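Put another way, each decision answers one question: does the original tool call run? Below is a minimal sketch of that mapping, taken only from the decision table. `should_run` and the `approve` callback are hypothetical; the callback stands in for the dashboard approval round trip.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"
    FORCE = "force"
    LOG = "log"
    SHADOW = "shadow"


def should_run(decision: Decision, call: str,
               approve: Callable[[str], bool]) -> bool:
    """Decide whether the original tool call executes, per the decision table."""
    if decision in (Decision.ALLOW, Decision.LOG, Decision.SHADOW):
        # LOG and SHADOW let the call through; they add observation on top.
        return True
    if decision is Decision.ASK:
        # The call is parked until a reviewer answers. `approve` is a
        # hypothetical stand-in for the dashboard and sees the full call.
        return approve(call)
    # DENY and FORCE never run the original call; FORCE also tells the
    # agent which substitute to use instead.
    return False
```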

Guide the agent, don't just block it

Blocking is the floor. The ceiling is teaching.

The FORCE decision rejects the original call and delivers an instruction directly back to the agent as a tool error. The agent reads the reason, pivots to the correct tool, and generates a new, compliant tool call. No lost context. No human intervention.

rule use-read-not-cat {
  enabled true
  FORCE execution
  IF command STARTS_WITH "cat "
  MESSAGE "Use the Read tool instead. It integrates with the audit log and handles large files safely."
  SUBSTITUTE "Read"
}

The moment the agent tries cat package.json, it receives the message and picks up Read on the next attempt. You've changed agent behavior for everyone on the team, on every call from now on, with a seven-line rule.

The same pattern steers agents toward correct workflows:

rule enforce-force-with-lease {
  enabled true
  FORCE execution
  IF command REGEX "git push.*\\s(--force|-f)(\\s|$)"
  MESSAGE "Use git push --force-with-lease instead. It prevents overwriting concurrent pushes."
}

rule require-pinned-versions {
  enabled true
  FORCE execution
  IF command REGEX "npm install\\s+[a-z0-9][^@\\s]*(\\s|$)"
  MESSAGE "Pin the version explicitly: npm install package@x.y.z"
}

Every DENY or FORCE rule that includes a MESSAGE is a coaching moment. The agent learns your team's conventions inside its own tool-use loop, without you having to maintain sprawling policy prompts.
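A CONTAINS check on `git push --force` also matches `git push --force-with-lease`, which is the very command the message recommends. A pinned-version check also has to look past the first character of the package name. Trying a pattern in an ordinary regex engine before committing it catches both problems. The sketch below uses Python's `re` and assumes REGEX performs an unanchored search with standard syntax once the DSL's doubled backslashes are unescaped. That is an assumption about the rule engine, not documented behavior.

```python
import re

# Assumed semantics: REGEX is an unanchored search with standard syntax once
# the DSL's doubled backslashes ("\\s") are read as single ones ("\s").
FORCE_PUSH = re.compile(r"git push.*\s(--force|-f)(\s|$)")
UNPINNED_INSTALL = re.compile(r"npm install\s+[a-z0-9][^@\s]*(\s|$)")
# Only inspects the first character after "npm install ", so pinned
# installs such as "npm install lodash@4.17.21" match as well.
FIRST_CHAR_ONLY = re.compile(r"npm install [^@\-]")


def contains(needle: str, command: str) -> bool:
    """What a CONTAINS condition does: plain substring membership."""
    return needle in command


def fires(pattern: re.Pattern, command: str) -> bool:
    """What a REGEX condition is assumed to do: search anywhere in the command."""
    return pattern.search(command) is not None
```

The stricter patterns still have known gaps. `UNPINNED_INSTALL` looks only at the first argument, so `npm install -D typescript` slips through. `FORCE_PUSH` does not catch a force push written as a `+branch` refspec.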

Rules that read like what they enforce

rule block-force-push {
  enabled true
  priority 100
  severity error
  DENY execution
  IF command REGEX "git push.*\\s(--force|-f)(\\s|$)"
  MESSAGE "Force push is not allowed. Use --force-with-lease instead."
}

This is the whole rule. No YAML nesting, no JSON escaping, no plugin configuration. Your entire team can read it, review it in a PR, and understand exactly what it enforces. The .rules files live in .sigmashake/rules/ — version-controlled alongside the code they protect.

Hot reload — every single eval

Rules are reloaded on every evaluation. The moment you git pull a rules update, it applies to the next tool call. No restarts, no deployments, no cache invalidation incidents.
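Reload-on-every-eval means nothing is cached between calls. The sketch below shows this with a deliberately tiny, hypothetical parser for the subset of `.rules` syntax shown on this page: one decision line, `IF`/`OR command` conditions, `MESSAGE`, and `enabled`. It is not SigmaShake's parser. It ignores `priority` and takes the first matching rule in file order, which is an assumption.

```python
import re
from pathlib import Path

# Hypothetical mini-parser for the .rules subset shown on this page.
# Not SigmaShake's parser; it exists only to illustrate reload-per-eval.
QUOTED = r'"((?:[^"\\]|\\.)*)"'
RULE_RE = re.compile(r"rule\s+([\w-]+)\s*\{(.*?)\n\}", re.S)
DECISION_RE = re.compile(r"^\s*(ALLOW|DENY|ASK|FORCE|LOG|SHADOW)\s+execution", re.M)
COND_RE = re.compile(
    r"^\s*(?:IF|OR)\s+command\s+(CONTAINS|STARTS_WITH|REGEX)\s+" + QUOTED, re.M)
MESSAGE_RE = re.compile(r"^\s*MESSAGE\s+" + QUOTED, re.M)
DISABLED_RE = re.compile(r"^\s*enabled\s+false", re.M)


def unescape(text: str) -> str:
    # Turn DSL escapes such as a doubled backslash into the character they stand for.
    return re.sub(r"\\(.)", r"\1", text)


def condition_matches(op: str, value: str, command: str) -> bool:
    if op == "CONTAINS":
        return value in command
    if op == "STARTS_WITH":
        return command.startswith(value)
    return re.search(value, command) is not None  # REGEX


def evaluate(rules_dir: Path, command: str) -> tuple[str, str]:
    """Re-read every .rules file on each call, so edits apply to the next call."""
    for path in sorted(rules_dir.glob("*.rules")):
        for _name, body in RULE_RE.findall(path.read_text()):
            decision = DECISION_RE.search(body)
            if decision is None or DISABLED_RE.search(body):
                continue
            conditions = COND_RE.findall(body)
            if any(condition_matches(op, unescape(v), command) for op, v in conditions):
                message = MESSAGE_RE.search(body)
                text = unescape(message.group(1)) if message else ""
                return decision.group(1), text
    return "ALLOW", ""  # assumption: no matching rule means allow
```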

Start in five minutes, not five sprints

curl -fsSL https://install.sigmashake.com | sh
ssg init --client claude-code

Two commands. That's the integration. SigmaShake installs a PreToolUse hook into Claude Code and auto-configures permissions. The starter ruleset that ships with ssg init covers the most common production incidents out of the box — destructive commands, secret file access, force pushes, debug artifacts in source.
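Claude Code's PreToolUse hooks receive the pending tool call as JSON on stdin, including `tool_name` and `tool_input`. A hook that exits with code 2 blocks the call and passes its stderr back to Claude. To make that mechanism concrete, here is a hand-rolled hook in Python that mimics the use-read-not-cat rule. It illustrates the hook contract and is not the hook `ssg init` installs; `decide` and `run` are hypothetical names.

```python
import json
from typing import TextIO

# Illustration of the Claude Code PreToolUse hook contract, not the hook
# that `ssg init` installs. It mimics the use-read-not-cat rule.
MESSAGE = ("Use the Read tool instead. It integrates with the audit log "
           "and handles large files safely.")


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_text) for one PreToolUse payload."""
    if payload.get("tool_name") != "Bash":
        return 0, ""  # only shell commands are checked here
    command = payload.get("tool_input", {}).get("command", "")
    if command.startswith("cat "):
        return 2, MESSAGE  # exit code 2 blocks the call; stderr goes to Claude
    return 0, ""


def run(stdin: TextIO, stderr: TextIO) -> int:
    """Read the hook payload, report any block reason, return the exit code."""
    code, text = decide(json.load(stdin))
    if text:
        print(text, file=stderr)
    return code
```

In a real hook script, the entry point would be `sys.exit(run(sys.stdin, sys.stderr))`, with the script registered as a PreToolUse command hook in Claude Code's settings.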

Adopt without risk — then tighten when ready

Starting small is the right instinct. SigmaShake is designed for it.

You can change any rule's decision at any time, and because rules reload on every evaluation, the change applies to the next tool call.

This makes the rollout path zero-risk by default.

The progression: observe → gate → enforce

# Step 1 — Observe (nothing is blocked)
rule observe-force-push {
  enabled true
  LOG execution
  IF command REGEX "git push.*\\s(--force|-f)(\\s|$)"
  MESSAGE "Force push observed."
}

Run in LOG mode for a few days. Watch the audit log. Understand exactly which tool calls trigger the rule and whether they're legitimate.

# Step 2 — Gate (humans decide)
rule gate-force-push {
  enabled true
  ASK execution
  IF command REGEX "git push.*\\s(--force|-f)(\\s|$)"
  MESSAGE "Force push to remote — approve?"
  PROMPT "Allow this push to proceed?"
}

Promote to ASK when you're confident in the pattern. Approvals land in the dashboard. You stay in control without slowing the agent down on legitimate work.

# Step 3 — Enforce (confident)
rule block-force-push {
  enabled true
  DENY execution
  IF command REGEX "git push.*\\s(--force|-f)(\\s|$)"
  MESSAGE "Force push is not allowed. Use --force-with-lease instead."
}

Switch to DENY when you've seen enough to know the rule is right. If it causes false positives, roll it back to ASK in seconds: a one-line change and a file save.
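The decision is a single keyword line, so promoting or rolling back a rule is mechanical enough to script. Below is a hypothetical helper that swaps the keyword and leaves every other line untouched. `set_decision` is illustrative, not an ssg command.

```python
import re

DECISIONS = ("ALLOW", "DENY", "ASK", "FORCE", "LOG", "SHADOW")
# Matches the "<DECISION> execution" line and keeps its indentation.
DECISION_LINE = re.compile(
    r"^([ \t]*)(" + "|".join(DECISIONS) + r")([ \t]+execution\b)", re.M)


def set_decision(rule_text: str, decision: str) -> str:
    """Swap a rule's decision keyword, leaving every other line untouched."""
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    new_text, count = DECISION_LINE.subn(
        lambda m: m.group(1) + decision + m.group(3), rule_text, count=1)
    if count != 1:
        raise ValueError("rule has no decision line")
    return new_text
```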

Start from a community ruleset

You don't have to write rules from scratch. Browse pre-built rulesets for TypeScript, Go, React, Kubernetes, Terraform, and more at hub.sigmashake.com. Pull a ruleset into your project with:

ssg hub pull <ruleset-id>

All community rules are pulled into your .sigmashake/rules/ directory as plain .rules files. Read them, edit them, delete the ones that don't fit. They're yours — not a black box you opt into forever.

If you create rules your team finds useful, publish them back:

ssg publish

One command creates a GitHub repo, pushes your rules, and submits them to the Hub for others to discover.

The comparison in one table

	Linters	Sandboxes	LLM guardrails	SigmaShake
Operates at runtime	✗	✓	✓	✓
Sub-2 ms latency	✓	✗	✗	✓
Zero token cost	✓	✓	✗	✓
Deterministic decisions	✓	✓	✗	✓
Guides agent to correct tool	✗	✗	Sometimes	✓
Human-readable policy	✓	—	✗	✓
Version-controlled rules	✓	—	✗	✓
Full audit log	✗	✗	✗	✓
Human-in-the-loop approvals	✗	✗	✗	✓
LOG → ASK → DENY rollout path	✗	✗	✗	✓
Community rule library	✗	✗	✗	✓
Zero infra to operate	✓	✗	✓	✓
Works on localhost	✓	With effort	✓	✓

Who's already using this approach

Any team that has shipped an AI coding agent into a real codebase has discovered the same failure modes: the agent runs something it shouldn't, there's no record of it, and the fix is manual. SigmaShake was built by engineers who hit those walls and decided the answer wasn't more prompt engineering — it was a proper enforcement layer.

The engineering teams getting the most out of SigmaShake are the ones who treat their AI agents like they treat any other automated system: with clear policies, observable behavior, and a human escalation path when the situation calls for it.

Ready to try it?

curl -fsSL https://install.sigmashake.com | sh
ssg init --client claude-code

Or browse pre-built rulesets for your stack at hub.sigmashake.com.

→ Getting Started — full setup in under 5 minutes
→ Writing Rules — craft rules that match your team's workflow
→ Claude Code Integration — deep dive on the hook system
