
Evaluation Deep Dive

How SigmaShake evaluates a tool call against governance rules.

Algorithm

evaluate(rules: Rule[], call: ToolCall): EvalResult
  1. Sort rules by priority, descending (100 before 50).
  2. Filter to enabled rules only.
  3. For each rule in sorted order:
     a. Check targetMatches(rule.target, call.tool): does the rule apply to this tool type?
     b. Check groupMatches(rule.groups, call): does any condition group match?
     c. If both match, return that rule's result immediately (short-circuit; first match wins).
  4. Default: if no rule matches, return {decision: "allow"}.
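The steps above can be sketched as a small TypeScript function. This is a minimal sketch, not the engine's source: the Rule, ToolCall and EvalResult shapes are simplified stand-ins, and targetMatches/groupMatches are passed in as hypothetical callbacks so the loop can be shown on its own.

```typescript
// Minimal sketch of the evaluation loop. Assumption: these types are
// simplified stand-ins, and the matchers are injected rather than imported.

type Decision = "allow" | "deny" | "ask";

interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

interface Rule {
  id: string;
  priority: number;
  enabled: boolean;
  decision: Decision;
  target: string;
  groups: unknown[]; // condition groups; their shape is not needed here
}

interface EvalResult {
  decision: Decision;
  ruleId?: string;
}

type TargetMatcher = (target: string, tool: string) => boolean;
type GroupMatcher = (groups: unknown[], call: ToolCall) => boolean;

function evaluate(
  rules: Rule[],
  call: ToolCall,
  targetMatches: TargetMatcher,
  groupMatches: GroupMatcher,
): EvalResult {
  // Steps 1-2: highest priority first, disabled rules dropped.
  // Copy before sorting so the caller's array is not mutated.
  const ordered = [...rules]
    .sort((a, b) => b.priority - a.priority)
    .filter((r) => r.enabled);

  // Step 3: the first rule whose target and condition groups both match wins.
  for (const rule of ordered) {
    if (targetMatches(rule.target, call.tool) && groupMatches(rule.groups, call)) {
      return { decision: rule.decision, ruleId: rule.id };
    }
  }

  // Step 4: nothing matched, so allow by default.
  return { decision: "allow" };
}
```

Because `&&` short-circuits, groupMatches is never called for a rule whose target does not apply. Array.prototype.sort is stable in modern JavaScript engines (ES2019+), so rules with equal priority keep their original relative order in this sketch.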

Target matching

A rule's target is compared against the tool's capability:

DENY execution → only matches tools classified as "execute" (Bash, shell)
DENY any → matches all tools

The capability is determined by the active client adapter, which maps tool names to the 6-capability taxonomy (execute, read, write, search, agent, network).
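Target matching can be illustrated with a hypothetical adapter map. In this sketch, exampleAdapter and its tool names are assumptions for illustration only (real mappings come from the active client adapter), and the rule that the target keyword "execution" corresponds to the "execute" capability is taken from the example above.

```typescript
// Sketch of target matching. Assumption: exampleAdapter is a hypothetical
// tool-to-capability map; real mappings come from the active client adapter.

type Capability = "execute" | "read" | "write" | "search" | "agent" | "network";

// Hypothetical mapping, for illustration only.
const exampleAdapter: ReadonlyMap<string, Capability> = new Map<string, Capability>([
  ["Bash", "execute"],
  ["Read", "read"],
  ["Write", "write"],
  ["Edit", "write"],
  ["Grep", "search"],
  ["Task", "agent"],
  ["WebFetch", "network"],
]);

function targetMatches(
  target: string,
  tool: string,
  adapter: ReadonlyMap<string, Capability> = exampleAdapter,
): boolean {
  if (target === "any") return true; // "any" applies to every tool

  // Tools the adapter does not know have no capability.
  const capability = adapter.get(tool);
  if (capability === undefined) return false;

  // Assumption: "execution" names the "execute" capability; any other
  // target is compared to the capability name directly.
  const wanted = target === "execution" ? "execute" : target;
  return capability === wanted;
}
```

In this sketch a tool missing from the adapter can only be caught by an "any" rule.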

Condition evaluation

Groups (OR logic)

Conditions separated by OR form separate groups. Any group matching is sufficient:

IF command CONTAINS "rm -rf" ← Group 1
OR command CONTAINS "rm -r" ← Group 2

Group 1 matches OR Group 2 matches → rule fires.

Within a group (AND logic)

Conditions joined by AND (or on consecutive lines after IF) must all match:

IF path ENDS_WITH ".env" ← Group 1, condition 1
AND content CONTAINS "API_KEY" ← Group 1, condition 2

Both must match for the group to match.
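Together, the two rules above mean each rule's conditions form an OR of AND-groups. The sketch below shows that shape. Assumptions: the Condition type is simplified, only the two operators used in the examples (CONTAINS, ENDS_WITH) are modeled, and field values come from a plain string map instead of the getField resolver.

```typescript
// Sketch of condition-group evaluation: OR across groups, AND within a group.
// Assumption: simplified Condition shape with only CONTAINS and ENDS_WITH.

type Operator = "CONTAINS" | "ENDS_WITH";

interface Condition {
  field: string;
  op: Operator;
  value: string;
}

type Group = Condition[]; // every condition in a group must match

function conditionMatches(c: Condition, fields: Record<string, string>): boolean {
  const actual = fields[c.field] ?? ""; // missing fields resolve to ""
  return c.op === "CONTAINS" ? actual.includes(c.value) : actual.endsWith(c.value);
}

function groupMatches(groups: Group[], fields: Record<string, string>): boolean {
  // some() gives OR across groups; every() gives AND within one group.
  return groups.some((group) => group.every((c) => conditionMatches(c, fields)));
}
```

Both `some` and `every` stop at the first decisive result, so evaluation short-circuits at the group and condition level as well.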

Field resolution

Field values are resolved from the ToolCall, mostly from its input object:

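function getField(field: Field, call: ToolCall): string {
  // "input.<key>" reads an arbitrary key from the tool input.
  if (field.startsWith('input.')) {
    const key = field.slice('input.'.length);
    const val = call.input[key];
    if (val === undefined || val === null) return '';
    // Non-string values are JSON-serialized so string operators can still match.
    return typeof val === 'string' ? val : JSON.stringify(val);
  }
  // Named fields map onto the input keys that common tools use.
  switch (field) {
    case 'command': return call.input.command ?? '';
    case 'path':    return call.input.file_path ?? call.input.path ?? '';      // file_path first, then path
    case 'content': return call.input.content ?? call.input.new_string ?? ''; // content first, then new_string
    case 'tool':    return call.tool;
    default:        return ''; // unknown fields resolve to an empty string
  }
}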

Performance

Metric               Typical value
Evaluation latency   < 2 ms
Rules loaded         20-50
Glob cache size      Up to 1000 patterns
Regex cache size     Up to 1000 patterns
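A bounded pattern cache like the one in the table can be sketched with a Map. Assumptions: the class name RegexCache is hypothetical, and the least-recently-used eviction policy is illustrative; only the 1000-pattern limit comes from the table.

```typescript
// Sketch of a bounded regex cache. Assumption: LRU eviction via Map
// insertion order is illustrative, not necessarily what the engine does.

class RegexCache {
  private readonly entries = new Map<string, RegExp>();
  private readonly capacity: number;

  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  get(pattern: string): RegExp {
    const hit = this.entries.get(pattern);
    if (hit !== undefined) {
      // Re-insert so this pattern becomes the most recently used.
      this.entries.delete(pattern);
      this.entries.set(pattern, hit);
      return hit;
    }
    const compiled = new RegExp(pattern); // throws on invalid syntax
    if (this.entries.size >= this.capacity) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(pattern, compiled);
    return compiled;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Sharing compiled RegExp objects is safe here because they are created without the `g` or `y` flags, so `test()` does not carry `lastIndex` state between calls.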

Safety features

Fail-secure regex

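If a regex pattern is invalid or potentially dangerous (for example, nested quantifiers such as (a+)+, which can cause catastrophic backtracking), the condition is treated as a match instead of failing silently. The rule therefore fires, erring on the side of caution.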
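The fail-secure behavior can be sketched as follows. Assumptions: the function name is hypothetical, and NESTED_QUANTIFIER is a deliberately crude heuristic (a parenthesized group containing + or * that is itself followed by +, * or {); the engine's real detection may differ.

```typescript
// Sketch of fail-secure regex matching. Assumption: the nested-quantifier
// check is a crude heuristic, not the engine's actual detector.

const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/;

function regexMatchesFailSecure(pattern: string, input: string): boolean {
  // Potentially catastrophic pattern: report a match so the rule fires.
  if (NESTED_QUANTIFIER.test(pattern)) return true;

  let re: RegExp;
  try {
    re = new RegExp(pattern);
  } catch {
    // Invalid syntax: also reported as a match rather than a silent miss.
    return true;
  }
  return re.test(input);
}
```

The trade-off is deliberate: failing open (no match) would let a malformed deny rule silently stop protecting anything, while failing secure only costs an extra block or prompt.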

Loop guard

Detects when the same Bash command is repeated 3+ times consecutively and blocks it to prevent infinite loops.
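A consecutive-repeat guard needs only the last command and a streak counter. Assumptions: the LoopGuard class and its threshold parameter are hypothetical, and "repeated 3+ times" is read here as "block once the same command has been seen 3 times in a row".

```typescript
// Sketch of a consecutive-repeat loop guard. Assumption: the third identical
// command in a row is the first one blocked.

class LoopGuard {
  private lastCommand: string | null = null;
  private streak = 0;
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  // Returns true if this Bash command should be blocked.
  check(command: string): boolean {
    if (command === this.lastCommand) {
      this.streak += 1;
    } else {
      // A different command starts a new streak.
      this.lastCommand = command;
      this.streak = 1;
    }
    return this.streak >= this.threshold;
  }
}
```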

Circuit breaker

Disabled by default, so the engine fails closed. To enable it, set the SSG_HOOK_CIRCUIT_BREAKER=1 environment variable or set circuit_breaker = true in the [hook] section of config.toml. When enabled, in Claude Code hook mode the engine automatically allows the next call after 5 consecutive deny decisions, preventing a complete agent lockout. The counter resets on any non-deny decision.
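The counting logic can be sketched as a small state machine. Assumptions: the CircuitBreaker class, its apply method, and the lowercase decision strings are hypothetical; the enabled flag stands in for SSG_HOOK_CIRCUIT_BREAKER=1 or circuit_breaker = true, and the hook-mode check is not modeled.

```typescript
// Sketch of the circuit breaker's counting logic. Assumption: the forced
// allow itself counts as a non-deny decision and resets the counter.

type Decision = "allow" | "deny" | "ask";

class CircuitBreaker {
  private consecutiveDenies = 0;
  private readonly enabled: boolean;
  private readonly limit: number;

  constructor(enabled: boolean, limit = 5) {
    this.enabled = enabled;
    this.limit = limit;
  }

  // Takes the engine's decision and returns the decision actually emitted.
  apply(decision: Decision): Decision {
    if (!this.enabled) return decision; // default: fail closed, never override

    if (decision !== "deny") {
      this.consecutiveDenies = 0; // any non-deny decision resets the counter
      return decision;
    }
    if (this.consecutiveDenies >= this.limit) {
      // `limit` denies in a row already: let this call through.
      this.consecutiveDenies = 0;
      return "allow";
    }
    this.consecutiveDenies += 1;
    return decision;
  }
}
```

With the default limit of 5, the first five denies pass through unchanged and the sixth call is allowed.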

ASK approval mode

When a rule returns an ASK decision, the engine needs a human to approve or deny before the tool call proceeds. Two modes are available:

Mode            Behavior
tty (default)   Prints an inline prompt to the terminal via /dev/tty, the same technique sudo and ssh use. Works even when stdin is piped. Times out after 60 seconds and denies on timeout.
dashboard       Posts the approval request to the ssg web dashboard. Requires ssg serve to be running.

Configure per-project in .sigmashake/config.toml:

[hook]
ask_mode = "tty" # default — inline terminal prompt
# ask_mode = "dashboard" # opt-in: web dashboard approval

Override per-invocation with the --ask_mode flag:

ssg eval --ask_mode=dashboard

If /dev/tty is unavailable (for example, in headless CI), tty mode falls back to a block decision and prints a message to stderr.
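The precedence between the flag, the config file, and the default can be sketched as a pure function. Assumptions: resolveAskMode and isAskMode are hypothetical names, and rejecting unknown values with an error is an illustrative choice; only the order (--ask_mode flag, then ask_mode in config.toml, then "tty") comes from the text above.

```typescript
// Sketch of ASK mode resolution. Assumption: unknown values are rejected;
// the engine's real handling of invalid values is not documented here.

type AskMode = "tty" | "dashboard";

function isAskMode(value: string): value is AskMode {
  return value === "tty" || value === "dashboard";
}

// flag: value of --ask_mode, if given; configValue: ask_mode from [hook].
function resolveAskMode(flag?: string, configValue?: string): AskMode {
  const picked = flag ?? configValue ?? "tty"; // flag > config > default
  if (!isAskMode(picked)) {
    throw new Error(`unknown ask_mode: ${picked}`);
  }
  return picked;
}
```

For example, resolveAskMode(undefined, "dashboard") yields "dashboard", while resolveAskMode("tty", "dashboard") yields "tty" because the per-invocation flag wins.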