Evaluation Deep Dive
How SigmaShake evaluates a tool call against governance rules.
Algorithm
evaluate(rules: Rule[], call: ToolCall): EvalResult
- Sort rules by priority descending (100 before 50)
- Filter to enabled rules only
- For each rule in sorted order:
  a. Check targetMatches(rule.target, call.tool): does the rule apply to this tool type?
  b. Check groupMatches(rule.groups, call): do any condition groups match?
  c. If both match → return immediately (short-circuit, first-match wins)
- Default: return {decision: "allow"} if no rule matches
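The steps above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual source: the `Rule`, `ToolCall`, and `EvalResult` shapes are assumptions, and the two matchers are modelled as per-rule predicates standing in for `targetMatches(rule.target, call.tool)` and `groupMatches(rule.groups, call)`. The sketch filters before sorting, which gives the same result as sorting first.

```typescript
// Hypothetical shapes: the real Rule/ToolCall/EvalResult types are not shown in this document.
type Decision = "allow" | "deny" | "ask";

interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

interface Rule {
  id: string;
  priority: number;
  enabled: boolean;
  decision: Decision;
  // Stand-ins for targetMatches(rule.target, call.tool) and groupMatches(rule.groups, call).
  targetMatches: (call: ToolCall) => boolean;
  groupMatches: (call: ToolCall) => boolean;
}

interface EvalResult {
  decision: Decision;
  ruleId?: string;
}

function evaluate(rules: Rule[], call: ToolCall): EvalResult {
  // filter() returns a new array, so the sort below does not mutate the caller's list.
  const candidates = rules
    .filter((rule) => rule.enabled)
    .sort((a, b) => b.priority - a.priority); // higher priority first (100 before 50)

  for (const rule of candidates) {
    // First rule whose target and condition groups both match wins.
    if (rule.targetMatches(call) && rule.groupMatches(call)) {
      return { decision: rule.decision, ruleId: rule.id };
    }
  }
  // No rule matched: default allow.
  return { decision: "allow" };
}
```

With two matching rules at priorities 50 and 100, the priority-100 rule decides the outcome and the lower rule is never consulted.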
Target matching
A rule's target is compared against the tool's capability:
DENY execution → only matches tools classified as "execute" (Bash, shell)
DENY any → matches all tools
The capability is determined by the active client adapter, which maps tool names to the 6-capability taxonomy (execute, read, write, search, agent, network).
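A minimal sketch of target matching, assuming the rule's target keyword has already been normalized to a capability name (or `any`). The adapter table is hypothetical: only `Bash` is named in this document, and the other tool names are illustrative placeholders.

```typescript
type Capability = "execute" | "read" | "write" | "search" | "agent" | "network";
type Target = Capability | "any";

// Hypothetical client adapter: maps tool names to capabilities.
// Only Bash -> execute comes from the text above; the rest are placeholders.
const exampleAdapter: Record<string, Capability> = {
  Bash: "execute",
  Read: "read",
  Write: "write",
  Grep: "search",
};

function targetMatches(
  target: Target,
  tool: string,
  adapter: Record<string, Capability>,
): boolean {
  if (target === "any") return true; // "any" applies to every tool
  // Unknown tools have no capability, so they only match "any".
  return adapter[tool] === target;
}
```

In this sketch a tool the adapter does not recognize is matched only by `any` rules; how the real engine classifies unknown tools is not stated here.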
Condition evaluation
Groups (OR logic)
Conditions separated by OR form separate groups. Any group matching is sufficient:
IF command CONTAINS "rm -rf" ← Group 1
OR command CONTAINS "rm -r" ← Group 2
Group 1 matches OR Group 2 matches → rule fires.
Within a group (AND logic)
Conditions joined by AND (or on consecutive lines after IF) must all match:
IF path ENDS_WITH ".env" ← Group 1, condition 1
AND content CONTAINS "API_KEY" ← Group 1, condition 2
Both must match for the group to match.
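The OR-of-ANDs structure can be illustrated as below. This is a sketch under assumptions: the `Condition` and `Group` shapes are hypothetical, only the two operators used in the examples above are modelled, and fields are passed in as an already-resolved map rather than read from a tool call.

```typescript
type Operator = "CONTAINS" | "ENDS_WITH";

// Hypothetical shapes for illustration.
interface Condition {
  field: string;
  op: Operator;
  value: string;
}
type Group = Condition[];

function conditionMatches(cond: Condition, fields: Record<string, string>): boolean {
  const actual = fields[cond.field] ?? ""; // missing fields behave like the empty string
  if (cond.op === "CONTAINS") return actual.includes(cond.value);
  if (cond.op === "ENDS_WITH") return actual.endsWith(cond.value);
  return false;
}

function groupMatches(groups: Group[], fields: Record<string, string>): boolean {
  // OR across groups, AND within a group.
  // Empty groups are skipped here (a choice of this sketch) because every() is true on [].
  return groups.some(
    (group) => group.length > 0 && group.every((cond) => conditionMatches(cond, fields)),
  );
}

// IF command CONTAINS "rm -rf" OR command CONTAINS "rm -r"
const deleteGroups: Group[] = [
  [{ field: "command", op: "CONTAINS", value: "rm -rf" }],
  [{ field: "command", op: "CONTAINS", value: "rm -r" }],
];

// IF path ENDS_WITH ".env" AND content CONTAINS "API_KEY"
const secretGroups: Group[] = [
  [
    { field: "path", op: "ENDS_WITH", value: ".env" },
    { field: "content", op: "CONTAINS", value: "API_KEY" },
  ],
];
```

Here `groupMatches(deleteGroups, { command: "rm -r build" })` is true through the second group alone, while `secretGroups` needs both the path and the content condition to hold.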
Field resolution
Fields are extracted from the ToolCall input:
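// Resolve a rule field name to the string value it refers to on the tool call.
function getField(field: Field, call: ToolCall): string {
  // "input.<key>" reads an arbitrary key from the tool input.
  if (field.startsWith('input.')) {
    const key = field.slice(6); // drop the "input." prefix
    const val = call.input[key];
    if (val === undefined || val === null) return '';
    // Non-string values are serialized so string operators can still apply.
    return typeof val === 'string' ? val : JSON.stringify(val);
  }
  // Built-in aliases, with fallbacks across different tool input shapes.
  switch (field) {
    case 'command': return call.input.command ?? '';
    case 'path': return call.input.file_path ?? call.input.path ?? '';
    case 'content': return call.input.content ?? call.input.new_string ?? '';
    case 'tool': return call.tool;
    default: return ''; // unknown fields resolve to the empty string
  }
}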
Performance
| Metric | Typical value |
|---|---|
| Evaluation latency | < 2ms |
| Rules loaded | 20-50 |
| Glob cache size | Up to 1000 patterns |
| Regex cache size | Up to 1000 patterns |
Safety features
Fail-secure regex
If a regex pattern is invalid or dangerous (for example, it contains nested quantifiers), the condition is treated as a match rather than silently failing to match. This triggers the rule, erring on the side of caution.
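One way to picture fail-secure matching is the sketch below. The nested-quantifier check is a deliberately simple heuristic chosen for illustration, and `failSecureTest` is a hypothetical name; the engine's real detection logic is not described in this document.

```typescript
// Rough heuristic for nested quantifiers such as (a+)+ or (a*)*.
// An assumption for illustration, not the engine's actual check.
const NESTED_QUANTIFIER = /\([^)]*[+*][^)]*\)[+*{]/;

function failSecureTest(pattern: string, input: string): boolean {
  // Dangerous pattern: do not run it, treat the condition as matched.
  if (NESTED_QUANTIFIER.test(pattern)) return true;
  try {
    return new RegExp(pattern).test(input);
  } catch {
    // Invalid pattern (RegExp constructor threw): also treat as matched.
    return true;
  }
}
```

A pattern like `(a+)+$` or an unterminated `[unclosed` yields a match for any input in this sketch, so a rule using it fires instead of being silently skipped.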
Loop guard
Detects when the same Bash command is repeated 3+ times consecutively and blocks it to prevent infinite loops.
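A minimal sketch of such a guard, assuming "repeated" means an exact string match on the command and that the third consecutive occurrence is the first one blocked. `createLoopGuard` is a hypothetical helper, not part of the documented API.

```typescript
// Returns a function that reports whether the given command should be blocked.
function createLoopGuard(threshold = 3): (command: string) => boolean {
  let last: string | null = null;
  let streak = 0;
  return (command: string): boolean => {
    if (command === last) {
      streak += 1;
    } else {
      // A different command breaks the streak.
      last = command;
      streak = 1;
    }
    return streak >= threshold;
  };
}
```

Calling the guard with the same command three times in a row returns `false`, `false`, then `true`; any different command in between restarts the count.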
Circuit breaker
Disabled by default, so the engine fails closed. When explicitly enabled via the SSG_HOOK_CIRCUIT_BREAKER=1 environment variable or circuit_breaker = true under the [hook] block in config.toml, the engine automatically allows the next call after 5 consecutive deny decisions (in Claude Code hook mode) to prevent complete agent lockout. The counter resets on any non-deny decision.
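The counting behavior might look like the sketch below. It assumes the override applies to the sixth consecutive deny and that the forced allow itself resets the counter; `createCircuitBreaker` is a hypothetical helper, and how the `enabled` flag is derived from the environment variable or config.toml is left out.

```typescript
type Decision = "allow" | "deny" | "ask";

// Wraps raw engine decisions; returns the decision after the breaker is applied.
function createCircuitBreaker(enabled: boolean, limit = 5): (decision: Decision) => Decision {
  let consecutiveDenies = 0;
  return (decision: Decision): Decision => {
    if (!enabled) return decision; // default: fail closed, never override
    if (decision !== "deny") {
      consecutiveDenies = 0; // any non-deny decision resets the counter
      return decision;
    }
    if (consecutiveDenies >= limit) {
      // Limit already reached: let this call through and start over (assumed behavior).
      consecutiveDenies = 0;
      return "allow";
    }
    consecutiveDenies += 1;
    return "deny";
  };
}
```

With the breaker enabled, five denies in a row pass through unchanged and the next deny is converted to an allow; with it disabled, every deny stays a deny.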
ASK approval mode
When a rule returns an ASK decision, the engine needs a human to approve or deny before the tool call proceeds. Two modes are available:
| Mode | Behavior |
|---|---|
| tty (default) | Prints an inline prompt to the terminal via /dev/tty, the same technique used by sudo and ssh. Works even when stdin is piped. Times out after 60 seconds (deny on timeout). |
| dashboard | Posts the approval request to the ssg web dashboard. Requires ssg serve to be running. |
Configure per-project in .sigmashake/config.toml:
[hook]
ask_mode = "tty" # default — inline terminal prompt
# ask_mode = "dashboard" # opt-in: web dashboard approval
Override per-invocation with the --ask_mode flag:
ssg eval --ask_mode=dashboard
If /dev/tty is unavailable (headless CI), the TTY mode falls back gracefully with a block decision and a message on stderr.
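The precedence between the flag, the project config, and the default can be sketched as follows. `resolveAskMode` is a hypothetical helper, and rejecting unknown values with an error is an assumption; the document does not say how the CLI handles an invalid mode.

```typescript
type AskMode = "tty" | "dashboard";

// flagValue: value of --ask_mode, if given; configValue: ask_mode from [hook] in config.toml, if set.
function resolveAskMode(flagValue?: string, configValue?: string): AskMode {
  // Per-invocation flag wins over project config, which wins over the default.
  const chosen = flagValue ?? configValue ?? "tty";
  if (chosen === "tty" || chosen === "dashboard") return chosen;
  // Assumed behavior for illustration only.
  throw new Error(`unknown ask_mode: ${chosen}`);
}
```

For example, a project configured with `ask_mode = "tty"` still resolves to `dashboard` when the command is run with `--ask_mode=dashboard`.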