
Evaluation Deep Dive

How SigmaShake evaluates a tool call against governance rules.

Algorithm

evaluate(rules: Rule[], call: ToolCall): EvalResult
  1. Sort rules by priority, descending (priority 100 is evaluated before 50)
  2. Filter to enabled rules only
  3. For each rule in sorted order:
     a. Check targetMatches(rule.target, call.tool): does the rule apply to this tool type?
     b. Check groupMatches(rule.groups, call): do any condition groups match?
     c. If both match, return that rule's result immediately (short-circuit, first match wins)
  4. Default: if no rule matches, return {decision: "allow"}
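
The steps above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the actual SigmaShake source: the shapes of Rule, ToolCall, and EvalResult are assumptions, and the two matcher callbacks are hypothetical stand-ins for targetMatches and groupMatches.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
type Decision = "allow" | "deny";

interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

interface Rule {
  id: string;
  priority: number;
  enabled: boolean;
  decision: Decision;
  // Stand-ins for targetMatches(rule.target, call.tool) and groupMatches(rule.groups, call).
  targetMatches: (call: ToolCall) => boolean;
  groupMatches: (call: ToolCall) => boolean;
}

interface EvalResult {
  decision: Decision;
  ruleId?: string;
}

function evaluate(rules: Rule[], call: ToolCall): EvalResult {
  // Steps 1 and 2: sort a copy by priority (highest first), then keep enabled rules.
  const candidates = [...rules]
    .sort((a, b) => b.priority - a.priority)
    .filter((rule) => rule.enabled);

  // Step 3: the first rule whose target and conditions both match wins.
  for (const rule of candidates) {
    if (rule.targetMatches(call) && rule.groupMatches(call)) {
      return { decision: rule.decision, ruleId: rule.id };
    }
  }

  // Step 4: nothing matched, so the call is allowed by default.
  return { decision: "allow" };
}
```

Because evaluation stops at the first match, a lower-priority rule is never consulted once a higher-priority rule has matched the same call.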

Target matching

A rule's target is compared against the tool's capability:

DENY execution  →  matches only tools classified as "execute" (Bash, shell)
DENY any        →  matches all tools

The capability is determined by the active client adapter, which maps tool names to the 6-capability taxonomy (execute, read, write, search, agent, network).
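
As a rough sketch of how target matching could work, assume the adapter is a simple lookup from tool name to capability. Only the six capability names and the Bash-to-execute mapping come from the text above; the other tool names, the table itself, and the handling of unknown tools are assumptions made for illustration.

```typescript
type Capability = "execute" | "read" | "write" | "search" | "agent" | "network";
type Target = Capability | "any";

// Hypothetical adapter table. Real adapters are client-specific.
const exampleAdapter: ReadonlyMap<string, Capability> = new Map<string, Capability>([
  ["Bash", "execute"],
  ["Read", "read"],
  ["Write", "write"],
  ["Grep", "search"],
  ["Task", "agent"],
  ["WebFetch", "network"],
]);

function targetMatches(
  target: Target,
  tool: string,
  adapter: ReadonlyMap<string, Capability>,
): boolean {
  // "any" applies to every tool, including ones the adapter does not know.
  if (target === "any") return true;
  // Assumption: a tool with no known capability matches only "any".
  return adapter.get(tool) === target;
}
```

With this table, a rule targeting execution would apply to a Bash call but not to a Read call, while a rule targeting any would apply to both.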

Condition evaluation

Groups (OR logic)

Conditions separated by OR form separate groups. Any group matching is sufficient:

IF command CONTAINS "rm -rf"     ← Group 1
OR command CONTAINS "rm -r"      ← Group 2

Group 1 matches OR Group 2 matches → rule fires.

Within a group (AND logic)

Conditions joined by AND (or on consecutive lines after IF) must all match:

IF path ENDS_WITH ".env"         ← Group 1, condition 1
AND content CONTAINS "API_KEY"   ← Group 1, condition 2

Both must match for the group to match.
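
The OR-of-ANDs structure described above maps naturally onto some and every. The sketch below is illustrative only: the Condition shape, the two operators shown, and the pre-resolved fields record are assumptions, not the actual rule model.

```typescript
// Hypothetical condition model covering only the two operators used in the examples.
type Operator = "CONTAINS" | "ENDS_WITH";

interface Condition {
  field: string;
  op: Operator;
  value: string;
}

// A group is a list of conditions that must all hold (AND).
type Group = Condition[];

function conditionMatches(cond: Condition, fields: Record<string, string>): boolean {
  const actual = fields[cond.field] ?? "";
  switch (cond.op) {
    case "CONTAINS":
      return actual.includes(cond.value);
    case "ENDS_WITH":
      return actual.endsWith(cond.value);
  }
}

function groupMatches(groups: Group[], fields: Record<string, string>): boolean {
  // Any group (OR) whose conditions all match (AND) is sufficient.
  // Assumption: an empty group never matches, so a rule cannot fire vacuously.
  return groups.some(
    (group) => group.length > 0 && group.every((cond) => conditionMatches(cond, fields)),
  );
}
```

For the two examples above, a command of "rm -r build" would satisfy the second OR group, while a write to ".env" fires only if the content also contains "API_KEY".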

Field resolution

Fields are extracted from the ToolCall input:

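function getField(field: Field, call: ToolCall): string {
  // "input.<key>" reads an arbitrary key from the tool input.
  if (field.startsWith('input.')) {
    const key = field.slice(6); // strip the "input." prefix
    const val = call.input[key];
    if (val === undefined || val === null) return '';
    // Non-string values are serialized so string operators can still run on them.
    return typeof val === 'string' ? val : JSON.stringify(val);
  }
  // Built-in aliases cover the common input keys.
  switch (field) {
    case 'command': return call.input.command ?? '';
    case 'path': return call.input.file_path ?? call.input.path ?? '';
    case 'content': return call.input.content ?? call.input.new_string ?? '';
    case 'tool': return call.tool;
    default: return ''; // unknown fields resolve to the empty string
  }
}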

Performance

Metric               Typical value
Evaluation latency   < 2 ms
Rules loaded         20-50
Glob cache size      Up to 1000 patterns
Regex cache size     Up to 1000 patterns
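
One way to keep a compiled-pattern cache within a fixed size is shown below. The 1000-entry limit comes from the table above; the class name and the least-recently-used eviction policy are assumptions, since the page does not say how entries are evicted.

```typescript
// Hypothetical bounded cache for compiled patterns.
class BoundedRegexCache {
  private readonly cache = new Map<string, RegExp>();
  private readonly maxSize: number;

  constructor(maxSize = 1000) {
    this.maxSize = maxSize;
  }

  has(pattern: string): boolean {
    return this.cache.has(pattern);
  }

  get size(): number {
    return this.cache.size;
  }

  // Returns the compiled pattern, compiling and caching it on first use.
  // Throws a SyntaxError if the pattern is not a valid regular expression.
  get(pattern: string): RegExp {
    const hit = this.cache.get(pattern);
    if (hit !== undefined) {
      // Re-insert so this entry becomes the most recently used.
      this.cache.delete(pattern);
      this.cache.set(pattern, hit);
      return hit;
    }
    const compiled = new RegExp(pattern);
    if (this.cache.size >= this.maxSize) {
      // A Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.cache.keys().next().value;
      if (oldest !== undefined) this.cache.delete(oldest);
    }
    this.cache.set(pattern, compiled);
    return compiled;
  }
}
```

Compiling a pattern once and reusing it is what keeps repeated evaluations cheap; the size bound keeps memory use predictable when many distinct patterns appear.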

Safety features

Fail-secure regex

If a regex pattern is invalid or dangerous (for example, it contains nested quantifiers), the condition is treated as a match rather than silently failing. The rule therefore fires, erring on the side of caution.
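
A minimal sketch of fail-secure matching is shown below. The function name is hypothetical, and the nested-quantifier check is a deliberately crude heuristic chosen for illustration; the actual detector is not described on this page.

```typescript
// Crude heuristic (an assumption): flags a quantified group that itself
// contains a quantifier, such as (a+)+ or (a*)*.
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/;

function regexMatchesFailSecure(pattern: string, value: string): boolean {
  // Dangerous pattern: report a match without ever running it.
  if (NESTED_QUANTIFIER.test(pattern)) return true;
  try {
    return new RegExp(pattern).test(value);
  } catch {
    // Invalid pattern: report a match so the rule still fires.
    return true;
  }
}
```

The trade-off is that a broken pattern produces false positives (a rule firing when it should not) instead of false negatives (a dangerous call slipping through).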

Loop guard

Detects when the same Bash command is repeated 3+ times consecutively and blocks it to prevent infinite loops.
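
The loop guard can be pictured as a small counter over consecutive identical commands. In the sketch below the class and method names are hypothetical, and it assumes the third consecutive occurrence is the first one blocked; the page does not say whether blocking starts at the third or the fourth.

```typescript
// Hypothetical loop guard: tracks how many times in a row the same command was seen.
class LoopGuard {
  private lastCommand: string | null = null;
  private repeatCount = 0;
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  // Records the command and returns true when it should be blocked.
  shouldBlock(command: string): boolean {
    if (command === this.lastCommand) {
      this.repeatCount += 1;
    } else {
      // A different command breaks the streak and starts a new one.
      this.lastCommand = command;
      this.repeatCount = 1;
    }
    return this.repeatCount >= this.threshold;
  }
}
```

Running a different command in between resets the streak, so only strictly consecutive repeats are blocked.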

Circuit breaker

After 5 consecutive deny decisions (in Claude Code hook mode), the next call is allowed automatically to prevent complete agent lockout. The counter resets on any non-deny decision.
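
The behavior described above could be wrapped around the evaluator's output as follows. This is a sketch under assumptions: the class and method names are hypothetical, and it treats the forced allow as a non-deny decision that resets the count.

```typescript
type Decision = "allow" | "deny";

// Hypothetical circuit breaker applied to each decision the evaluator produces.
class DenyCircuitBreaker {
  private consecutiveDenies = 0;
  private readonly limit: number;

  constructor(limit = 5) {
    this.limit = limit;
  }

  // Takes the evaluator's decision and returns the decision actually enforced.
  apply(decision: Decision): Decision {
    if (this.consecutiveDenies >= this.limit) {
      // Breaker tripped: let this call through and start counting again.
      this.consecutiveDenies = 0;
      return "allow";
    }
    if (decision === "deny") {
      this.consecutiveDenies += 1;
    } else {
      // Any non-deny decision resets the streak.
      this.consecutiveDenies = 0;
    }
    return decision;
  }
}
```

With the default limit, five denies in a row are enforced as usual, the sixth call is allowed regardless of what the rules say, and counting then starts over.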