Evaluation Deep Dive
How SigmaShake evaluates a tool call against governance rules.
Algorithm
evaluate(rules: Rule[], call: ToolCall): EvalResult
- Sort rules by priority descending (100 before 50)
- Filter to enabled rules only
- For each rule in sorted order:
  a. Check targetMatches(rule.target, call.tool): does the rule apply to this tool type?
  b. Check groupMatches(rule.groups, call): do any condition groups match?
  c. If both match → return immediately (short-circuit, first-match wins)
- Default: return {decision: "allow"} if no rule matches
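The steps above can be sketched in TypeScript as follows. This is a minimal illustration, not SigmaShake's actual source: the `Rule`, `ToolCall`, and `EvalResult` shapes are assumed, and the two matchers are passed in as per-rule callbacks (a hypothetical simplification) so the example stays self-contained.

```typescript
// Hypothetical shapes: the doc names these types but does not define them.
type Decision = "allow" | "deny";

interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

interface Rule {
  id: string;
  priority: number;
  enabled: boolean;
  decision: Decision;
  // Stand-ins for targetMatches(rule.target, call.tool) and groupMatches(rule.groups, call).
  targetMatches: (call: ToolCall) => boolean;
  groupMatches: (call: ToolCall) => boolean;
}

interface EvalResult {
  decision: Decision;
  ruleId?: string;
}

function evaluate(rules: Rule[], call: ToolCall): EvalResult {
  const candidates = rules
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.priority - a.priority) // priority descending: 100 before 50
    .filter((rule) => rule.enabled);

  for (const rule of candidates) {
    // First rule whose target and condition groups both match wins.
    if (rule.targetMatches(call) && rule.groupMatches(call)) {
      return { decision: rule.decision, ruleId: rule.id };
    }
  }
  return { decision: "allow" }; // default when no rule matches
}
```

Because evaluation stops at the first match, a lower-priority rule never overrides a higher-priority one, even if the lower-priority rule would return a different decision.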
Target matching
A rule's target is compared against the tool's capability:
DENY execution → only matches tools classified as "execute" (Bash, shell)
DENY any → matches all tools
The capability is determined by the active client adapter, which maps tool names to the 6-capability taxonomy (execute, read, write, search, agent, network).
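A minimal sketch of target matching, assuming the adapter is a simple lookup from tool name to capability. Only the Bash → execute mapping comes from the text above; the other table entries, the `Target` type, and the extra `adapter` parameter are hypothetical, and the sketch does not model how rule keywords such as `execution` are translated to capability names.

```typescript
type Capability = "execute" | "read" | "write" | "search" | "agent" | "network";
type Target = Capability | "any";

// Hypothetical adapter table; a real client adapter would supply its own mapping.
const exampleAdapter = new Map<string, Capability>([
  ["Bash", "execute"],
  ["Read", "read"], // assumed
  ["Write", "write"], // assumed
  ["Grep", "search"], // assumed
]);

function targetMatches(
  target: Target,
  tool: string,
  adapter: Map<string, Capability> = exampleAdapter,
): boolean {
  if (target === "any") return true; // "any" applies to every tool, known or not
  return adapter.get(tool) === target; // unknown tools have no capability and only match "any"
}
```

In this sketch a tool the adapter does not recognize is matched only by `any` rules; whether SigmaShake treats unknown tools the same way is not stated in this document.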
Condition evaluation
Groups (OR logic)
Conditions separated by OR form separate groups. Any group matching is sufficient:
IF command CONTAINS "rm -rf" ← Group 1
OR command CONTAINS "rm -r" ← Group 2
Group 1 matches OR Group 2 matches → rule fires.
Within a group (AND logic)
Conditions joined by AND (or on consecutive lines after IF) must all match:
IF path ENDS_WITH ".env" ← Group 1, condition 1
AND content CONTAINS "API_KEY" ← Group 1, condition 2
Both must match for the group to match.
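The OR-of-ANDs structure maps directly onto `some` and `every`. The sketch below is illustrative only: the `Condition` shape is assumed, just the two operators used in the examples above are modeled, and fields are passed in already resolved rather than read from a ToolCall.

```typescript
type Operator = "CONTAINS" | "ENDS_WITH"; // only the operators shown in the examples
interface Condition {
  field: string;
  op: Operator;
  value: string;
}
type Group = Condition[]; // conditions inside a group are ANDed

function conditionMatches(cond: Condition, fields: Record<string, string>): boolean {
  const actual = fields[cond.field] ?? ""; // missing fields behave like the empty string
  return cond.op === "CONTAINS" ? actual.includes(cond.value) : actual.endsWith(cond.value);
}

function groupMatches(groups: Group[], fields: Record<string, string>): boolean {
  // OR across groups, AND within each group.
  return groups.some((group) => group.every((cond) => conditionMatches(cond, fields)));
}

// The two rules from the text, written as data.
const rmRule: Group[] = [
  [{ field: "command", op: "CONTAINS", value: "rm -rf" }],
  [{ field: "command", op: "CONTAINS", value: "rm -r" }],
];
const envRule: Group[] = [
  [
    { field: "path", op: "ENDS_WITH", value: ".env" },
    { field: "content", op: "CONTAINS", value: "API_KEY" },
  ],
];
```

With these definitions, `groupMatches(rmRule, { command: "rm -r build" })` is true because the second group matches, while `groupMatches(envRule, { path: "app/.env", content: "PORT=3000" })` is false because only one of the two ANDed conditions holds.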
Field resolution
Fields are extracted from the ToolCall input:
function getField(field: Field, call: ToolCall): string {
  // "input.<key>" reads an arbitrary key from the tool input.
  if (field.startsWith('input.')) {
    const key = field.slice(6); // drop the "input." prefix
    const val = call.input[key];
    if (val === undefined || val === null) return '';
    // Non-string values are serialized so string operators can still be applied.
    return typeof val === 'string' ? val : JSON.stringify(val);
  }
  // Built-in aliases; a missing field resolves to the empty string.
  switch (field) {
    case 'command': return call.input.command ?? '';
    case 'path': return call.input.file_path ?? call.input.path ?? '';
    case 'content': return call.input.content ?? call.input.new_string ?? '';
    case 'tool': return call.tool;
    default: return '';
  }
}
Performance
| Metric | Typical value |
|---|---|
| Evaluation latency | < 2 ms |
| Rules loaded | 20-50 |
| Glob cache size | Up to 1000 patterns |
| Regex cache size | Up to 1000 patterns |
Safety features
Fail-secure regex
If a regex pattern is invalid or dangerous (for example, it contains nested quantifiers), it is treated as a match rather than silently failing to match. The rule therefore fires, erring on the side of caution.
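A minimal sketch of the fail-secure idea, assuming a simple heuristic for "dangerous" patterns. The `NESTED_QUANTIFIER` check and the `failSecureTest` name are hypothetical; the document does not describe how SigmaShake actually detects nested quantifiers.

```typescript
// Crude heuristic (an assumption): a parenthesized group that contains + or *
// and is itself followed by a quantifier, e.g. (a+)+ or (\d*)*.
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/;

function failSecureTest(pattern: string, input: string): boolean {
  if (NESTED_QUANTIFIER.test(pattern)) {
    return true; // dangerous pattern: report a match without running it
  }
  try {
    return new RegExp(pattern).test(input);
  } catch {
    return true; // invalid pattern (RegExp threw): report a match
  }
}
```

The design choice is that a broken deny rule blocks too much rather than too little: `failSecureTest("(", "anything")` returns true even though the pattern cannot be compiled.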
Loop guard
Detects when the same Bash command is repeated 3+ times consecutively and blocks it to prevent infinite loops.
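One way to picture the loop guard is a counter over consecutive identical commands. The sketch below is an assumption-laden illustration: the `LoopGuard` class is hypothetical, and it assumes the third identical command in a row is the first one blocked.

```typescript
class LoopGuard {
  private last: string | null = null;
  private streak = 0;
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold; // 3 matches the "3+ times" figure in the text
  }

  /** Records a command and returns true when it should be blocked. */
  shouldBlock(command: string): boolean {
    if (command === this.last) {
      this.streak += 1;
    } else {
      // A different command breaks the run and starts a new one.
      this.last = command;
      this.streak = 1;
    }
    return this.streak >= this.threshold;
  }
}
```

Running any other command in between resets the streak, so only back-to-back repetition trips the guard.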
Circuit breaker
After 5 consecutive deny decisions (in Claude Code hook mode), automatically allows the next call to prevent complete agent lockout. Resets on any non-deny decision.
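The circuit breaker can be sketched as a wrapper around the evaluator's decision. The `CircuitBreaker` class below is hypothetical, and it assumes the forced allow itself counts as a non-deny decision and so resets the counter.

```typescript
type Decision = "allow" | "deny";

class CircuitBreaker {
  private consecutiveDenies = 0;
  private readonly limit: number;

  constructor(limit = 5) {
    this.limit = limit; // 5 matches the figure in the text
  }

  /** Takes the evaluator's decision and returns the decision to enforce. */
  apply(decision: Decision): Decision {
    if (this.consecutiveDenies >= this.limit) {
      // Limit reached on earlier calls: let this one through and start over.
      this.consecutiveDenies = 0;
      return "allow";
    }
    if (decision === "deny") {
      this.consecutiveDenies += 1;
    } else {
      this.consecutiveDenies = 0; // any non-deny decision resets the count
    }
    return decision;
  }
}
```

Under these assumptions, five denies in a row are enforced as usual, the sixth call is allowed regardless of what the rules say, and counting then starts again from zero.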