Executive Summary
The stateful observation window mechanism reduces false-positive remediation by requiring an anomaly to persist across multiple consecutive inspection cycles before any action is triggered. Trigger counters, kept per rule and per node, track sustained anomalies and filter out transient spikes. There is one exception: the security rule SEC_DIGEST_MISMATCH bypasses the observation window entirely.
Observation Window Mechanics
| Parameter | Value | Purpose |
|---|---|---|
| Inspection cycle | 60 seconds | Metric collection and rule evaluation interval |
| Required consecutive triggers | 3 cycles (~3 minutes) | Sustained anomaly before remediation |
| Counter reset | On any non-triggering cycle | Transient spike filtering |
| Security exception | SEC_DIGEST_MISMATCH | Fires immediately, bypasses window |
A rule must trigger on 3 consecutive 60-second inspection cycles (approximately 3 minutes of sustained anomaly) before generating a remediation recommendation. If any cycle does not trigger the rule, the counter resets to zero.
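As a rough illustration of the counting rule, the Python sketch below replays a sequence of per-cycle rule results and reports the cycle on which a recommendation would first be generated. The function name `first_fire_cycle` and the constant names are hypothetical, chosen only for this example; the 3-cycle threshold and 60-second interval are taken from the table above.

```python
from typing import Optional, Sequence

CYCLE_SECONDS = 60       # inspection cycle length, from the table above
REQUIRED_TRIGGERS = 3    # consecutive triggers required before remediation


def first_fire_cycle(results: Sequence[bool]) -> Optional[int]:
    """Return the 1-based cycle on which a recommendation would first fire.

    results[i] is True when the rule triggered on cycle i + 1.
    Returns None if the counter never reaches the threshold.
    """
    consecutive = 0
    for cycle, triggered in enumerate(results, start=1):
        if triggered:
            consecutive += 1
            if consecutive >= REQUIRED_TRIGGERS:
                return cycle
        else:
            consecutive = 0  # any non-triggering cycle resets the counter
    return None


# A transient spike (two triggers, then a clean cycle) never fires.
print(first_fire_cycle([True, True, False]))                    # None
# The reset restarts the count, so this fires on cycle 6, not cycle 4.
print(first_fire_cycle([True, True, False, True, True, True]))  # 6
```

In the second sequence the clean third cycle discards the two earlier triggers, which is why a spike that briefly recovers has to be sustained for a further three cycles (`REQUIRED_TRIGGERS * CYCLE_SECONDS`, about 180 seconds) before anything is recommended.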
Counter State Machine
```
// Per-rule, per-node state
state = {
    rule_id: "PERF_CPU_HIGH",
    node_id: "validator-us",
    consecutive_triggers: 0,  // 0, 1, 2, or 3+
    last_evaluated: timestamp,
    fired: false
}

// Each inspection cycle:
if rule_triggers(node):
    state.consecutive_triggers += 1
    if state.consecutive_triggers >= 3:
        emit_remediation_recommendation()
        state.fired = true
else:
    state.consecutive_triggers = 0  // RESET
    state.fired = false
```
Security Exception
The SEC_DIGEST_MISMATCH rule (image tampering detection) is the sole exception to the observation window requirement. When a digest mismatch is detected, the remediation recommendation fires immediately on the first detection without waiting for consecutive triggers.
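One way the bypass could sit alongside the per-rule, per-node counters is sketched below in Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: `should_recommend`, `IMMEDIATE_RULES`, and the `counters` dictionary are hypothetical names, and the sketch assumes counters are keyed by a `(rule_id, node_id)` pair.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

REQUIRED_TRIGGERS = 3
# Rules that skip the observation window; per the text, this is the only one.
IMMEDIATE_RULES = frozenset({"SEC_DIGEST_MISMATCH"})

# Consecutive-trigger counters keyed by (rule_id, node_id) (assumed layout).
counters: DefaultDict[Tuple[str, str], int] = defaultdict(int)


def should_recommend(rule_id: str, node_id: str, triggered: bool) -> bool:
    """Evaluate one inspection cycle for one rule on one node."""
    key = (rule_id, node_id)
    if not triggered:
        counters[key] = 0  # a non-triggering cycle resets the counter
        return False
    if rule_id in IMMEDIATE_RULES:
        return True  # bypass: recommend on the first detection
    counters[key] += 1
    return counters[key] >= REQUIRED_TRIGGERS


print(should_recommend("SEC_DIGEST_MISMATCH", "validator-us", True))  # True
print(should_recommend("PERF_CPU_HIGH", "validator-us", True))        # False
```

Keeping the bypass as a set lookup ahead of the counter increment means the windowed path is unchanged for every other rule, and each node's counter advances independently of the others.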
Benefits
- False-positive reduction: Network blips, brief CPU spikes, and momentary latency increases are filtered out by the 3-cycle requirement
- Cascade prevention: Transient issues don't trigger unnecessary restarts that could themselves cause outages
- Security responsiveness: Real threats (image tampering, detected by SEC_DIGEST_MISMATCH) are still addressed immediately
- Observability: Counter states are visible in the fleet dashboard for operational insight