SentinelCAT — Installation & Usage
This document describes how SentinelCAT is deployed, integrated, and used
across Tier 1, Tier 2, and Enterprise installations.
SentinelCAT is a deterministic failure-classification and containment engine.
It does not perform observability, telemetry collection, anomaly detection,
or automated remediation.
1. What SentinelCAT Is
- A single internal service that classifies infrastructure failure modes
- Consumes numeric and boolean signals computed by the customer
- Outputs a failure class, amplifiers, and rationale
- Designed to prevent outage amplification
2. What SentinelCAT Is Not
- Not an observability platform
- Not an alerting system
- Not an AI or learning system
- Not an auto-healing or remediation engine
SentinelCAT asserts containment posture. Humans decide actions.
3. Deployment Model (All Tiers)
- Single internal service (container or VM)
- No outbound network dependencies
- No agents installed on workloads
- No persistent data storage required
SentinelCAT is typically deployed alongside existing platform or SRE tooling.
Security note:
SentinelCAT does not require access to raw logs, metrics streams, or production credentials.
It receives only the derived signals that the customer computes and submits.
4. Signal Ingestion
SentinelCAT consumes a JSON payload containing a snapshot of signals:
{
  "signals": {
    "cp_api_p95_ms": 1800,
    "dp_request_success_rate": 0.995,
    "retry_rate_per_req": 1.2,
    "cpu_utilization": 0.94
  }
}
Signals are computed by the customer from existing systems
(Prometheus, CloudWatch, logs, internal metrics).
Missing signals are permitted. Rules depending on missing signals are skipped.
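The skip behavior can be pictured with a minimal sketch. The Rule shape, the rule names, and the thresholds below are illustrative assumptions, not SentinelCAT's actual rule format; the point is only that a rule whose required signals are absent is skipped rather than evaluated against a default.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Tuple, Union

Signal = Union[int, float, bool]


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: a name, the signals it needs, and a predicate.
    name: str
    requires: Tuple[str, ...]
    predicate: Callable[[Mapping[str, Signal]], bool]


def evaluate(rules: List[Rule], signals: Mapping[str, Signal]) -> Tuple[List[str], List[str]]:
    """Return (matched, skipped) rule names, in rule order."""
    matched, skipped = [], []
    for rule in rules:
        # A rule with any missing input is skipped, never guessed at.
        if any(key not in signals for key in rule.requires):
            skipped.append(rule.name)
            continue
        if rule.predicate(signals):
            matched.append(rule.name)
    return matched, skipped


# Thresholds are made up for illustration.
rules = [
    Rule("retry_storm", ("retry_rate_per_req",), lambda s: s["retry_rate_per_req"] > 1.0),
    Rule("cp_error_surge", ("cp_api_error_rate",), lambda s: s["cp_api_error_rate"] > 0.05),
]
snapshot = {
    "cp_api_p95_ms": 1800,
    "dp_request_success_rate": 0.995,
    "retry_rate_per_req": 1.2,
    "cpu_utilization": 0.94,
}
matched, skipped = evaluate(rules, snapshot)
```

Here the snapshot carries no cp_api_error_rate, so the second rule lands in the skipped list while the first is evaluated normally.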
5. Output Semantics
Each classification produces:
- Primary failure class
- Optional amplifiers (e.g., operator blindness)
- Matched rules
- Human-readable rationale
Output is deterministic. The same signal snapshot always produces the same result.
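Downstream tooling may find it convenient to hold the four output parts in a typed, immutable structure. The sketch below assumes hypothetical field names (primary_class, amplifiers, matched_rules, rationale) and a made-up example response; the actual response schema is not specified in this document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Classification:
    # Field names are assumptions chosen to mirror the list above.
    primary_class: str
    amplifiers: Tuple[str, ...]
    matched_rules: Tuple[str, ...]
    rationale: str


def parse_classification(response: dict) -> Classification:
    """Convert a decoded JSON response into an immutable record."""
    return Classification(
        primary_class=response["primary_class"],
        amplifiers=tuple(response.get("amplifiers", [])),
        matched_rules=tuple(response.get("matched_rules", [])),
        rationale=response["rationale"],
    )


# Illustrative response only; class and rule names are invented.
example = {
    "primary_class": "retry_storm",
    "amplifiers": ["operator_blindness"],
    "matched_rules": ["retry_rate_high"],
    "rationale": "retry_rate_per_req above threshold",
}
first = parse_classification(example)
second = parse_classification(dict(example))
```

Because the record is frozen and compared by value, two results for the same snapshot can be checked for equality directly, which is a simple way to spot-check the determinism guarantee.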
6. Tier 1 Installation — Structural Containment
Typical deployment time: hours
Tier 1 focuses on preventing outage amplification. It covers the following failure modes:
- Cascading resource exhaustion
- Retry storms
- Configuration fan-out
- Safety mechanism lockout
Integration steps:
- Deploy SentinelCAT service
- Define a minimal signal set
- Feed signals on a fixed interval
- Expose output to incident responders
7. Tier 2 Installation — Control & Dependency Integrity
Typical deployment time: 1–2 days
Tier 2 extends classification to platform-wide risks:
- Control-plane loss
- Shared dependency collapse (auth/DNS)
- Hidden transitive dependencies
- Time skew amplification
Additional integration includes:
- Cross-service dependency signals
- Control-plane health signals
- Time and lease integrity signals
8. Enterprise Installation — Platform-Specific Integration
Typical deployment time: multiple days
Enterprise deployments address complex, bespoke platforms:
- Multi-region control planes
- Custom orchestration layers
- Internal identity and access systems
- Non-standard recovery automation
Enterprise installations may include:
- Custom signal definitions
- Authority inconsistency detection
- Platform-specific invariants
9. Operator Responsibilities
- Define signal computation correctly
- Feed signals on a consistent interval
- Interpret containment assertions
- Decide and execute remediation actions
Important:
SentinelCAT does not take action on your infrastructure.
Containment decisions remain human-controlled by design.
10. Versioning & Stability
- Failure class ontology is stable within a major version
- Signal contract changes are versioned explicitly
- No silent behavior changes
11. Signal Adapter Configuration (Required)
SentinelCAT is not configured per application or service.
It is configured once per environment via a Signal Adapter.
The Signal Adapter is a small customer-owned job or service that:
- Queries existing telemetry systems
- Computes derived numeric and boolean signals
- Submits a signal snapshot to SentinelCAT on a fixed interval
This adapter is the only required integration point.
11.1 Where the Signal Adapter Runs
The Signal Adapter typically runs as one of the following:
- Kubernetes CronJob or Deployment
- Internal platform service
- VM-based scheduled job
- Existing reliability or SRE automation process
It must be able to reach SentinelCAT over the internal network.
11.2 What the Signal Adapter Reads
The adapter reads from systems you already operate, such as:
- Prometheus or compatible metrics stores
- CloudWatch or cloud-native metrics
- Internal telemetry pipelines
- Derived aggregates from logs or traces
SentinelCAT does not ingest raw telemetry. Only derived signals are provided.
11.3 Signal Computation
Signals are computed over consistent rolling windows, for example:
- 1–5 minutes for rates and latency
- 15–60 minutes for drift, expiry, or stability checks
Each signal represents an invariant-relevant condition, such as:
- Control-plane latency percentiles
- Retry amplification ratios
- Configuration propagation rates
- Resource saturation indicators
- Boolean assertions (e.g. rollback possible, admin actions blocked)
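As a rough sketch of rolling-window computation, the adapter could keep timestamped samples, evict those older than the window, and derive a percentile or ratio from what remains. The class and function names are hypothetical, the nearest-rank percentile is one of several reasonable definitions, and treating retry_rate_per_req as retries divided by requests is an assumption.

```python
import math
from collections import deque
from typing import List, Optional


class RollingWindow:
    """Keeps (timestamp, value) samples that fall inside a time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._samples = deque()

    def add(self, timestamp: float, value: float) -> None:
        self._samples.append((timestamp, value))
        cutoff = timestamp - self.window_seconds
        # Drop samples that are a full window old or older.
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def values(self) -> List[float]:
        return [value for _, value in self._samples]


def percentile_nearest_rank(values: List[float], pct: int) -> Optional[float]:
    """Nearest-rank percentile; None when there is no data."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def retry_rate_per_req(retries: int, requests: int) -> Optional[float]:
    """Assumed definition: retries divided by requests in the window."""
    if requests == 0:
        return None  # no traffic: omit the signal rather than invent a value
    return retries / requests
```

Returning None when there is no data lets the adapter omit that signal from the snapshot, which fits the rule that missing signals are permitted.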
11.4 Payload Format
The adapter sends a JSON payload to SentinelCAT:
{
  "signals": {
    "cp_api_p95_ms": 1800,
    "cp_api_error_rate": 0.08,
    "dp_request_success_rate": 0.995,
    "retry_rate_per_req": 1.2,
    "cpu_utilization": 0.94,
    "rollback_possible": false
  }
}
Missing signals are allowed.
Rules depending on missing signals are skipped deterministically.
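A small helper can enforce the payload shape before submission. This is a sketch under assumptions: build_payload is a hypothetical name, and rejecting non-finite numbers is a choice made here because JSON has no representation for NaN or infinity, not a documented SentinelCAT requirement.

```python
import json
import math
from typing import Mapping, Optional, Union

RawSignal = Optional[Union[int, float, bool]]


def build_payload(raw: Mapping[str, RawSignal]) -> str:
    """Serialize a snapshot, omitting unavailable signals (None)."""
    signals = {}
    for name, value in raw.items():
        if value is None:
            continue  # unavailable: leave it out so dependent rules are skipped
        if isinstance(value, bool):
            signals[name] = value
        elif isinstance(value, (int, float)) and math.isfinite(value):
            signals[name] = value
        else:
            raise ValueError(f"signal {name!r} must be a finite number or a boolean")
    # sort_keys gives a stable byte-for-byte payload for identical snapshots.
    return json.dumps({"signals": signals}, sort_keys=True)


body = build_payload({
    "cp_api_p95_ms": 1800,
    "rollback_possible": False,
    "cp_api_error_rate": None,  # metric backend had no data this interval
})
```

The unavailable signal is dropped from the payload entirely instead of being sent as null or zero.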
11.5 Submission Frequency
Typical submission intervals are:
- 30 seconds
- 60 seconds
- 120 seconds
Intervals should be fixed and predictable.
SentinelCAT does not require real-time streaming.
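One way to keep the interval fixed is to schedule each submission against deadlines measured from the start time, so that the time spent computing signals does not accumulate as drift. The function below is a minimal sketch; run_fixed_interval and its parameters are hypothetical names, and the clock and sleep functions are injectable only to make the loop testable.

```python
import time
from typing import Callable


def run_fixed_interval(
    task: Callable[[], None],
    interval_s: float,
    iterations: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run task every interval_s seconds, measured from the start time."""
    start = clock()
    for tick in range(iterations):
        task()
        # Deadline is anchored to the start, not to when the task finished.
        next_deadline = start + (tick + 1) * interval_s
        delay = next_deadline - clock()
        if delay > 0:
            sleep(delay)
        # If the task overran the interval, continue immediately.
```

With a 30-second interval and a task that takes 2 seconds, each sleep is 28 seconds, so submissions stay aligned to 30-second boundaries.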
11.6 Example Adapter (Pseudo-Code)
signals = {
    "cp_api_p95_ms": query_metrics("control_plane_latency_p95"),
    "retry_rate_per_req": compute_retry_rate(),
    "cpu_utilization": cluster_cpu_max(),
    "config_push_rate_per_min": config_push_velocity(),
}
payload = { "signals": signals }
POST payload to http://sentinelcat.internal/api/classify
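The submission step of the pseudo-code above could look like the following in Python, using only the standard library. The endpoint URL is taken from the pseudo-code; the Content-Type header, the timeout, and the assumption that the response body is JSON are not confirmed by this document.

```python
import json
import urllib.request

# Endpoint as shown in the pseudo-code above.
SENTINELCAT_URL = "http://sentinelcat.internal/api/classify"


def build_request(signals: dict, url: str = SENTINELCAT_URL) -> urllib.request.Request:
    """Wrap a signal snapshot in a POST request."""
    body = json.dumps({"signals": signals}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def submit(signals: dict, url: str = SENTINELCAT_URL, timeout_s: float = 5.0) -> dict:
    """Send the snapshot and decode the reply (assumed to be JSON)."""
    request = build_request(signals, url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

An adapter would call submit() once per interval with the freshly computed signals and pass the returned classification to its routing step.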
11.7 Handling Classification Output
SentinelCAT returns:
- Primary failure class
- Optional amplifiers
- Matched rules
- Rationale
The Signal Adapter or downstream tooling routes this output to humans through:
- Incident response channels
- Runbooks
- Operational dashboards
- Ticketing or escalation systems
SentinelCAT does not trigger actions directly.
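Routing can be as simple as rendering the result into a message for an incident channel or ticket. The sketch below assumes hypothetical response field names (primary_class, amplifiers, matched_rules, rationale) and an invented example result; it only formats text and takes no action.

```python
def format_for_responders(result: dict) -> str:
    """Render a classification as a plain-text message for humans."""
    amplifiers = result.get("amplifiers") or []
    matched = result.get("matched_rules") or []
    lines = [
        f"SentinelCAT class: {result['primary_class']}",
        f"Amplifiers: {', '.join(amplifiers) if amplifiers else 'none'}",
        f"Matched rules: {', '.join(matched) if matched else 'none'}",
        f"Rationale: {result['rationale']}",
        # Reminder that the decision stays with the responder.
        "No action has been taken automatically.",
    ]
    return "\n".join(lines)


# Illustrative result only; names are made up.
message = format_for_responders({
    "primary_class": "retry_storm",
    "amplifiers": [],
    "matched_rules": ["retry_rate_high"],
    "rationale": "retry_rate_per_req above threshold",
})
```

The resulting string can be posted to whichever channel, dashboard, or ticketing system the team already uses.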
Design constraint:
SentinelCAT intentionally does not perform automated remediation.
Containment decisions remain human-controlled.