SentinelCAT — Installation & Usage

This document describes how SentinelCAT is deployed, integrated, and used across Tier 1, Tier 2, and Enterprise installations.

SentinelCAT is a deterministic failure-classification and containment engine. It does not perform observability, telemetry collection, anomaly detection, or automated remediation.

1. What SentinelCAT Is

A single internal service that classifies infrastructure failure modes
Consumes numeric and boolean signals computed by the customer
Outputs a failure class, amplifiers, and rationale
Designed to prevent outage amplification

2. What SentinelCAT Is Not

Not an observability platform
Not an alerting system
Not an AI or learning system
Not an auto-healing or remediation engine

SentinelCAT asserts containment posture. Humans decide actions.

3. Deployment Model (All Tiers)

Single internal service (container or VM)
No outbound network dependencies
No agents installed on workloads
No persistent data storage required

SentinelCAT is typically deployed alongside existing platform or SRE tooling.

Security note:
SentinelCAT does not require access to raw logs, metrics streams, or production credentials. It receives only derived signals computed by the customer.

4. Signal Ingestion

SentinelCAT consumes a JSON payload containing a snapshot of signals:

{
  "signals": {
    "cp_api_p95_ms": 1800,
    "dp_request_success_rate": 0.995,
    "retry_rate_per_req": 1.2,
    "cpu_utilization": 0.94
  }
}

Signals are computed by the customer from existing systems (Prometheus, CloudWatch, logs, internal metrics).

Missing signals are permitted. Rules depending on missing signals are skipped.
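The skip behavior for missing signals can be illustrated with a minimal sketch. The rule structure, rule names, and thresholds below are hypothetical and chosen only for illustration; they are not SentinelCAT's actual rule set or internals.

```python
from typing import Callable, Dict, List, Tuple, Union

Signal = Union[int, float, bool]
Rule = Tuple[str, Tuple[str, ...], Callable[[Dict[str, Signal]], bool]]

# Hypothetical rules: (name, required signal keys, predicate over the signals).
# Names and thresholds are illustrative only.
RULES: List[Rule] = [
    ("retry_storm", ("retry_rate_per_req",),
     lambda s: s["retry_rate_per_req"] > 1.0),
    ("cpu_saturation", ("cpu_utilization",),
     lambda s: s["cpu_utilization"] > 0.9),
    ("cp_degraded", ("cp_api_p95_ms", "cp_api_error_rate"),
     lambda s: s["cp_api_p95_ms"] > 1500 and s["cp_api_error_rate"] > 0.05),
]


def evaluate(signals: Dict[str, Signal]) -> Tuple[List[str], List[str]]:
    """Return (matched, skipped) rule names, always in the fixed rule order."""
    matched: List[str] = []
    skipped: List[str] = []
    for name, required, predicate in RULES:
        if any(key not in signals for key in required):
            skipped.append(name)  # a required signal is absent: skip, never guess
        elif predicate(signals):
            matched.append(name)
    return matched, skipped


# cp_api_error_rate is absent, so the rule that needs it is skipped.
snapshot = {"cp_api_p95_ms": 1800, "retry_rate_per_req": 1.2, "cpu_utilization": 0.94}
matched, skipped = evaluate(snapshot)
```

In this sketch a skipped rule is neither a match nor a non-match, so an incomplete snapshot cannot produce a classification that depends on data the customer did not supply.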

5. Output Semantics

Each classification produces:

Primary failure class
Optional amplifiers (e.g., operator blindness)
Matched rules
Human-readable rationale

Output is deterministic. The same signals always produce the same result.
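Determinism can be checked from the outside by replaying a recorded snapshot and comparing results. The following is a minimal sketch of such a replay check; `is_deterministic`, `canonical`, and the stand-in `fake_classify` function (including its field names) are hypothetical helpers, not part of SentinelCAT.

```python
import json
from typing import Any, Callable, Dict


def canonical(obj: Any) -> str:
    """Serialize with sorted keys so equal results compare equal as strings."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def is_deterministic(classify: Callable[[Dict[str, Any]], Dict[str, Any]],
                     snapshot: Dict[str, Any], runs: int = 3) -> bool:
    """Replay one snapshot several times and confirm every result is identical."""
    results = {canonical(classify(dict(snapshot))) for _ in range(runs)}
    return len(results) == 1


# Stand-in classifier used only to exercise the check; not SentinelCAT itself.
def fake_classify(signals: Dict[str, Any]) -> Dict[str, Any]:
    storm = signals.get("retry_rate_per_req", 0) > 1.0
    return {"failure_class": "retry_storm" if storm else "none", "amplifiers": []}


ok = is_deterministic(fake_classify, {"retry_rate_per_req": 1.2})
```

In practice the `classify` argument would be a function that submits the snapshot to the deployed service and returns the parsed response.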

6. Tier 1 Installation — Structural Containment

Typical deployment time: hours

Tier 1 focuses on preventing outage amplification. It classifies the following failure modes:

Cascading resource exhaustion
Retry storms
Configuration fan-out
Safety mechanism lockout

Integration steps:

Deploy SentinelCAT service
Define a minimal signal set
Feed signals on a fixed interval
Expose output to incident responders

7. Tier 2 Installation — Control & Dependency Integrity

Typical deployment time: 1–2 days

Tier 2 extends classification to platform-wide risks:

Control-plane loss
Shared dependency collapse (auth/DNS)
Hidden transitive dependencies
Time skew amplification

Additional integration includes:

Cross-service dependency signals
Control-plane health signals
Time and lease integrity signals
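As an illustration of time and lease integrity signals, the sketch below derives two numeric values from per-node clock readings and lease expiry times. The function names, the signal names, and the choice of a single reference clock are assumptions made for this example; they are not defined by SentinelCAT.

```python
from typing import Dict


def max_clock_skew_ms(node_clocks_ms: Dict[str, float], reference_ms: float) -> float:
    """Largest absolute offset between any node clock and the reference clock."""
    return max(abs(t - reference_ms) for t in node_clocks_ms.values())


def min_lease_remaining_s(lease_expiry_s: Dict[str, float], now_s: float) -> float:
    """Shortest time left on any lease; a negative value means a lease has expired."""
    return min(expiry - now_s for expiry in lease_expiry_s.values())


# Illustrative readings only.
clocks = {"node-a": 1000.0, "node-b": 1250.0, "node-c": 990.0}
leases = {"leader": 130.0, "scheduler": 105.0}

tier2_signals = {
    "max_clock_skew_ms": max_clock_skew_ms(clocks, reference_ms=1000.0),
    "min_lease_remaining_s": min_lease_remaining_s(leases, now_s=100.0),
}
```

Both helpers raise `ValueError` on empty input. An adapter built this way would catch that case and leave the signal out of the snapshot.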

8. Enterprise Installation — Platform-Specific Integration

Typical deployment time: multiple days

Enterprise deployments address complex, bespoke platforms:

Multi-region control planes
Custom orchestration layers
Internal identity and access systems
Non-standard recovery automation

Enterprise installations may include:

Custom signal definitions
Authority inconsistency detection
Platform-specific invariants

9. Operator Responsibilities

Define signal computation correctly
Feed signals on a consistent interval
Interpret containment assertions
Decide and execute remediation actions

Important:
SentinelCAT does not take action on your infrastructure. Containment decisions remain human-controlled by design.

10. Versioning & Stability

Failure class ontology is stable within a major version
Signal contract changes are versioned explicitly
No silent behavior changes

11. Signal Adapter Configuration (Required)

SentinelCAT is not configured per application or service. It is configured once per environment via a Signal Adapter.

The Signal Adapter is a small customer-owned job or service that:

Queries existing telemetry systems
Computes derived numeric and boolean signals
Submits a signal snapshot to SentinelCAT on a fixed interval

This adapter is the only required integration point.

11.1 Where the Signal Adapter Runs

The Signal Adapter typically runs as one of the following:

Kubernetes CronJob or Deployment
Internal platform service
VM-based scheduled job
Existing reliability or SRE automation process

It must be able to reach SentinelCAT over the internal network.

11.2 What the Signal Adapter Reads

The adapter reads from systems you already operate, such as:

Prometheus or compatible metrics stores
CloudWatch or cloud-native metrics
Internal telemetry pipelines
Derived aggregates from logs or traces

SentinelCAT does not ingest raw telemetry. The adapter submits only derived signals.

11.3 Signal Computation

Signals are computed over consistent rolling windows, for example:

1–5 minutes for rates and latency
15–60 minutes for drift, expiry, or stability checks

Each signal represents an invariant-relevant condition, such as:

Control-plane latency percentiles
Retry amplification ratios
Configuration propagation rates
Resource saturation indicators
Boolean assertions (e.g., rollback possible, admin actions blocked)
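A minimal sketch of deriving two such signals from the samples in one rolling window follows. The nearest-rank percentile method and the definition of the retry rate as retries divided by requests are assumptions for illustration. Use whatever definitions your telemetry system already applies, as long as they stay consistent between submissions.

```python
from typing import Optional, Sequence


def p95(samples: Sequence[float]) -> Optional[float]:
    """Nearest-rank 95th percentile of one window; None if the window is empty."""
    if not samples:
        return None
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def retry_rate_per_req(retries: int, requests: int) -> Optional[float]:
    """Retries per request in one window; None if there was no traffic."""
    if requests == 0:
        return None
    return retries / requests


# Illustrative window: 20 latency samples, 240 retries across 200 requests.
window_latencies_ms = [100.0] * 18 + [900.0, 1800.0]
signals = {
    "cp_api_p95_ms": p95(window_latencies_ms),
    "retry_rate_per_req": retry_rate_per_req(retries=240, requests=200),
}
```

Returning `None` for an empty window lets the adapter omit the signal entirely instead of reporting a misleading zero.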

11.4 Payload Format

The adapter sends a JSON payload to SentinelCAT:

{
  "signals": {
    "cp_api_p95_ms": 1800,
    "cp_api_error_rate": 0.08,
    "dp_request_success_rate": 0.995,
    "retry_rate_per_req": 1.2,
    "cpu_utilization": 0.94,
    "rollback_possible": false
  }
}

Missing signals are allowed. Rules depending on missing signals are skipped deterministically.
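One way an adapter might assemble this payload is sketched below. The helper name `build_payload` is hypothetical. Dropping `None` values, rather than sending JSON `null`, is an assumption based on the statement that missing signals are allowed.

```python
import json
import math
from typing import Dict, Optional, Union

SignalValue = Union[int, float, bool]


def build_payload(raw: Dict[str, Optional[SignalValue]]) -> str:
    """Build the JSON body, omitting unavailable signals and rejecting bad types."""
    signals: Dict[str, SignalValue] = {}
    for name, value in raw.items():
        if value is None:
            continue  # unavailable signal: leave it out of the snapshot
        if isinstance(value, bool):
            signals[name] = value
        elif isinstance(value, (int, float)) and math.isfinite(value):
            signals[name] = value
        else:
            raise ValueError(f"signal {name!r} must be a finite number or a boolean")
    return json.dumps({"signals": signals}, sort_keys=True)


body = build_payload({
    "cp_api_p95_ms": None,        # query failed this interval, so it is omitted
    "cpu_utilization": 0.94,
    "rollback_possible": False,
})
```

Rejecting NaN, infinities, and strings at the adapter keeps malformed values from reaching the classifier.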

11.5 Submission Frequency

Typical submission intervals are:

30 seconds
60 seconds
120 seconds

Intervals should be fixed and predictable. SentinelCAT does not require real-time streaming.
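A fixed interval is easiest to keep when the adapter schedules against absolute deadlines instead of sleeping for a constant duration after each submission. The sketch below shows one way to do that. `run_fixed_interval` is a hypothetical helper, and its injectable `clock` and `sleep` parameters exist only to make it testable.

```python
import time
from typing import Callable


def run_fixed_interval(submit: Callable[[], None], interval_s: float, iterations: int,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Call submit() every interval_s seconds, measured from absolute deadlines."""
    next_run = clock()
    for i in range(iterations):
        if i > 0:
            next_run += interval_s
            delay = next_run - clock()
            if delay > 0:
                sleep(delay)  # wait only for the time remaining in this interval
            else:
                next_run = clock()  # the previous submit overran; resume from now
        submit()


# Example usage (submit_snapshot is a placeholder for your own function):
# run_fixed_interval(submit_snapshot, interval_s=60.0, iterations=10)
```

Because each deadline is computed from the previous one, the time spent computing and submitting signals does not accumulate as drift.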

11.6 Example Adapter (Pseudo-Code)

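import json
import urllib.request

# query_metrics, compute_retry_rate, cluster_cpu_max, and config_push_velocity
# stand for customer-defined helpers that read from your telemetry systems.
signals = {
    "cp_api_p95_ms": query_metrics("control_plane_latency_p95"),
    "retry_rate_per_req": compute_retry_rate(),
    "cpu_utilization": cluster_cpu_max(),
    "config_push_rate_per_min": config_push_velocity(),
}

# POST the snapshot as JSON (the Content-Type header is an assumption).
request = urllib.request.Request(
    "http://sentinelcat.internal/api/classify",
    data=json.dumps({"signals": signals}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    classification = response.read().decode("utf-8")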

11.7 Handling Classification Output

SentinelCAT returns:

Primary failure class
Optional amplifiers
Matched rules
Rationale

The Signal Adapter or downstream tooling routes this output to humans:

Incident response channels
Runbooks
Operational dashboards
Ticketing or escalation systems

SentinelCAT does not trigger actions directly.
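Routing the output to humans usually starts with turning the result into a readable message. The sketch below assumes the result has already been parsed into a dictionary. The field names (`failure_class`, `amplifiers`, `matched_rules`, `rationale`) and the sample values are hypothetical and should be replaced with the names in your SentinelCAT version's output contract.

```python
from typing import Any, Dict, List


def format_for_responders(result: Dict[str, Any]) -> str:
    """Render a classification result as a short plain-text message."""
    lines: List[str] = [f"Failure class: {result['failure_class']}"]
    amplifiers = result.get("amplifiers") or []
    if amplifiers:  # amplifiers are optional, so omit the line when there are none
        lines.append("Amplifiers: " + ", ".join(amplifiers))
    lines.append("Matched rules: " + ", ".join(result.get("matched_rules", [])))
    lines.append("Rationale: " + result.get("rationale", ""))
    return "\n".join(lines)


# Illustrative result only; not real SentinelCAT output.
message = format_for_responders({
    "failure_class": "retry_storm",
    "amplifiers": ["operator_blindness"],
    "matched_rules": ["retry_rate_high"],
    "rationale": "Retry rate exceeds request rate.",
})
```

The returned string can then be posted to an incident channel or attached to a ticket by whatever tooling you already use. The function only formats text and performs no action on infrastructure.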

Design constraint:
SentinelCAT intentionally does not perform automated remediation. Containment decisions remain human-controlled.