Active / flagship

kube-insight

The missing history layer for Kubernetes AIOps.

Logs have search systems. Metrics have time-series stores. Traces preserve application flows. Kubernetes infrastructure state is still too often reduced to whatever the apiserver shows right now. kube-insight turns that gap into an AIOps foundation: it records Kubernetes resource history at low operational cost, extracts facts and topology, and exposes human- and agent-friendly query surfaces. Agents can work from retained evidence first, then use live kubectl only for final confirmation instead of rebuilding context from scratch.


24-215 ms: five retained-evidence agent workflow queries

14.9x-221x faster than comparable broad kubectl paths

auto-redaction: configurable filters and extractors keep sensitive data out of evidence

Why it matters

Current state is useful. It is not the whole story.

kubectl is still the live-state baseline, but much of the evidence for an incident is already gone by the time someone investigates: Events expire, rollouts are reverted, RBAC edits are corrected, EndpointSlices move, and Pods are replaced. kube-insight retains that missing Kubernetes evidence and shapes it into fast, scoped investigation paths.

without history: current objects only

with kube-insight: versions, facts, edges, observations

Keep the state that disappeared

Events expire, Pods restart, EndpointSlices change, and deleted objects vanish from the apiserver. kube-insight keeps observed versions and timestamps so the old state can still be inspected.

Turn raw history into queryable clues

Extracted facts, changes, and topology edges let operators and agents rank candidate Services, Pods, Events, owners, RBAC, webhooks, and policies before opening full JSON proof.

Reduce the agent blast radius

Configurable filters and extractors redact sensitive data before storage. A future service mode is planned to inherit Kubernetes RBAC so that agents see only what they are allowed to inspect.
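
To make redact-before-storage concrete, here is a minimal sketch of what such a filter could look like. It is not kube-insight's actual filter API: the function name `redact_object`, the `<redacted>` placeholder, and the choice of fields to mask are all assumptions made for illustration.

```python
import copy

REDACTED = "<redacted>"  # hypothetical placeholder value

def redact_object(obj: dict) -> dict:
    """Return a copy of a Kubernetes object with sensitive fields masked.

    Hypothetical filter: kube-insight's real filters are configurable and
    may behave differently.
    """
    out = copy.deepcopy(obj)  # never mutate the object that was observed

    # Secret payloads: keep the key names (useful evidence), drop the values.
    if out.get("kind") == "Secret":
        for field in ("data", "stringData"):
            if isinstance(out.get(field), dict):
                out[field] = {k: REDACTED for k in out[field]}

    # The last-applied annotation can embed a full copy of the object.
    annotations = (out.get("metadata") or {}).get("annotations") or {}
    key = "kubectl.kubernetes.io/last-applied-configuration"
    if key in annotations:
        annotations[key] = REDACTED
    return out

secret = {
    "kind": "Secret",
    "metadata": {"name": "db-credentials", "namespace": "prod"},
    "data": {"password": "c3VwZXItc2VjcmV0"},
}
stored = redact_object(secret)
print(stored["data"])  # key names survive, values do not
```

Masking values while keeping key names is one possible tradeoff: an investigation can still see that a key was added or removed without the stored evidence ever holding the secret itself.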

Performance

Measured as investigation workflows, not isolated database tricks.

The validation compares retained evidence against broad live kubectl paths, then separates SQLite, ClickHouse, and chDB tradeoffs. The product claim is focused: pre-extracted evidence makes AIOps workflows faster and more repeatable before the final live-state check.

Validation profile

Evidence queries stay small because the joins are already shaped.

2026-05-18

agent query phase 24-215 ms

Five retained-evidence workflows over SQLite evidence.

raw kubectl baseline 3,104-5,745 ms

Comparable broad live calls reconstructing the same context.

live service case 448.746 ms vs 3,462.546 ms

ClickHouse SQL/API path used 3 operations; raw kubectl used 4 calls.

Agent workflow benchmark

Retained evidence vs broad live kubectl

14.9x-221x

Scenario                       kube-insight   kubectl    Speedup
PolicyViolation Event count    215 ms         3,214 ms   14.9x
Event to affected resource     26 ms          3,307 ms   127.2x
Event keyword search           24 ms          3,794 ms   158.1x
Service topology candidates    32 ms          3,104 ms   97.0x
Workload scope inventory       26 ms          5,745 ms   221.0x
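
The speedup figures are simply the kubectl latency divided by the kube-insight latency for each scenario, rounded to one decimal place. The short sketch below reproduces them from the published millisecond values; the variable and function names are illustrative only.

```python
# (scenario, kube-insight ms, kubectl ms) as published in the benchmark
rows = [
    ("PolicyViolation Event count", 215, 3214),
    ("Event to affected resource", 26, 3307),
    ("Event keyword search", 24, 3794),
    ("Service topology candidates", 32, 3104),
    ("Workload scope inventory", 26, 5745),
]

def speedup(kube_insight_ms: float, kubectl_ms: float) -> float:
    """Ratio of the broad kubectl path to the retained-evidence path."""
    return round(kubectl_ms / kube_insight_ms, 1)

speedups = {name: speedup(fast, slow) for name, fast, slow in rows}
for name, value in speedups.items():
    print(f"{name}: {value}x")

# The headline range is the minimum and maximum of the per-scenario ratios.
print(min(speedups.values()), max(speedups.values()))  # 14.9 221.0
```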

Same-dataset storage harness

Choose by operating model, not a single latency number.

smallest local start

SQLite

ingest: 17.42 s
service: 80.6 ms
storage: 4.61 MB DB

central history

ClickHouse

ingest: 7.91 s
service: 182.0 ms
storage: 597 KiB active, ~4.9x

local ClickHouse shape

chDB

ingest: 1.52 s
service: 506.9 ms
storage: 1.23 MB dir, ~5.7x

Use cases

Actual investigation shapes from the project docs.

These cases go beyond a capability list: they demonstrate how retained facts, edges, observations, and versions become practical incident evidence.


Expired events

PolicyViolation events after the workload looks healthy

Symptom

A deployment was rejected or repeatedly reconciled with policy warnings. By the time someone investigates, the workload may look healthy and Events may have rotated out.

Why live kubectl is weak later

Events are short-lived and often rotated.
Warning Events must be joined back to Deployments, ReplicaSets, and Pods.
The policy controller may no longer list every affected object.

Evidence kube-insight uses

k8s_event.reason, type, and message facts
event edges to involved resources
Deployment, ReplicaSet, and Pod retained versions

01 Check coverage

02 Find warning Events

03 Follow involved-object edges

04 Open retained history

Query shape

where fact_key in ('k8s_event.reason', 'k8s_event.type') and (fact_value = 'Warning' or severity >= 60)
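
As a rough, runnable illustration of this query shape, the sketch below applies the same `where` clause to an in-memory SQLite table. Only `fact_key`, `fact_value`, `severity`, and the `k8s_event.*` keys come from the query above; the table name `facts`, the `object_id` column, the `pod.ready` key, and the sample rows are assumptions, not kube-insight's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical facts table: one row per extracted fact.
conn.execute(
    "create table facts ("
    "object_id text, fact_key text, fact_value text, severity integer)"
)
conn.executemany(
    "insert into facts values (?, ?, ?, ?)",
    [
        ("event/a", "k8s_event.type", "Warning", 60),
        ("event/a", "k8s_event.reason", "PolicyViolation", 60),
        ("event/b", "k8s_event.type", "Normal", 10),
        ("event/b", "k8s_event.reason", "Scheduled", 10),
        ("pod/x", "pod.ready", "false", 70),  # not an Event fact, so excluded
    ],
)

rows = conn.execute(
    """
    select distinct object_id
    from facts
    where fact_key in ('k8s_event.reason', 'k8s_event.type')
      and (fact_value = 'Warning' or severity >= 60)
    order by object_id
    """
).fetchall()
warning_events = [object_id for (object_id,) in rows]
print(warning_events)  # ['event/a']
```

The resulting Event identifiers would then be the starting points for following involved-object edges to Deployments, ReplicaSets, and Pods.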

What you get

PolicyViolation warning Events tied back to workload objects, even when the current cluster no longer shows the full incident window.

Service topology

Service / EndpointSlice proof after resources changed

Symptom

A Service briefly routed to no endpoints or unready Pods. Later the Service is healthy, old Pods may be replaced, and the useful topology has moved on.

Why live kubectl is weak later

Current EndpointSlices only show current endpoints.
Deleted rollout objects and old Pods cannot be reconstructed from live state alone.
Pod readiness transitions and Events may no longer line up in one live query.

Evidence kube-insight uses

endpointslice_for_service edges
endpointslice_targets_pod edges
Endpoint readiness, Pod readiness, and restart facts
Service investigation bundle with proof versions

01 Find Service facts

02 Expand EndpointSlice edges

03 Inspect Pod readiness

04 Cross-check current kubectl

Query shape

endpointslice_for_service -> endpointslice_targets_pod -> Pod readiness facts -> retained versions
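
The traversal above can be sketched as a two-hop join over an edge table followed by a fact lookup. The edge type names come from this page; everything else here (the `edges` and `facts` table layouts, the edge direction from EndpointSlice to Service or Pod, the `pod.ready` fact key, and the sample objects) is assumed for illustration and may not match the real schema. The final hop to retained versions is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table edges (edge_type text, src text, dst text);          -- hypothetical
    create table facts (object_id text, fact_key text, fact_value text);  -- hypothetical
    """
)
conn.executemany(
    "insert into edges values (?, ?, ?)",
    [
        ("endpointslice_for_service", "endpointslice/web-abc", "service/web"),
        ("endpointslice_targets_pod", "endpointslice/web-abc", "pod/web-1"),
        ("endpointslice_targets_pod", "endpointslice/web-abc", "pod/web-2"),
        ("endpointslice_for_service", "endpointslice/api-xyz", "service/api"),
        ("endpointslice_targets_pod", "endpointslice/api-xyz", "pod/api-1"),
    ],
)
conn.executemany(
    "insert into facts values (?, ?, ?)",
    [
        ("pod/web-1", "pod.ready", "true"),
        ("pod/web-2", "pod.ready", "false"),
        ("pod/api-1", "pod.ready", "true"),
    ],
)

# Service -> its EndpointSlices -> targeted Pods -> Pod readiness facts
candidates = conn.execute(
    """
    select s.src as endpointslice, p.dst as pod, f.fact_value as ready
    from edges s
    join edges p on p.src = s.src
                and p.edge_type = 'endpointslice_targets_pod'
    join facts f on f.object_id = p.dst
                and f.fact_key = 'pod.ready'
    where s.edge_type = 'endpointslice_for_service'
      and s.dst = ?
    order by p.dst
    """,
    ("service/web",),
).fetchall()

for endpointslice, pod, ready in candidates:
    print(endpointslice, pod, ready)
```

Because the edges are already extracted, the query only touches rows scoped to one Service instead of listing every EndpointSlice and Pod in the namespace.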

What you get

The investigation can show which historical EndpointSlices pointed at which Pods, then use kubectl only as the final live-state comparison.

Architecture

Facts and edges are the candidate path. Versions are the proof.

Kubernetes data is captured once, filtered before storage, extracted into investigation tables, then served through narrow read surfaces: CLI, HTTP API, read-only SQL, MCP tools, and agent prompts.

Architecture flow

Same shape as the project architecture: capture, filter, store, query.

Kubernetes API: Discovery, List / Watch

kube-insight ingestion

Filters: redact, normalize, discard

Retained versions: content-addressed JSON

Evidence extraction: facts, edges, changes

Evidence store: versions, facts, edges, observations (SQLite default / chDB local / ClickHouse central)

Read surfaces (read-only outputs): CLI, HTTP API, SQL, MCP tools + prompts

Investigations: humans, scripts, and agents inspect scoped proof

Evidence model

Small tables, useful answers.

versions: content-addressed retained JSON

facts: status, events, RBAC, rollout, webhook, cert facts

edges: Service, EndpointSlice, owner, policy, and event relationships

observations: watch/list timestamps and coverage signals
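
One way to picture how content-addressed versions and timestamped observations fit together is the sketch below. It assumes SHA-256 over canonical JSON as the address and uses in-memory Python structures; the hashing scheme, function names, and record layout are illustrative guesses rather than kube-insight's actual storage format.

```python
import hashlib
import json

def content_address(obj: dict) -> str:
    """Digest of a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

versions = {}      # digest -> retained JSON, stored once per distinct content
observations = []  # (timestamp, object id, digest) for every list/watch sighting

def observe(timestamp: str, object_id: str, obj: dict) -> str:
    digest = content_address(obj)
    versions.setdefault(digest, obj)  # unchanged content is not stored again
    observations.append((timestamp, object_id, digest))
    return digest

pod_v1 = {"kind": "Pod", "metadata": {"name": "web-1"}, "status": {"phase": "Running"}}
pod_v2 = {"kind": "Pod", "metadata": {"name": "web-1"}, "status": {"phase": "Failed"}}

observe("2026-05-18T10:00:00Z", "pod/web-1", pod_v1)
observe("2026-05-18T10:05:00Z", "pod/web-1", pod_v1)  # same content, new sighting
observe("2026-05-18T10:10:00Z", "pod/web-1", pod_v2)  # content changed

print(len(versions), len(observations))  # 2 3
```

Under this scheme, repeated sightings of an unchanged object add only a small observation row, which is one plausible reason the tables can stay small while still recording when each state was seen.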

Storage modes

Start local. Keep history central when the team needs it.

default local / smallest start

SQLite

A pure-Go default artifact with one local evidence database for first captures, laptops, CI fixtures, and local agent workflows.

local ClickHouse shape / embedded analytics

chDB

A chDB-enabled artifact when you want ClickHouse-compatible local tables without operating a ClickHouse server.

central history / team service

ClickHouse

A continuous evidence service for append-heavy history, compression, API/MCP reads, and future cold-tiering work.

Next steps

Start with the repository quickstart and storage notes.

Installation, MCP usage, SQL recipes, security, retention, and storage-mode tradeoffs are kept in project documentation so the website can stay focused on product shape.
