Data Model
Logical Layers
- clusters / dimensions
- api_resources
- objects
- versions + blobs
- latest_raw_index
- latest_index
- object_edges
- object_facts
- object_changes
- ingestion_offsets
- maintenance_runs

Dimensions
Use numeric IDs internally to reduce repeated text in large fact and edge tables.

clusters(id, name, uid, source, created_at)
api_resources(id, api_group, api_version, resource, kind, namespaced, preferred_version, storage_version, verbs)
object_kinds(id, api_resource_id, api_group, api_version, kind)
edge_types(id, name)
fact_keys(id, family, key, value_type)
resource_processing_profiles(api_resource_id, profile, retention_class, filter_chain, extractor_set, compaction_strategy, priority, max_event_buffer, enabled)

api_resources is the discovery-backed GroupVersionResource table. It is the
authoritative mapping used by the collector and authorizer when they need the
Kubernetes group/resource/scope tuple for list/watch and SAR/SSAR checks.
object_kinds can remain as a compact dimension for query tables, but it must
link back to api_resources.
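A minimal sketch of the dimension idea, assuming SQLite through Python's sqlite3 module: a GroupVersionResource is interned once in api_resources, object_kinds references it by numeric ID, and large tables would carry only those integers. The column subset, the UNIQUE constraints, and the helpers intern_api_resource and intern_kind are illustrative assumptions, not kube-insight's actual schema or code.

```python
import sqlite3

# Hypothetical subset of the dimension tables; the documented schema has more columns.
SCHEMA = """
CREATE TABLE api_resources (
    id INTEGER PRIMARY KEY,
    api_group TEXT NOT NULL,
    api_version TEXT NOT NULL,
    resource TEXT NOT NULL,
    kind TEXT NOT NULL,
    namespaced INTEGER NOT NULL,
    verbs TEXT NOT NULL,
    UNIQUE (api_group, api_version, resource)
);
CREATE TABLE object_kinds (
    id INTEGER PRIMARY KEY,
    -- object_kinds must link back to the discovery-backed api_resources row
    api_resource_id INTEGER NOT NULL REFERENCES api_resources(id),
    api_group TEXT NOT NULL,
    api_version TEXT NOT NULL,
    kind TEXT NOT NULL,
    UNIQUE (api_group, api_version, kind)
);
"""

def intern_api_resource(db, group, version, resource, kind, namespaced, verbs):
    """Return the numeric ID for a GVR, inserting it the first time it is seen."""
    db.execute(
        "INSERT OR IGNORE INTO api_resources"
        " (api_group, api_version, resource, kind, namespaced, verbs)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (group, version, resource, kind, int(namespaced), ",".join(verbs)),
    )
    return db.execute(
        "SELECT id FROM api_resources"
        " WHERE api_group = ? AND api_version = ? AND resource = ?",
        (group, version, resource),
    ).fetchone()[0]

def intern_kind(db, api_resource_id):
    """Derive the compact object_kinds row from its api_resources parent."""
    group, version, kind = db.execute(
        "SELECT api_group, api_version, kind FROM api_resources WHERE id = ?",
        (api_resource_id,),
    ).fetchone()
    db.execute(
        "INSERT OR IGNORE INTO object_kinds (api_resource_id, api_group, api_version, kind)"
        " VALUES (?, ?, ?, ?)",
        (api_resource_id, group, version, kind),
    )
    return db.execute(
        "SELECT id FROM object_kinds WHERE api_group = ? AND api_version = ? AND kind = ?",
        (group, version, kind),
    ).fetchone()[0]

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when enabled
db.executescript(SCHEMA)
pods = intern_api_resource(db, "", "v1", "pods", "Pod", True, ["list", "watch", "get"])
again = intern_api_resource(db, "", "v1", "pods", "Pod", True, ["list", "watch", "get"])
pod_kind = intern_kind(db, pods)
```

Because INSERT OR IGNORE keeps the first row, this sketch does not refresh verbs when discovery changes; a real collector would update discovery metadata explicitly.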
Required fields for authorization and watch management:
- api_group
- api_version
- resource
- kind
- namespaced
- verbs

Optional discovery metadata:
- preferred_version
- storage_version
- last_discovered_at
- removed_at

resource_processing_profiles stores per-resource behavior. The default profile
uses the generic history path. High-volume or high-value resources can use a
specialized profile without changing the logical storage contract.
The default SQLite backend writes a profile row when it discovers a resource and
when it first stores an observation of it, then reports API resource counts and
stored version counts per profile in validation output.
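The profile write and the per-profile validation counts can be sketched as follows, assuming SQLite via Python's sqlite3. The SPECIALIZED mapping, the profile_for and discover helpers, and the shortcut of pointing objects straight at api_resources (instead of through object_kinds) are hypothetical simplifications for illustration only.

```python
import sqlite3

# Hypothetical profile assignments keyed by (api_group, resource).
SPECIALIZED = {
    ("", "pods"): "pod_fast_path",
    ("", "nodes"): "node_summary",
    ("coordination.k8s.io", "leases"): "lease_skip_or_downsample",
    ("", "secrets"): "secret_metadata_only",
}

def profile_for(group, resource):
    # Resources without a specialized profile use the generic history path.
    return SPECIALIZED.get((group, resource), "generic")

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE api_resources (id INTEGER PRIMARY KEY, api_group TEXT, resource TEXT);
CREATE TABLE resource_processing_profiles (
    api_resource_id INTEGER PRIMARY KEY REFERENCES api_resources(id),
    profile TEXT NOT NULL,
    enabled INTEGER NOT NULL DEFAULT 1
);
-- Simplification: objects point straight at api_resources in this sketch.
CREATE TABLE objects (id INTEGER PRIMARY KEY, api_resource_id INTEGER);
CREATE TABLE versions (id INTEGER PRIMARY KEY, object_id INTEGER);
""")

def discover(group, resource):
    """Record a discovered resource; the profile row is written once and kept."""
    cur = db.execute("INSERT INTO api_resources (api_group, resource) VALUES (?, ?)",
                     (group, resource))
    db.execute("INSERT OR IGNORE INTO resource_processing_profiles"
               " (api_resource_id, profile) VALUES (?, ?)",
               (cur.lastrowid, profile_for(group, resource)))
    return cur.lastrowid

pods = discover("", "pods")
cms = discover("", "configmaps")
leases = discover("coordination.k8s.io", "leases")

# Two Pod objects with three stored versions, one ConfigMap with one version.
db.executemany("INSERT INTO objects (id, api_resource_id) VALUES (?, ?)",
               [(1, pods), (2, pods), (3, cms)])
db.executemany("INSERT INTO versions (object_id) VALUES (?)", [(1,), (1,), (2,), (3,)])

# Validation-style report: API resource count and stored version count per profile.
report = db.execute("""
    SELECT p.profile,
           COUNT(DISTINCT p.api_resource_id) AS api_resources,
           COUNT(v.id) AS stored_versions
    FROM resource_processing_profiles p
    LEFT JOIN objects o ON o.api_resource_id = p.api_resource_id
    LEFT JOIN versions v ON v.object_id = o.id
    GROUP BY p.profile
    ORDER BY p.profile
""").fetchall()
```

The LEFT JOINs keep profiles that have no stored versions yet (the lease profile here reports zero), which is useful when checking that a skip or downsample profile is doing its job.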
Initial profiles:
- generic
- pod_fast_path
- node_summary
- event_rollup
- endpointslice_topology
- lease_skip_or_downsample
- secret_metadata_only

Required default edge_types:
- service_selects_pod
- endpointslice_targets_pod
- pod_owned_by_replicaset
- replicaset_owned_by_deployment
- pod_on_node
- pod_uses_configmap
- pod_uses_secret
- workload_uses_pvc

Required default fact_keys:
- pod_status.reason
- pod_status.last_reason
- pod_status.restart_count
- pod_status.ready
- pod_status.phase
- pod_status.qos_class
- pod_placement.node_assigned
- pod_placement.started
- pod_placement.deleted
- workload_rollout.deployment_generation
- workload_rollout.replicaset_hash
- workload_config.image
- workload_config.memory_request
- workload_config.memory_limit
- workload_config.cpu_request
- workload_config.cpu_limit
- workload_config.probe_changed
- node_condition.Ready
- node_condition.MemoryPressure
- node_condition.DiskPressure
- node_condition.PIDPressure
- node_status.taint
- node_status.capacity
- node_status.allocatable
- k8s_event.type
- k8s_event.reason
- k8s_event.message_fingerprint
- k8s_event.message_preview
- k8s_event.action
- k8s_event.reporting_controller
- k8s_event.reporting_instance
- k8s_event.count
- k8s_event.series_count
- service.type
- service.cluster_ip
- service.load_balancer.pending
- service.load_balancer.ingress_count
- service.load_balancer.ingress_ip
- service.load_balancer.ingress_hostname
- service.deleted
- endpoint.ready
- endpoint.serving
- endpoint.terminating
- endpoint.membership

Objects
objects is the stable identity table.

objects(id, cluster_id, kind_id, namespace, name, uid, latest_version_id, first_seen_at, last_seen_at, deleted_at)

Identity rules:
- Prefer Kubernetes metadata.uid when available.
- Use cluster/kind/namespace/name as the human-readable key.
- Track a delete/recreate as two different objects when the UID changes.
- For resources without a UID, fall back to namespaced identity.
deleted_at is the time kube-insight observed a Kubernetes delete event for the object. It is not copied from metadata.deletionTimestamp, which records Kubernetes graceful deletion intent before the object is actually removed.
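The identity rules can be sketched with an in-memory stand-in for the objects table. IdentityTable, ObjectRow, and their methods are hypothetical; the sketch assumes that a key with a live row and a matching (or absent) UID is the same object, that a new UID after an observed delete starts a new object, and that deleted_at is stamped with kube-insight's own observation time rather than metadata.deletionTimestamp.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectRow:
    id: int
    key: tuple  # (cluster, kind, namespace, name): the human-readable key
    uid: Optional[str]
    first_seen_at: str
    last_seen_at: str
    deleted_at: Optional[str] = None

class IdentityTable:
    """Hypothetical in-memory stand-in for the objects table."""

    def __init__(self):
        self.rows = []

    def _live_by_key(self, key):
        # Most recent row for this key that has not been observed deleted.
        for row in reversed(self.rows):
            if row.key == key and row.deleted_at is None:
                return row
        return None

    def observe(self, key, uid, observed_at):
        """Resolve an ADDED/MODIFIED observation to an object row."""
        live = self._live_by_key(key)
        # Same UID, or no UID on either side (namespaced fallback): same object.
        if live is not None and (uid is None or live.uid is None or live.uid == uid):
            live.last_seen_at = observed_at
            return live
        # A different UID under the same key is a recreated, distinct object.
        row = ObjectRow(len(self.rows) + 1, key, uid, observed_at, observed_at)
        self.rows.append(row)
        return row

    def delete(self, key, uid, observed_at):
        """Record a DELETED watch event at kube-insight's observation time,
        never at metadata.deletionTimestamp."""
        row = self.observe(key, uid, observed_at)
        row.deleted_at = observed_at
        return row

t = IdentityTable()
key = ("prod", "Pod", "default", "web-0")
a = t.observe(key, "uid-1", "2024-01-01T00:00:00Z")
b = t.observe(key, "uid-1", "2024-01-01T00:05:00Z")
t.delete(key, "uid-1", "2024-01-01T00:10:00Z")
c = t.observe(key, "uid-2", "2024-01-01T00:11:00Z")  # recreated with a new UID
```

A StatefulSet pod such as web-0 keeps its name across recreation, so keying history on UID is what keeps the two lifetimes apart.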
Versions
versions stores the reconstructable resource history.

versions(id, object_id, seq, observed_at, resource_version, generation, doc_hash, materialization, strategy, blob_ref, parent_version_id, raw_size, stored_size, replay_depth, summary)

materialization values:
- full
- reverse_delta
- cdc_manifest

strategy values:
- full_zstd
- json_patch_zstd
- cdc_zstd

blobs(digest, codec, raw_size, stored_size, data)

The blob layer should be content-addressed. It can later move from SQL storage to object storage without changing the logical model.
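A content-addressed blob layer can be sketched in a few lines, assuming SQLite via Python's sqlite3. The "sha256:" digest prefix and the put_blob/get_blob helpers are assumptions, and zlib stands in for zstd only because it ships with Python's standard library; the documented strategies use zstd.

```python
import hashlib
import sqlite3
import zlib

# zlib is a stand-in for zstd so the sketch needs no third-party packages.
CODEC = "zlib"

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE blobs (
    digest TEXT PRIMARY KEY,
    codec TEXT NOT NULL,
    raw_size INTEGER NOT NULL,
    stored_size INTEGER NOT NULL,
    data BLOB NOT NULL)""")

def put_blob(raw: bytes) -> str:
    """Store bytes under their SHA-256 digest; identical content is stored once."""
    digest = "sha256:" + hashlib.sha256(raw).hexdigest()
    packed = zlib.compress(raw, 9)
    db.execute("INSERT OR IGNORE INTO blobs VALUES (?, ?, ?, ?, ?)",
               (digest, CODEC, len(raw), len(packed), packed))
    return digest

def get_blob(digest: str) -> bytes:
    codec, data = db.execute(
        "SELECT codec, data FROM blobs WHERE digest = ?", (digest,)).fetchone()
    if codec != "zlib":
        raise ValueError(f"unsupported codec {codec}")
    raw = zlib.decompress(data)
    # Re-hashing on read turns the content address into an integrity check.
    if "sha256:" + hashlib.sha256(raw).hexdigest() != digest:
        raise ValueError("blob content does not match its digest")
    return raw

doc = b'{"kind":"ConfigMap","metadata":{"name":"app"},"data":{"k":"v"}}'
d1 = put_blob(doc)
d2 = put_blob(doc)  # deduplicated: same digest, no second row
```

Since a versions row would only hold the digest in blob_ref, swapping the SQL table for an object store changes where put_blob and get_blob read and write, not the references themselves.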
Derived Evidence Reindex
object_facts, object_edges, and object_changes are derived from retained
JSON versions. When extractor sets or resource profiles change, run
kube-insight db reindex to rebuild those derived rows from versions and
blobs without re-watching the cluster. The command is dry-run by default; use
--yes to apply changes in small object batches.
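The dry-run-by-default, batched behavior of kube-insight db reindex can be illustrated with a small planner. This is not the command's implementation: reindex, batched, and the load_versions/extract/existing_rows/write_rows hooks are hypothetical, and apply=True plays the role of --yes.

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def reindex(object_ids, load_versions, extract, existing_rows, write_rows,
            apply=False, batch_size=100):
    """Rebuild derived rows from retained versions; dry-run unless apply=True.

    Hypothetical hooks: load_versions(object_id) returns retained JSON docs,
    extract(doc) returns derived rows, existing_rows(object_id) returns the
    current derived rows, and write_rows(object_id, rows) replaces them.
    """
    plan = {"objects": 0, "changed": 0, "batches": 0}
    for batch in batched(object_ids, batch_size):
        plan["batches"] += 1  # a real backend would use one short transaction per batch
        for object_id in batch:
            plan["objects"] += 1
            wanted = [row for doc in load_versions(object_id) for row in extract(doc)]
            if sorted(wanted) != sorted(existing_rows(object_id)):
                plan["changed"] += 1
                if apply:
                    write_rows(object_id, wanted)
    return plan

def extract_phase(doc):
    # Toy extractor: one pod_status.phase fact per retained version.
    return [("pod_status.phase", doc["status"]["phase"])]

versions = {1: [{"status": {"phase": "Pending"}}, {"status": {"phase": "Running"}}],
            2: [{"status": {"phase": "Running"}}]}
derived = {1: [("pod_status.phase", "Pending")], 2: [("pod_status.phase", "Running")]}

args = ([1, 2], versions.get, extract_phase, lambda oid: derived.get(oid, []),
        derived.__setitem__)
dry = reindex(*args, batch_size=1)
unchanged = len(derived[1]) == 1  # dry run reports but does not write
applied = reindex(*args, apply=True, batch_size=1)
rerun = reindex(*args, batch_size=1)  # nothing left to change
```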
Latest Snapshots
Latest data is split into two query surfaces:
- latest_raw_index / latest_raw_documents: the latest observed, sanitized cluster snapshot. This preserves runtime fields such as resourceVersion, generation, Event counters, and controller heartbeat values. Secret payload values are still redacted; key names can be retained.
- latest_index / latest_documents: the latest retained history proof. This points at the newest normalized versions row and can intentionally omit high-churn fields that were filtered out before retained hashing.

latest_raw_index(object_id, cluster_id, kind_id, namespace, name, uid, observed_at, observation_type, resource_version, generation, doc_hash, raw_size, doc)
latest_index(object_id, cluster_id, kind_id, namespace, name, uid, latest_version_id, observed_at)

Use latest_raw_documents when a human or agent needs the current observed
cluster shape. Use latest_documents when the question needs the latest
retained proof document. latest_index remains rebuildable from versions;
latest_raw_index is overwritten by future observations and is not historical
proof. Deleted objects are removed from latest_raw_index; delete history
remains available through observations and retained versions.
Historical Topology
object_edges stores time-valid graph edges.

object_edges(id, cluster_id, edge_type, src_id, dst_id, valid_from, valid_to, src_version_id, dst_version_id, confidence, detail)

open_edges tracks currently active edges for efficient ingestion:

open_edges(cluster_id, edge_type, src_id, dst_id, edge_id)

Only write edge rows when relationships change.
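The "only write on change" rule can be sketched as a diff against open_edges, assuming SQLite via Python's sqlite3, integer timestamps, and a single cluster (cluster_id and the version/confidence columns are omitted). sync_edges and edges_at are hypothetical helpers.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE object_edges (
    id INTEGER PRIMARY KEY, edge_type TEXT, src_id INTEGER, dst_id INTEGER,
    valid_from INTEGER, valid_to INTEGER);
CREATE TABLE open_edges (
    edge_type TEXT, src_id INTEGER, dst_id INTEGER, edge_id INTEGER,
    PRIMARY KEY (edge_type, src_id, dst_id));
""")

def sync_edges(edge_type, src_id, current_dsts, ts):
    """Diff the edges implied by one observation of src_id against open_edges.

    Unchanged edges cause no writes; vanished edges get valid_to = ts; new
    edges get a fresh object_edges row plus an open_edges entry.
    """
    open_now = dict(db.execute(
        "SELECT dst_id, edge_id FROM open_edges WHERE edge_type = ? AND src_id = ?",
        (edge_type, src_id)).fetchall())
    open_set, current = set(open_now), set(current_dsts)
    for dst_id in open_set - current:
        db.execute("UPDATE object_edges SET valid_to = ? WHERE id = ?",
                   (ts, open_now[dst_id]))
        db.execute("DELETE FROM open_edges WHERE edge_type = ? AND src_id = ? AND dst_id = ?",
                   (edge_type, src_id, dst_id))
    for dst_id in current - open_set:
        cur = db.execute(
            "INSERT INTO object_edges (edge_type, src_id, dst_id, valid_from)"
            " VALUES (?, ?, ?, ?)", (edge_type, src_id, dst_id, ts))
        db.execute("INSERT INTO open_edges VALUES (?, ?, ?, ?)",
                   (edge_type, src_id, dst_id, cur.lastrowid))

def edges_at(ts):
    """Point-in-time topology: edges whose validity interval covers ts."""
    return db.execute(
        """SELECT edge_type, src_id, dst_id FROM object_edges
           WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)
           ORDER BY id""", (ts, ts)).fetchall()

# Service 10 selects pods 1 and 2, is re-observed unchanged, then selects 2 and 3.
sync_edges("service_selects_pod", 10, {1, 2}, 100)
sync_edges("service_selects_pod", 10, {1, 2}, 200)  # no writes
sync_edges("service_selects_pod", 10, {2, 3}, 300)
```

The half-open interval (valid_to exclusive) means an edge closed at 300 is not reported at 300, which avoids showing both the old and new pod set at the same instant.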
Troubleshooting Facts
object_facts stores queryable incident evidence.

object_facts(id, cluster_id, ts, object_id, version_id, kind_id, namespace, name, node_id, workload_id, service_id, fact_key_id, fact_value, numeric_value, severity, detail)

Keep detail small. Full JSON belongs in versions.
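As an illustration of how a Pod document could become rows keyed by the default pod_status.* fact_keys, here is a hypothetical extractor. The Kubernetes fields it reads (status.phase, status.qosClass, containerStatuses[].restartCount, ready, state.waiting.reason, lastState.terminated.reason) are real Pod status fields; the Fact shape, the severity scale, and the "container=<name>" detail format are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    fact_key: str
    fact_value: str
    numeric_value: Optional[float] = None
    severity: int = 0  # hypothetical scale: 0 info, 1 notice, 2 warning
    detail: str = ""   # kept small; the full JSON stays in versions

def pod_status_facts(pod):
    """Hypothetical extractor emitting a few pod_status.* facts from a Pod."""
    status = pod.get("status", {})
    facts = []
    if "phase" in status:
        facts.append(Fact("pod_status.phase", status["phase"]))
    if "qosClass" in status:
        facts.append(Fact("pod_status.qos_class", status["qosClass"]))
    for cs in status.get("containerStatuses", []):
        where = f"container={cs['name']}"
        restarts = cs.get("restartCount", 0)
        facts.append(Fact("pod_status.restart_count", str(restarts), float(restarts),
                          severity=1 if restarts > 0 else 0, detail=where))
        facts.append(Fact("pod_status.ready", str(cs.get("ready", False)).lower(),
                          detail=where))
        waiting = cs.get("state", {}).get("waiting")
        if waiting and "reason" in waiting:
            facts.append(Fact("pod_status.reason", waiting["reason"], severity=2,
                              detail=where))
        last = cs.get("lastState", {}).get("terminated")
        if last and "reason" in last:
            facts.append(Fact("pod_status.last_reason", last["reason"], detail=where))
    return facts

pod = {"status": {"phase": "Running", "qosClass": "Burstable", "containerStatuses": [{
    "name": "app", "ready": False, "restartCount": 4,
    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
    "lastState": {"terminated": {"reason": "OOMKilled"}}}]}}
facts = pod_status_facts(pod)
by_key = {f.fact_key: f for f in facts}
```

Filling numeric_value for counters such as restart_count lets range queries ("restarts above 3") run on the facts table without parsing fact_value.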
Change Summary
object_changes stores small timeline entries used by the UI and investigation
ranking.

object_changes(id, cluster_id, ts, object_id, version_id, change_family, path, op, old_scalar, new_scalar, severity)

This is not the full diff. It is a query aid.
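One way to produce path/op/old_scalar/new_scalar rows is to flatten two versions to their scalar leaves and compare them. This sketch uses JSON Pointer paths (RFC 6901 escaping) and "add"/"remove"/"replace" op names borrowed from JSON Patch; both choices, and the flatten/scalar_changes helpers, are assumptions. change_family and severity are left out.

```python
def flatten(doc, prefix=""):
    """Map JSON Pointer paths to scalar leaves; list items are indexed by position."""
    if isinstance(doc, dict):
        items = doc.items()
    elif isinstance(doc, list):
        items = enumerate(doc)
    else:
        return {prefix: doc}
    out = {}
    for key, value in items:
        # RFC 6901 escaping: "~" becomes "~0" and "/" becomes "~1".
        seg = str(key).replace("~", "~0").replace("/", "~1")
        out.update(flatten(value, f"{prefix}/{seg}"))
    return out

def scalar_changes(old, new):
    """Return (path, op, old_scalar, new_scalar) rows: a query aid, not a full diff."""
    before, after = flatten(old), flatten(new)
    rows = []
    for path in sorted(before.keys() | after.keys()):
        if path not in after:
            rows.append((path, "remove", before[path], None))
        elif path not in before:
            rows.append((path, "add", None, after[path]))
        elif before[path] != after[path]:
            rows.append((path, "replace", before[path], after[path]))
    return rows

old = {"metadata": {"labels": {"tier": "web"}},
       "spec": {"replicas": 2, "template": {"image": "app:1"}}}
new = {"metadata": {"labels": {}},
       "spec": {"replicas": 3, "template": {"image": "app:2"}}}
changes = scalar_changes(old, new)
```

Because only scalar leaves are compared, structural changes such as an emptied map show up as removals of their former leaves, which is enough for timeline ranking while the full document stays in versions.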
Ingestion Offsets
ingestion_offsets(cluster_id, api_resource_id, namespace, resource_version, last_list_at, last_watch_at, last_bookmark_at, status, error, updated_at)

For cluster-scoped resources, namespace is null. For namespaced resources,
namespace is null for an all-namespaces watch and set to the namespace name
for a namespace-scoped watch. Offsets let the collector resume and make gap
detection explicit.
The same table powers watch health:
kube-insight db resources health --errors-only
kube-insight db resources health --stale-after 5m

Health output lets humans, automation, and agents decide whether an evidence answer rests on fresh, complete watch data or on a partial or stale resource stream.
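The idea behind the two health invocations can be sketched as a classification over offset rows. This does not reproduce the CLI's logic: Offset, health, and the rules (any recorded error means "error"; no watch or bookmark activity within stale_after means "stale") are assumptions chosen to mirror the --errors-only and --stale-after flags. A null namespace is shown as "*", covering both cluster-scoped resources and all-namespaces watches.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class Offset:
    api_resource: str
    namespace: Optional[str]  # None: cluster-scoped or all-namespaces watch
    error: Optional[str]
    last_watch_at: Optional[datetime]
    last_bookmark_at: Optional[datetime]

def health(offsets, now, stale_after=timedelta(minutes=5), errors_only=False):
    """Classify each watch stream as ok, stale, or error (hypothetical rules)."""
    report = []
    for o in offsets:
        seen = [t for t in (o.last_watch_at, o.last_bookmark_at) if t is not None]
        last = max(seen) if seen else None
        if o.error:
            state = "error"
        elif last is None or now - last > stale_after:
            state = "stale"
        else:
            state = "ok"
        if errors_only and state != "error":
            continue
        scope = o.namespace if o.namespace is not None else "*"
        report.append((o.api_resource, scope, state))
    return report

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
offsets = [
    Offset("pods", None, None, now - timedelta(seconds=30), None),
    Offset("leases", "kube-system", None,
           now - timedelta(minutes=20), now - timedelta(minutes=9)),
    Offset("secrets", None, "forbidden", None, None),
]
full = health(offsets, now)
errors = health(offsets, now, errors_only=True)
```

Counting bookmarks as activity matters for quiet resources: a watch that receives no object events can still prove it is alive through periodic bookmarks.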
Storage Maintenance
High-churn watch ingestion creates dead rows and index bloat in SQL backends. Maintenance policy is part of the data model, not an operational afterthought.
Track maintenance runs:
maintenance_runs(id, cluster_id, backend, task, started_at, finished_at, status, rows_scanned, rows_changed, bytes_before, bytes_after, error)

Required tasks:
- compact_versions
- compact_edges
- purge_retention
- rebuild_derived_indexes
- vacuum_or_analyze

SQLite should run incremental vacuum when enabled, wal_checkpoint, and
ANALYZE after large ingestion or purge jobs. ClickHouse should report active
and inactive part footprint, compression ratio, and merge pressure during live
profiles. Future OLTP metadata backends such as PostgreSQL or CockroachDB should
use their native vacuum/analyze or bloat-management workflows if they are added.
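The SQLite branch of vacuum_or_analyze can be sketched with Python's sqlite3, recording the run in maintenance_runs. The PRAGMAs are standard SQLite (incremental_vacuum only reclaims pages when auto_vacuum is INCREMENTAL, value 2; wal_checkpoint is a harmless no-op outside WAL mode); the sqlite_maintenance and db_bytes helpers, the REAL timestamps, and leaving rows_scanned/rows_changed NULL are assumptions.

```python
import sqlite3
import time

def db_bytes(db):
    """Approximate database size as page_count * page_size."""
    page_count = db.execute("PRAGMA page_count").fetchone()[0]
    page_size = db.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size

def sqlite_maintenance(db, cluster_id=None):
    """Run a vacuum_or_analyze pass and record it in maintenance_runs."""
    started = time.time()
    before = db_bytes(db)
    status, error = "ok", None
    try:
        # Only reclaims free pages when the file uses auto_vacuum=INCREMENTAL (2).
        if db.execute("PRAGMA auto_vacuum").fetchone()[0] == 2:
            db.execute("PRAGMA incremental_vacuum").fetchall()
        # Returns (busy, log_frames, checkpointed_frames); no-op outside WAL mode.
        db.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        db.execute("ANALYZE")  # refresh planner statistics after churn
    except sqlite3.Error as exc:
        status, error = "error", str(exc)
    db.execute(
        "INSERT INTO maintenance_runs (cluster_id, backend, task, started_at,"
        " finished_at, status, bytes_before, bytes_after, error)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (cluster_id, "sqlite", "vacuum_or_analyze", started, time.time(),
         status, before, db_bytes(db), error))
    db.commit()
    return status

db = sqlite3.connect(":memory:")
db.execute("PRAGMA auto_vacuum = INCREMENTAL")  # must be set before tables exist
db.executescript("""
CREATE TABLE maintenance_runs (
    id INTEGER PRIMARY KEY, cluster_id INTEGER, backend TEXT, task TEXT,
    started_at REAL, finished_at REAL, status TEXT, rows_scanned INTEGER,
    rows_changed INTEGER, bytes_before INTEGER, bytes_after INTEGER, error TEXT);
CREATE TABLE versions (id INTEGER PRIMARY KEY, doc TEXT);
""")
db.executemany("INSERT INTO versions (doc) VALUES (?)", [("x" * 500,) for _ in range(200)])
db.execute("DELETE FROM versions WHERE id % 2 = 0")  # simulate a retention purge
db.commit()
status = sqlite_maintenance(db, cluster_id=1)
```

Recording bytes_before and bytes_after for every run makes it possible to tell from the table alone whether maintenance is keeping up with ingestion churn.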