tetragon
Define debug metrics group
Separate the metrics that monitor Tetragon health (used by operators) from the metrics that expose details useful for debugging (used mainly by Tetragon developers, and potentially high-cardinality). The idea is to disable the latter by default, reducing the default metrics cardinality and performance overhead.
See Tetragon metrics framework for more context.
- [ ] Define the debug metrics group (unconstrained). See how the health metrics group is defined: https://github.com/cilium/tetragon/blob/main/pkg/metricsconfig/healthmetrics.go
- [ ] Identify debug metrics within the health group and move them into the debug group. This would probably include:
  - metrics documented as "for internal use only"
  - metrics with unconstrained cardinality, e.g. those with a "kprobe" label
  - any other metrics intended for Tetragon developers rather than operators
- [ ] Move debug metrics to a separate endpoint (breaking change)
- [ ] Disable debug metrics by default (breaking change)
- [ ] Adjust how metrics docs are generated
- [ ] Remove the "For internal use only" annotation from the metrics help texts. Membership in the debug group already indicates that a metric is considered "internal".
After this is done, the health metrics group should be marked as constrained.
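The split described in the tasks above could be sketched roughly as follows. This is a standalone illustration, not Tetragon's metrics framework: the `Metric` and `Group` types, the `IsDebug` and `Split` helpers, and the `example_*` metric names are all hypothetical, and the classification flag attached to `tetragon_bpf_missed_events_total` is an assumption (the issue only says it is a debug metric).

```go
package main

import "fmt"

// Metric is a hypothetical description of a metric, not a Tetragon type.
type Metric struct {
	Name          string
	Labels        []string
	InternalOnly  bool // help text says "for internal use only"
	DeveloperOnly bool // judged useful to developers rather than operators
}

// Group is a hypothetical stand-in for a metrics group.
type Group struct {
	Name        string
	Constrained bool // true only if every member has bounded cardinality
	Enabled     bool
	Metrics     []Metric
}

// unconstrainedLabels lists labels whose value set is not bounded.
// "kprobe" is the example given in the issue; the rest of the set is left open.
var unconstrainedLabels = map[string]bool{"kprobe": true}

// IsDebug applies the three criteria from the task list.
func IsDebug(m Metric) bool {
	if m.InternalOnly || m.DeveloperOnly {
		return true
	}
	for _, l := range m.Labels {
		if unconstrainedLabels[l] {
			return true
		}
	}
	return false
}

// Split sorts metrics into a constrained health group, which is always
// enabled, and an unconstrained debug group, which is enabled only on request.
func Split(all []Metric, enableDebug bool) (health, debug Group) {
	health = Group{Name: "health", Constrained: true, Enabled: true}
	debug = Group{Name: "debug", Constrained: false, Enabled: enableDebug}
	for _, m := range all {
		if IsDebug(m) {
			debug.Metrics = append(debug.Metrics, m)
		} else {
			health.Metrics = append(health.Metrics, m)
		}
	}
	return health, debug
}

func main() {
	all := []Metric{
		// Listed in the issue as a debug metric; the flag used is an assumption.
		{Name: "tetragon_bpf_missed_events_total", DeveloperOnly: true},
		// Hypothetical metrics, for illustration only.
		{Name: "example_health_metric_total", Labels: []string{"status"}},
		{Name: "example_probe_metric_total", Labels: []string{"kprobe"}},
	}
	health, debug := Split(all, false) // debug metrics disabled by default
	for _, g := range []Group{health, debug} {
		fmt.Printf("%s: constrained=%t enabled=%t metrics=%d\n",
			g.Name, g.Constrained, g.Enabled, len(g.Metrics))
	}
}
```

Once every unconstrained or developer-only metric has moved to the debug group, the health group can be marked constrained, which is what the `Constrained: true` value models here.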
Identified debug metrics
(not a complete list)
- tetragon_bpf_missed_events_total