Improve metrics library to enforce best practices

Open lambdanis opened this issue 3 months ago • 0 comments

Many changes were made recently to improve how Prometheus metrics are defined and managed in Tetragon:

  • clean up metrics for deleted pods to prevent growing cardinality: #1279
  • label configuration for high-cardinality metrics: #1444 and follow-up refactorings #1548, #2321 and #2373
  • expose metrics directly from BPF maps: #1510 (and a few PRs using helpers introduced there)
  • initialize metrics with labels for predictable resource usage and easier queries: #2162
  • autogenerated metrics docs and grouping metrics by function: #2164
  • multiple fixes to individual metrics

Metrics now seem to be in a decent place. However, defining them is not intuitive for developers: label configuration, initialization and the separate helpers for docs can all be confusing.

The goal of this issue is to extend the pkg/metrics library to provide an intuitive interface for defining metrics following best practices. Ideally we should also write dev docs and add metrics linting to CI.
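
To make the goal a bit more concrete, here is a rough sketch of what such an interface might look like. Everything in it is hypothetical: `ConstrainedLabel`, `CounterOpts` and `Combinations` are illustrative names, not the existing pkg/metrics API, and the sketch uses only the Go standard library rather than the Prometheus client. The idea is that a metric declares its labels together with their known values, so the library could initialize every series up front and reuse the same definition for generated docs.

```go
package main

import "fmt"

// ConstrainedLabel is a hypothetical label whose possible values are known
// up front, so every series can be initialized at registration time.
type ConstrainedLabel struct {
	Name   string
	Values []string
}

// CounterOpts is a hypothetical metric definition: name, help text and
// labels are declared in one place.
type CounterOpts struct {
	Name   string
	Help   string
	Labels []ConstrainedLabel
}

// Combinations returns the cartesian product of all label values, i.e. every
// series that would be pre-initialized to zero.
func Combinations(labels []ConstrainedLabel) [][]string {
	combos := [][]string{{}}
	for _, l := range labels {
		var next [][]string
		for _, prefix := range combos {
			for _, v := range l.Values {
				// Copy the prefix so combinations do not share backing arrays.
				combo := append(append([]string{}, prefix...), v)
				next = append(next, combo)
			}
		}
		combos = next
	}
	return combos
}

func main() {
	opts := CounterOpts{
		Name: "example_events_total", // illustrative metric name
		Help: "Hypothetical event counter.",
		Labels: []ConstrainedLabel{
			{Name: "type", Values: []string{"exec", "exit"}},
			{Name: "result", Values: []string{"ok", "error"}},
		},
	}
	// Print each label combination that would be initialized.
	for _, c := range Combinations(opts.Labels) {
		fmt.Println(opts.Name, c)
	}
}
```

With a definition like this, the number of series is bounded by the product of the label value counts, which is the "predictable resource usage" property mentioned above. Labels with unbounded values (pod names, for example) would need separate handling, such as the cleanup and label configuration from the PRs listed earlier.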

lambdanis, Apr 25 '24 12:04