tetragon icon indicating copy to clipboard operation
tetragon copied to clipboard

Delete metrics with policy label on policy delete

Open lambdanis opened this issue 10 months ago • 3 comments

tetragon_policy_events_total metric has a policy label. When a TracingPolicy is deleted, the corresponding metrics are still exposed by Tetragon, leading to an overhead (usually not big, as policies are not churned frequently, but might be significant in some cases).

There was a similar problem with metrics that have a pod label. Now when a pod is deleted, Tetragon deletes its metrics from the registry (see https://github.com/cilium/tetragon/blob/main/pkg/metrics/metricwithpod.go).

We should add a similar handler for policy delete events. One small gotcha is that at the moment metrics don't distinguish between clusterwide and namespaced policies, so we should first add another label to distinguish between different policy kinds.

lambdanis avatar Mar 25 '24 14:03 lambdanis

Hello @lambdanis Just to get a proper understanding of the issue.

We are talking about the Policy_events_total metric. which does not have any Pod Delete Handler like this, registered in main. which makes sure that when a tetragon policy is deleted, corresponding metrics are also deleted.

So, to resolve this, following things need to be done:

  • [ ] Add a new label, that can tell if policy is cluster wide or namespaced. (what do you suggest the name of that label be ?)
  • [ ] Implement and Register handler, that will make sure tracing policy metrics are also deleted when the policy itself is deleted.

If I got it right, what is the deadline for resolution of this issue ? I would like to work on it.

prateek041 avatar Apr 22 '24 17:04 prateek041

Hi @prateek041

Add a new label, that can tell if policy is cluster wide or namespaced. (what do you suggest the name of that label be ?)

Yes, maybe "policy_kind" and "policy_namespace" (empty for clusterwide) would be good.

Implement and Register handler, that will make sure tracing policy metrics are also deleted when the policy itself is deleted.

Something similar. I think metrics deletion logic should be triggered from DeleteTracingPolicy in the sensors manager rather than from the k8s API watcher (WatchTracePolicy). That way stale metrics will be cleaned up also if policies are created via Tetragon API rather than via k8s CRD.

If I got it right, what is the deadline for resolution of this issue ? I would like to work on it.

No deadline. I don't recall any reports of this being a big problem, so it's just a nice to have. Feel free to open a PR or if anything is unclear then post here.

lambdanis avatar Apr 23 '24 23:04 lambdanis

Sure, thanks for the pointers, I have started working on it. Please assign it to me.

prateek041 avatar Apr 24 '24 10:04 prateek041