Add missed probes metrics

Open · olsajiri opened this issue

WIP: needs https://github.com/cilium/ebpf/pull/1295

olsajiri commented on Jan 08 '24

Deploy Preview for tetragon ready!

| Name | Link |
|---|---|
| Latest commit | 02900fe11ddf5e3899d45287ff3d40b1bcd2d615 |
| Latest deploy log | https://app.netlify.com/sites/tetragon/deploys/66b0d848057b810008ebb48b |
| Deploy Preview | https://deploy-preview-1941--tetragon.netlify.app |

netlify[bot] commented on Mar 28 '24

> Thanks, LGTM.

> One question: what exactly is this missed metric? :) Is it when a link exists without a program, or is it something else?

It's incremented any time the kernel doesn't execute the BPF program because of recursion protection, for example when two kprobe programs try to run on top of each other on the same CPU.

About a year ago there was a case where we were losing exit events for exactly this reason, because we used the wrong hook. Now, if that happens again, we will see it in the metrics.
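To illustrate the recursion case, here is a simplified Go model (hypothetical types, not kernel or Tetragon code): each CPU holds a guard while a BPF program runs, and a probe that fires while the guard is held is skipped and counted as missed instead of being executed.

```go
package main

import "fmt"

// cpuState models a per-CPU "a BPF program is already running" guard.
// Simplified, hypothetical model of kernel recursion protection.
type cpuState struct {
	active int // >0 while a guarded BPF program is executing on this CPU
}

// prog is a hypothetical stand-in for a loaded BPF program and its counters.
type prog struct {
	name   string
	runs   uint64
	missed uint64
}

// fire models a probe hitting on the given CPU. If another guarded program
// is already running there, this one is skipped and its missed count grows.
// nested, if non-nil, simulates a second probe firing while this one runs.
func fire(cpu *cpuState, p *prog, nested func()) {
	if cpu.active > 0 {
		p.missed++ // recursion detected: the program is not executed
		return
	}
	cpu.active++
	p.runs++
	if nested != nil {
		nested()
	}
	cpu.active--
}

func main() {
	cpu := &cpuState{}
	outer := &prog{name: "kprobe_a"}
	inner := &prog{name: "kprobe_b"}

	// kprobe_b's attach point is hit while kprobe_a runs on the same CPU.
	fire(cpu, outer, func() { fire(cpu, inner, nil) })
	// Later, kprobe_b fires on its own and runs normally.
	fire(cpu, inner, nil)

	for _, p := range []*prog{outer, inner} {
		fmt.Printf("%s runs=%d missed=%d\n", p.name, p.runs, p.missed)
	}
}
```

In this model the exit-event loss described above corresponds to the nested program's `missed` counter going up while its `runs` counter does not.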

olsajiri commented on Jul 31 '24

@lambdanis could you please recheck? I changed a few things on the metrics side because the Tetragon base changed.

olsajiri commented on Jul 31 '24

> A few comments:

> 1. Ideally we should use `MustNewCustomCounter` and `NewCustomCollector` from `pkg/metrics` here. See how other collectors were defined in [a05fbc7](https://github.com/cilium/tetragon/commit/a05fbc71ba49225c045a6354737724810409e4c7) or [7309c9b](https://github.com/cilium/tetragon/commit/7309c9ba22a2b07bae182e5ea8b10aa214be13fd). If that's a problem for some reason, I'm OK with merging as-is, but then we should have a follow-up item to refactor it.

OK, used that in the new version.

> 2. The new metrics are not documented. Using the helpers from point 1 should make it easy to include them in the generated reference.

ok

> 3. I'm not sure `kprobe_multi (3 functions)` is a useful label value. Seeing such a metric, how do I find out which functions were missed?

We don't have any more insight than that. We get one counter per link/program attached to multiple functions, and that missed counter spans all of the attached functions. Some time ago we discussed adding a feature that would break the counter down per function, but there has been no need for it so far.

olsajiri commented on Aug 02 '24

> OK, used that in the new version.

I see you use `MustNewCustomCounter`, but the collector is defined as a separate type instead of using `NewCustomCollector`. It would be nice to have the new metrics documented (in the docs generated by `make metrics-docs`), and using `NewCustomCollector` should make that easy (you need to pass a `collectForDocs` function that reports fake metrics).
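The `collectForDocs` idea can be sketched as a collector that carries two functions: one that reads live values and one that returns fake samples, so the docs generator can list the metric and its labels without a running agent. This is a stdlib-only illustration with hypothetical names (`customCollector`, `sample`, `samples`); it is not the signature of `NewCustomCollector` in Tetragon's `pkg/metrics`.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one metric value together with its label value.
type sample struct {
	label string
	value uint64
}

// customCollector is a hypothetical stand-in for the pattern described in
// the review: a live collect function plus a collectForDocs function that
// reports fake values, so docs generation needs no running agent or BPF state.
type customCollector struct {
	name           string
	collect        func() []sample
	collectForDocs func() []sample
}

// samples returns fake data in docs mode and live data otherwise.
func (c customCollector) samples(forDocs bool) []sample {
	if forDocs {
		return c.collectForDocs()
	}
	return c.collect()
}

func main() {
	// Live state would normally come from the loaded BPF links.
	live := map[string]uint64{"kprobe_multi (3 functions)": 2, "func_a": 0}

	c := customCollector{
		name: "example_missed_probes_total",
		collect: func() []sample {
			out := make([]sample, 0, len(live))
			for label, v := range live {
				out = append(out, sample{label: label, value: v})
			}
			sort.Slice(out, func(i, j int) bool { return out[i].label < out[j].label })
			return out
		},
		collectForDocs: func() []sample {
			// One fake sample is enough for the reference to list the metric and its label.
			return []sample{{label: "example_label", value: 0}}
		},
	}

	for _, forDocs := range []bool{false, true} {
		fmt.Println("forDocs =", forDocs)
		for _, s := range c.samples(forDocs) {
			fmt.Printf("  %s{attach=%q} %d\n", c.name, s.label, s.value)
		}
	}
}
```

Keeping the fake path separate means the docs build never touches live kernel state, which is presumably why the review asks for it.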

lambdanis commented on Aug 02 '24