tetragon
Add missed probes metrics
WIP: needs https://github.com/cilium/ebpf/pull/1295
Deploy Preview for tetragon ready!
| Name | Link |
|---|---|
| Latest commit | 02900fe11ddf5e3899d45287ff3d40b1bcd2d615 |
| Latest deploy log | https://app.netlify.com/sites/tetragon/deploys/66b0d848057b810008ebb48b |
| Deploy Preview | https://deploy-preview-1941--tetragon.netlify.app |
Thanks, LGTM.
One question: what exactly is this missed metric? :) Is it when a link exists without a program, or is it something else?
It's incremented any time the BPF program is not executed because of recursion in the kernel, e.g. when two kprobe programs try to run on top of each other on the same CPU.
There was a case about a year ago where we were losing exit events for this reason because we used the wrong hook. If that happens again, we will now see it in the metrics.
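To make the recursion case concrete, here is a toy model of the behavior described above. It is not kernel code: `cpuState` and `run` are hypothetical names, and the single `active` flag only approximates a per-CPU recursion guard. It shows a second probe firing while the first is still running on the same CPU, which is skipped and counted as missed.

```go
package main

import "fmt"

// cpuState is a hypothetical stand-in for per-CPU state.
type cpuState struct {
	active bool   // true while a program is running on this CPU
	missed uint64 // number of program runs that were skipped
}

// run executes prog unless another program is already running on this CPU.
// A skipped run is counted in missed and reported by returning false.
func (c *cpuState) run(prog func()) bool {
	if c.active {
		c.missed++
		return false
	}
	c.active = true
	defer func() { c.active = false }()
	prog()
	return true
}

func main() {
	cpu := &cpuState{}
	inner := func() {}
	outer := func() {
		// A second probe fires while the first one is still running.
		cpu.run(inner)
	}
	cpu.run(outer)
	fmt.Println("missed:", cpu.missed) // prints "missed: 1"
}
```

In this model the skipped program is the nested one, which matches the scenario above: if that nested program was the one reporting exit events, those events are lost and only the counter records it.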
@lambdanis could you please recheck? I changed a few things on the metrics side because the tetragon base changed.
A few comments:
1. Ideally we should use `MustNewCustomCounter` and `NewCustomCollector` from `pkg/metrics` here. See how other collectors were defined in [a05fbc7](https://github.com/cilium/tetragon/commit/a05fbc71ba49225c045a6354737724810409e4c7) or [7309c9b](https://github.com/cilium/tetragon/commit/7309c9ba22a2b07bae182e5ea8b10aa214be13fd). If that's a problem for some reason, I'm ok with merging as-is, but then we should have a follow-up item to refactor it.
ok, used that in new version
2. The new metrics are not documented. Using helpers from 1. should provide an easy way to include them in the generated reference.
ok
3. I'm not sure `kprobe_multi (3 functions)` is a useful label value. Seeing such a metric, how do I find out which functions were missed?
We don't have any more insight than that. We get one counter per link/program that is attached to multiple functions, and the missed counter spans all attached functions. Some time ago we discussed adding a feature that would break the counter down by function, but there has been no need for it so far.
ok, used that in new version
I see you use `MustNewCustomCounter`, but the collector is defined as a separate type rather than with `NewCustomCollector`. It would be nice to have the new metrics documented (in the docs generated by `make metrics-docs`), and using `NewCustomCollector` should make that easy (you need to pass a `collectForDocs` function that reports fake metrics).
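A rough sketch of the pattern being suggested, using only the standard library. It does not use the real `pkg/metrics` helpers or their signatures: `sample`, `collectFunc`, `collector`, `newMissedCollector`, the `missed_probes_total` name and the `readMissed` source are all hypothetical. The point is only that one collector carries two paths, a live one that reads real counters and a docs one that reports fixed fake values.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for one exported metric value.
type sample struct {
	name   string
	attach string // value of a hypothetical "attach" label
	value  float64
}

// collectFunc reports samples through emit.
type collectFunc func(emit func(sample))

// collector pairs the live collection path with a docs-only path.
type collector struct {
	collect        collectFunc
	collectForDocs collectFunc
}

// gather runs one of the two paths and returns what it reported.
func (c *collector) gather(forDocs bool) []sample {
	fn := c.collect
	if forDocs {
		fn = c.collectForDocs
	}
	var out []sample
	fn(func(s sample) { out = append(out, s) })
	return out
}

// newMissedCollector builds the collector. readMissed is a hypothetical
// source of per-link missed counts, keyed by an attach description.
func newMissedCollector(readMissed func() map[string]uint64) *collector {
	return &collector{
		// Live path: one sample per link, covering all of its functions.
		collect: func(emit func(sample)) {
			for attach, n := range readMissed() {
				emit(sample{name: "missed_probes_total", attach: attach, value: float64(n)})
			}
		},
		// Docs path: a single fake sample so the metric shows up in docs.
		collectForDocs: func(emit func(sample)) {
			emit(sample{name: "missed_probes_total", attach: "example_function", value: 0})
		},
	}
}

func main() {
	c := newMissedCollector(func() map[string]uint64 {
		return map[string]uint64{"kprobe_multi (3 functions)": 7}
	})
	fmt.Println(c.gather(false)) // live values
	fmt.Println(c.gather(true))  // fake values for documentation
}
```

Because the docs path never touches real links or programs, documentation can be generated without loading any BPF programs.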