node_exporter
Collect linux bpf stats
Since Linux 5.1 the kernel can collect some BPF stats: https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/Documentation/bpftool-prog.rst?plain=1#L80
It seems possible to get the stats from anon_inode, or indirectly via bpftool; I haven't tested this yet. However, I'm not sure how stable it is, or what permissions are needed.
I would like to know wdyt about monitoring system-wide BPF with node_exporter. The issues above would still need checking if you think it fits.
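To make the "indirectly via bpftool" route concrete, here is a minimal sketch that shells out to `bpftool -j prog show` and decodes the two counters. It assumes bpftool is installed, that the caller has enough privileges, and that stats collection has been switched on with the kernel.bpf_stats_enabled sysctl; the type and function names (progStats, parseProgStats, avgRunNs) are made up for illustration and are not part of node_exporter or procfs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// progStats holds the subset of fields from `bpftool -j prog show` used here.
// run_time_ns and run_cnt are assumed to be present only while
// kernel.bpf_stats_enabled is on; they decode as zero otherwise.
type progStats struct {
	ID        uint32 `json:"id"`
	Name      string `json:"name"`
	RunTimeNs uint64 `json:"run_time_ns"`
	RunCnt    uint64 `json:"run_cnt"`
}

// parseProgStats decodes the JSON array printed by bpftool.
func parseProgStats(data []byte) ([]progStats, error) {
	var progs []progStats
	if err := json.Unmarshal(data, &progs); err != nil {
		return nil, fmt.Errorf("decoding bpftool output: %w", err)
	}
	return progs, nil
}

// avgRunNs returns the mean run time per invocation in nanoseconds,
// or 0 for a program that has not run since stats were enabled.
func avgRunNs(p progStats) float64 {
	if p.RunCnt == 0 {
		return 0
	}
	return float64(p.RunTimeNs) / float64(p.RunCnt)
}

func main() {
	out, err := exec.Command("bpftool", "-j", "prog", "show").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "bpftool failed (missing binary or privileges?):", err)
		return
	}
	progs, err := parseProgStats(out)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, p := range progs {
		fmt.Printf("id=%d name=%q run_cnt=%d avg_ns=%.1f\n", p.ID, p.Name, p.RunCnt, avgRunNs(p))
	}
}
```

Shelling out keeps the sketch short, but it inherits bpftool's privilege requirements and output format, so it does not answer the stability or permission questions above.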
Makes sense but as usual, needs to go into https://github.com/prometheus/procfs first
- It requires CAP_SYS_ADMIN: see https://github.com/torvalds/linux/blob/master/include/uapi/linux/capability.h#L406 on Linux master, which only improves upon the previous state with CAP_BPF (CAP_BPF patch on LWN).
- As mentioned, stats collection is possible since 5.1. We should use the bpf_prog_get_next_id, bpf_prog_get_fd_by_id, and bpf_prog_get_info_by_fd (since 4.13) calls, which wrap the bpf(2) syscall, to implement the collector, like Netflix/bpftop does with libbpf/libbpf-rs.
- We can use cilium/ebpf, which supports the above method, though I see procfs has very minimal deps.
@discordianfish wdyt?
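The enumeration described in the list above follows a fixed pattern: ask for the next program ID, turn the ID into a file descriptor, read the info from the descriptor, close it, repeat until the kernel reports ENOENT. The sketch below shows only that control flow. The progSource interface and the in-memory fakeSource are hypothetical stand-ins for the three kernel calls, so that the loop can run without privileges; a real collector would back the interface with libbpf, cilium/ebpf, or raw bpf(2) calls.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// progInfo carries the two counters of interest from the kernel's program info.
type progInfo struct {
	ID        uint32
	RunTimeNs uint64
	RunCnt    uint64
}

// progSource is a hypothetical abstraction over the three operations named
// above. NextID is expected to return ENOENT once there are no more programs.
type progSource interface {
	NextID(prev uint32) (uint32, error)
	FDByID(id uint32) (int, error)
	InfoByFD(fd int) (progInfo, error)
	Close(fd int) error
}

// collect walks all loaded programs, starting from ID 0.
func collect(src progSource) ([]progInfo, error) {
	var out []progInfo
	id := uint32(0)
	for {
		next, err := src.NextID(id)
		if errors.Is(err, syscall.ENOENT) {
			return out, nil // end of the ID space
		}
		if err != nil {
			return nil, fmt.Errorf("next id after %d: %w", id, err)
		}
		id = next

		fd, err := src.FDByID(id)
		if errors.Is(err, syscall.ENOENT) {
			continue // program was unloaded between the two calls
		}
		if err != nil {
			return nil, fmt.Errorf("fd for id %d: %w", id, err)
		}
		info, err := src.InfoByFD(fd)
		_ = src.Close(fd)
		if err != nil {
			return nil, fmt.Errorf("info for id %d: %w", id, err)
		}
		out = append(out, info)
	}
}

// fakeSource is an in-memory stand-in; progs must be sorted by ID.
// The "file descriptor" it hands out is just an index into the slice.
type fakeSource struct{ progs []progInfo }

func (f fakeSource) NextID(prev uint32) (uint32, error) {
	for _, p := range f.progs {
		if p.ID > prev {
			return p.ID, nil
		}
	}
	return 0, syscall.ENOENT
}

func (f fakeSource) FDByID(id uint32) (int, error) {
	for i, p := range f.progs {
		if p.ID == id {
			return i, nil
		}
	}
	return -1, syscall.ENOENT
}

func (f fakeSource) InfoByFD(fd int) (progInfo, error) {
	if fd < 0 || fd >= len(f.progs) {
		return progInfo{}, syscall.EBADF
	}
	return f.progs[fd], nil
}

func (f fakeSource) Close(int) error { return nil }

func main() {
	src := fakeSource{progs: []progInfo{{ID: 3, RunTimeNs: 100, RunCnt: 2}, {ID: 9}}}
	infos, err := collect(src)
	if err != nil {
		fmt.Println("collect failed:", err)
		return
	}
	for _, in := range infos {
		fmt.Printf("id=%d run_time_ns=%d run_cnt=%d\n", in.ID, in.RunTimeNs, in.RunCnt)
	}
}
```

Treating ENOENT from the FD lookup as "skip" rather than "fail" matters because programs can be unloaded while the walk is in progress.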
Unfortunately we don't allow collectors that require CAP_SYS_ADMIN. We have a policy against requiring root access.
@SuperQ some reasonable use cases require CAP_SYS_ADMIN, maybe we should support this somehow?
Maybe some 'admin mode' where the node-exporter:
- is supposed to run with CAP_SYS_ADMIN and errors out if not
- only provides metrics from collectors that require CAP_SYS_ADMIN
Then you could run two node-exporters, one unprivileged and one with CAP_SYS_ADMIN. Dunno.. but writing a textfile script for each of these seems meh..
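The "errors out if not" part of such an admin mode could be a startup check on the process's effective capability set. The following is a rough sketch, not an existing node_exporter feature: it reads the CapEff bitmask from /proc/self/status and tests the bit for CAP_SYS_ADMIN (21), with CAP_BPF (39) included for comparison. The function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability bit numbers from include/uapi/linux/capability.h.
const (
	capSysAdmin = 21
	capBPF      = 39
)

// parseCapEff extracts the effective capability bitmask from the text of
// /proc/<pid>/status, where it appears as a hexadecimal "CapEff:" line.
func parseCapEff(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == "CapEff:" {
			return strconv.ParseUint(fields[1], 16, 64)
		}
	}
	return 0, fmt.Errorf("CapEff line not found")
}

// hasCap reports whether capability number c is set in mask.
func hasCap(mask uint64, c uint) bool {
	return mask&(uint64(1)<<c) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read process status:", err)
		return
	}
	mask, err := parseCapEff(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if !hasCap(mask, capSysAdmin) {
		// A real admin mode would refuse to start at this point.
		fmt.Fprintln(os.Stderr, "admin mode requested but CAP_SYS_ADMIN is not in the effective set")
		return
	}
	fmt.Println("CAP_SYS_ADMIN present; CAP_BPF present:", hasCap(mask, capBPF))
}
```

Checking the bitmask up front gives one clear error at startup instead of a per-collector EPERM at scrape time.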
A client can't avoid CAP_SYS_ADMIN, and node_exporter can help. I would prefer not to be forced to run 2 daemons instead of 1, with some admin mode as suggested and an indication of it in telemetry.
Isn't this already handled by https://github.com/cloudflare/ebpf_exporter ?
- https://github.com/cloudflare/ebpf_exporter does export the metrics (it can do much more, though I think the extended capabilities fit the eBPF model with a userspace controller less well).
- I don't know why their docs don't require CAP_SYS_ADMIN for those metrics.
- I would like to have only node exporter.
- I think it's actually tricky for node_exporter, as it may treat BPF progs like processes and leave them out of scope, with any aggregation taking place at query time. However, they are part of the loaded kernel, so it may monitor them.
The Prometheus project does not have a "single node agent" data model. Having everything in the node_exporter is not something we ever plan to support. There are different exporters for different things.
So, again, this is not a feature we can support at this time due to the privileges necessary to implement it, as well as the fact that Go does not have any kind of privilege dropping support.