cpufreq collector is useless

Open lheckemann opened this issue 2 years ago • 9 comments

Host operating system: output of uname -a

Linux vivo 5.10.103 #1-NixOS SMP Wed Mar 2 10:42:57 UTC 2022 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.2.2 (branch: unknown, revision: v1.2.2)
  build user:       nix@nixpkgs
  build date:       unknown
  go version:       go1.16.13
  platform:         linux/amd64

node_exporter command line flags

  --collector.textfile --collector.textfile.directory /run/prometheus-node-exporter --collector.systemd --collector.systemd.unit-exclude='.+\\.(automount|device|scope|slice)' --collector.diskstats.ignored-devices='^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\\d+n\\d+p|mmcblk\\d+p)\\d+$|^zd' \
  --no-collector.ipvs --no-collector.schedstat --no-collector.mdadm --no-collector.nfsd --no-collector.bonding --no-collector.infiniband --no-collector.nfs --no-collector.rapl --no-collector.fibrechannel --no-collector.tapestats --no-collector.nvme \
  --web.listen-address 0.0.0.0:9100

Are you running node_exporter in Docker?

no

What did you do that produced an error?

Looked at the node_cpu_scaling_frequency_hertz metric over time

What did you expect to see?

Representative statistics about CPU frequency

What did you see instead?

Plateaus of high CPU frequency whenever the system is mostly idle.

I suspect that this is because the "boost" frequencies are only thermally viable when the system is otherwise idle, so they are used precisely during the time in which the node exporter is doing its business. In other words, the sampling is ineffective because of its own side effects.

Requiring CONFIG_CPU_FREQ_STAT and exporting counters for time_in_state and potentially also trans_table would probably make a lot more sense to see what frequencies the CPU is really running at most of the time. See also the docs for cpufreq-stats: https://www.kernel.org/doc/html/latest/cpu-freq/cpufreq-stats.html
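As a rough sketch of what exporting time_in_state could look like, here is a small shell function in the style of a textfile-collector script. The function name `time_in_state_metrics` and the metric name `node_cpufreq_time_in_state_seconds_total` are made up for illustration and are not part of node_exporter. The only assumptions taken from the cpufreq-stats docs are that each line is a `<frequency> <time>` pair, with the frequency in kHz and the time in units of 10 ms.

```shell
# Hypothetical helper: turn one time_in_state file into Prometheus counter lines.
# $1 = cpu label (e.g. 0), $2 = path to a time_in_state file.
time_in_state_metrics() {
  awk -v cpu="$1" '
    NF == 2 {
      # Column 1 is the frequency in kHz, column 2 the time in 10 ms units.
      printf "node_cpufreq_time_in_state_seconds_total{cpu=\"%s\",frequency_hertz=\"%.0f\"} %.2f\n", cpu, $1 * 1000, $2 / 100
    }' "$2"
}
```

It could be called once per CPU, e.g. `time_in_state_metrics 0 /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state`, on systems where that file exists. Because the values are cumulative counters, the time spent at each frequency between two scrapes does not depend on what the CPU happens to be doing at scrape time.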

lheckemann avatar Mar 10 '22 11:03 lheckemann

This does not seem like an issue with the exporter. The exporter exposes what data the kernel provides.

SuperQ avatar Mar 10 '22 13:03 SuperQ

It doesn't seem like this feature works on modern systems:

$ grep CONFIG_CPU_FREQ_STAT /boot/config-$(uname -r)
CONFIG_CPU_FREQ_STAT=y
$ ls -l /sys/devices/system/cpu/*/cpufreq/stats
ls: cannot access '/sys/devices/system/cpu/*/cpufreq/stats': No such file or directory
$ grep 'model name' /proc/cpuinfo | sort -u
model name	: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

SuperQ avatar Mar 10 '22 15:03 SuperQ

I think the exporter exporting a useless number, in the default configuration and without caveats in the documentation, is an issue with the exporter, regardless of where the number comes from.

As for the stats, that's unfortunate. Surely there must be a mechanism to access them? I'll investigate further in the near future.

lheckemann avatar Mar 15 '22 15:03 lheckemann

@lheckemann Do you get any errors in the log?

We should log an error if we can't access /sys/devices/system/cpu/*/cpufreq/stats
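To illustrate the kind of check meant here, a minimal shell sketch follows. It is not the exporter's code (that lives in Go, in procfs); the function name `report_missing_cpufreq_stats` is hypothetical, and the optional argument exists only so the check can be pointed at a directory other than the real sysfs tree.

```shell
# Hypothetical check: warn on stderr for every CPU without a cpufreq/stats directory.
# $1 = sysfs CPU root, defaults to /sys/devices/system/cpu.
report_missing_cpufreq_stats() {
  root="${1:-/sys/devices/system/cpu}"
  for dir in "$root"/cpu[0-9]*; do
    # Skip the unexpanded glob pattern if no cpuN directories exist.
    [ -d "$dir" ] || continue
    if [ ! -d "$dir/cpufreq/stats" ]; then
      echo "warning: ${dir##*/}: cpufreq/stats not available" >&2
    fi
  done
}
```

Running `report_missing_cpufreq_stats` with no argument inspects the live system and prints one warning line per affected CPU.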

discordianfish avatar Mar 22 '22 10:03 discordianfish

Looks like we ignore missing files here: https://github.com/prometheus/procfs/blob/c6f5590c757b5cd2e9eb1c902b7f8e110a2fde0c/sysfs/system_cpu.go#L202

discordianfish avatar Mar 22 '22 11:03 discordianfish

IIRC the ignoring of missing files was intentional, as the cpufreq drivers have inconsistent implementations.

SuperQ avatar Mar 22 '22 11:03 SuperQ

I don't get why these missing files would lead to "Plateaus of high CPU frequency whenever the system is mostly idle", though.

discordianfish avatar Mar 23 '22 11:03 discordianfish

@lheckemann What scaling governor are you using (i.e. the output of cat /sys/devices/system/cpu/*/cpufreq/scaling_governor)? That makes a huge difference to what will be observed. Please read the docs and provide sufficient information on the system before reporting something as useless.

hodgesds avatar Mar 31 '22 00:03 hodgesds

With the recent kernel 6.5.2 there's no /sys/devices/system/cpu/*/cpufreq here (unless I set amd_pstate=active on the kernel command line in GRUB). That's why I use my own 3-liner to get a metric [1]:

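  var="tinderbox_cpu_frequency"
  # The HELP line has to name the metric; /proc/cpuinfo reports MHz, so convert to hertz.
  echo -e "# HELP $var Current scaled cpu thread frequency in hertz.\n# TYPE $var gauge"
  awk -v var="$var" '/^cpu MHz/ { printf "%s{cpu=\"%d\"} %.0f\n", var, n++, $4 * 1000000 }' /proc/cpuinfo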

[1] https://github.com/toralf/tinderbox/blob/main/bin/metrics.sh#L50

toralf avatar Sep 07 '23 08:09 toralf