Bug in OpenBSD CPU stats - Metrics appear to be only ~1/10th of the actual values
Host operating system: output of uname -a
OpenBSD foo 7.3 GENERIC.MP#4 i386
node_exporter version: output of node_exporter --version
foo# node_exporter --version
node_exporter, version 1.5.0 (branch: non-git, revision: non-git)
build user: openbsd_ports
build date: 2023-03-24
go version: go1.20.1
platform: openbsd/386
node_exporter command line flags
--web.listen-address=10.0.2.15:9100 --collector.textfile.directory=/tmp/textfile_metrics/
node_exporter log output
n/a
Are you running node_exporter in Docker?
No.
What did you do that produced an error?
Run node_exporter as a daemon on OpenBSD
What did you expect to see?
Correct CPU stats in Prometheus
What did you see instead?
Incorrect (from my understanding) CPU stats :)
We found that the expression sum by (cpu) (rate(node_cpu_seconds_total{instance="foo"}[1m])) returns values around 0.1 instead of 1 (I am aware that this does not necessarily sum to exactly 1, but it is usually pretty close).
So it looks like we are off by a factor of around 10 :thinking:
Looking into cpu_openbsd.go, I found that the metrics are calculated by using sysctl kern.cp_time / sysctl kern.cp_time2 to get the number of ticks spent in each mode (at least that is what I understood from OpenBSD's sysctl manpage HERE), and then dividing that number by the clock rate (the number of ticks per second). That seems correct to me. I am not sure about the difference between the "hard clock" and the "statistics clock" mentioned HERE, but they are not different enough to explain the observed factor of 10.
So, given a clockrate of 100 hz (100 ticks per second), I would assume that the metrics are each just 1/100th of the values returned by sysctl kern.cp_time.
BUT when directly comparing the values returned from sysctl kern.cp_time with those returned by the exporter, we see they are more like 1/1000th (judging by the exporter output below, sysctl kern.cp_time returns the values in the order: user, nice, system, spin, interrupt, idle, see HERE):
foo# sysctl kern.clockrate
kern.clockrate=tick = 10000, hz = 100, profhz = 1024, stathz = 128
foo#
foo# sysctl kern.cp_time && curl -s http://10.0.2.15:9100/metrics | grep -i cpu_seconds
kern.cp_time=2391,0,1987,60,117,976656
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 953.765625
node_cpu_seconds_total{cpu="0",mode="interrupt"} 0.1142578125
node_cpu_seconds_total{cpu="0",mode="nice"} 0
node_cpu_seconds_total{cpu="0",mode="spin"} 0.05859375
node_cpu_seconds_total{cpu="0",mode="system"} 1.94140625
node_cpu_seconds_total{cpu="0",mode="user"} 2.3349609375
Dividing the values returned from sysctl kern.cp_time by the corresponding metric values gives us 1024 :thinking:
So to me it appears that somehow we get the wrong value as the clockrate, but I have not been able to figure out where / how exactly that happens - maybe the return values from unix.SysctlRaw("kern.clockrate") get mapped to the wrong fields of the clockinfo struct?
I hope I included enough information for troubleshooting by someone more knowledgeable in golang. Please let me know if I can provide any further useful info or assist in any way.
I just tested this against a "normal" install of OpenBSD (the i386 .iso from the official sources) instead of the self-compiled image where this behaviour was observed - and the bug was not present there!
So we'll investigate the build steps of our custom image.
Hi, I see that you are using node_exporter on OpenBSD. Do you have any experience configuring TLS encryption for it, and if so, how did you do it?
@manja-80 no, unfortunately not. Maybe THIS helps?
Thanks for the reply. I saw it, but I am still figuring out how to configure it properly :)
@paketb0te Did you find anything out? Otherwise let's close this for now.
@discordianfish I haven't gotten around to diving deeper into the issue -> closing
I was on vacation, so I didn't have time to check or respond :)