node_exporter
Provide better explanation for "load average / CPU saturation" metric
I would like the "load average / CPU saturation" metric to be better explained. https://github.com/prometheus/node_exporter/blob/306a3653779c13244b2dc72f5bdd6098a56f83ef/docs/node-mixin/rules/rules.libsonnet#L28-L31
I find it quite useful, but as mentioned in the comment, it is not clear what it measures or means. I could not detect the issues with my cluster until I noticed the "load average" metric in Grafana; after some digging, I found the problems.
I do not mind writing the PR myself; just tell me where that knowledge and explanation would fit. Also, I am not sure whether this is the correct repository, so please correct me if this is not where the metrics are created.
@VladMasarik Yes, this is the correct repository. instance:node_load1_per_cpu:ratio is defined in a recording rule with a comment similar to the one you linked above. The recording rule is probably the correct place for it. It would be great if you submitted a PR.
Note that the meaning of load average is usually documented in the respective kernel. For Linux see the docs on /proc/loadavg and friends at https://www.kernel.org/doc/html/latest/filesystems/proc.html and also https://linux.die.net/man/5/proc
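For readers who want to see what such a ratio looks like in practice, here is a minimal Python sketch. It assumes, going by the rule name instance:node_load1_per_cpu:ratio, that the value is the 1-minute load average divided by the number of CPUs; the function names parse_loadavg and load1_per_cpu are hypothetical and are not part of node_exporter. The field layout follows the /proc/loadavg description in the proc man page linked above.

```python
def parse_loadavg(text: str) -> dict:
    """Parse one line in /proc/loadavg format into named fields.

    Layout per proc(5): three load averages (1, 5, 15 minutes),
    then "running/total" scheduling entities, then the last PID created.
    """
    load1, load5, load15, tasks, last_pid = text.split()
    running, total = tasks.split("/")
    return {
        "load1": float(load1),
        "load5": float(load5),
        "load15": float(load15),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


def load1_per_cpu(load1: float, num_cpus: int) -> float:
    """Illustrative ratio: 1-minute load average divided by CPU count.

    Assumption: this mirrors what the recording rule name suggests,
    not necessarily the exact expression used in the mixin.
    """
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    return load1 / num_cpus


# Example with a made-up sample line (not real measurements).
sample = "6.00 4.50 3.25 2/310 11206"
fields = parse_loadavg(sample)
ratio = load1_per_cpu(fields["load1"], num_cpus=4)
```

On a Linux host the input line can be read from /proc/loadavg and the CPU count taken from os.cpu_count(). Keep in mind that the Linux load average counts tasks in uninterruptible sleep as well as runnable tasks, so a ratio above 1 can reflect blocked I/O rather than CPU saturation alone.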
Closing, as this is not moving anywhere. I agree that it is explained well in other places, but I think it could be explained better here, so that there is no need to guess which definition of loadavg is used.