node_exporter Provide better explanation for "load average / CPU saturation" metric

Open VladMasarik opened this issue 4 years ago • 2 comments

I would like the "load average / CPU saturation" metric to be better explained. https://github.com/prometheus/node_exporter/blob/306a3653779c13244b2dc72f5bdd6098a56f83ef/docs/node-mixin/rules/rules.libsonnet#L28-L31

I find it quite useful, but as mentioned in the comment, it is not clear what it measures or means. I could not detect the issues with my cluster until I noticed the "load average" metric in Grafana; after some digging, I found the problems.

I do not mind writing the PR myself; just tell me where that knowledge and explanation would fit. Also, I am not sure this is the correct repository, so please correct me if this is not where the metrics are created.

Jan 12 '21 07:01 VladMasarik

@VladMasarik Yes, this is the correct repository. instance:node_load1_per_cpu:ratio is defined in a recording rule with a comment similar to the one you linked above. The recording rule is probably the correct place for it. It would be great if you submitted a PR.

Jan 22 '21 21:01 hooten

Note that the meaning of load average is usually documented in the respective kernel. For Linux see the docs on /proc/loadavg and friends at https://www.kernel.org/doc/html/latest/filesystems/proc.html and also https://linux.die.net/man/5/proc
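
For illustration only, here is a minimal Python sketch of what a "load average per CPU" ratio could look like when computed by hand on Linux. It assumes the ratio is the 1-minute load average divided by the number of CPUs, which is inferred from the rule name instance:node_load1_per_cpu:ratio rather than confirmed in this thread. The function names `parse_loadavg` and `load1_per_cpu` are hypothetical and are not part of node_exporter.

```python
def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages as floats.

    `text` is the content of /proc/loadavg, e.g. "0.20 0.18 0.12 1/80 11206".
    Only the first three whitespace-separated fields are load averages; the
    remaining fields (task counts, last PID) are ignored here.
    """
    fields = text.split()
    if len(fields) < 3:
        raise ValueError("expected at least three fields in loadavg text")
    return tuple(float(value) for value in fields[:3])


def load1_per_cpu(loadavg_text, num_cpus):
    """Divide the 1-minute load average by the CPU count.

    Assumption: this mirrors the intent of a "load1 per CPU" ratio. A value
    near or above 1.0 would then suggest that, on average, there was at least
    one task per CPU running or waiting.
    """
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    load1, _load5, _load15 = parse_loadavg(loadavg_text)
    return load1 / num_cpus
```

On a Linux host, this could be called as `load1_per_cpu(open("/proc/loadavg").read(), os.cpu_count())` after `import os`. Keep in mind that the Linux load average also counts tasks in uninterruptible sleep (typically waiting on disk I/O), so a high ratio does not necessarily mean the CPUs themselves are saturated.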

Jan 30 '21 21:01 nemobis

Closing as this is not moving anywhere. I agree that it is explained well in other places, but I think it could be better here, so that nobody has to guess which definition of loadavg is used.

Feb 17 '23 13:02 VladMasarik
