node_exporter_aix
CPU use calculations
Hi, could you please suggest a Prometheus expression, based on the metrics provided by node_exporter_aix, that gives the actual CPU usage of an AIX host?
Thanks in advance
According to this page: https://github.com/thorhs/node_exporter_aix/blob/master/data_sources/cpu.multiple it looks like the CPU metrics are counters of clock ticks. The number of clock ticks per second is around 100 on most systems I tested, but it varies a little, so you cannot compute a rate over them on the assumption that 100 ticks per second means 100%.
I think you have to calculate a ratio against the sum of all the CPU modes (idle, user, sys and wait).
This query seems to give me the correct average CPU usage per host, as a value between 0.0 and 1.0:
avg by (instance) (
(rate(aix_cpu_user[5m]) + rate(aix_cpu_sys[5m]) + rate(aix_cpu_wait[5m]))
/
(rate(aix_cpu_user[5m]) + rate(aix_cpu_sys[5m]) + rate(aix_cpu_wait[5m]) + rate(aix_cpu_idle[5m]))
)
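A minimal Python sketch of what the query above computes, using two hypothetical scrapes taken 300 s apart (matching `[5m]`). All names and numbers are illustrative, not part of node_exporter_aix. `per_second_rates` is a simplified stand-in for PromQL `rate()` that ignores counter resets and extrapolation. The made-up data gives cpu0 about 100 ticks/s and cpu1 about 98 ticks/s, to show that the ratio does not depend on the tick rate.

```python
# Sketch of avg by (instance)((user+sys+wait) / (user+sys+wait+idle)),
# assuming two scrapes 300 s apart. All counter values are made-up clock ticks.
from typing import Dict

Sample = Dict[str, Dict[str, float]]  # cpu label -> mode -> tick counter

# Hypothetical data: cpu0 ticks ~100/s, cpu1 ~98/s over the 300 s window.
SAMPLE_T0: Sample = {
    "cpu0": {"user": 1_000_000, "sys": 500_000, "wait": 100_000, "idle": 5_000_000},
    "cpu1": {"user": 800_000, "sys": 400_000, "wait": 50_000, "idle": 6_000_000},
}
SAMPLE_T1: Sample = {
    "cpu0": {"user": 1_006_000, "sys": 502_400, "wait": 100_600, "idle": 5_021_000},
    "cpu1": {"user": 802_000, "sys": 400_800, "wait": 50_140, "idle": 6_026_460},
}


def per_second_rates(t0: Sample, t1: Sample, interval_s: float) -> Sample:
    """Rough stand-in for rate(): counter increase divided by elapsed seconds."""
    return {
        cpu: {mode: (t1[cpu][mode] - t0[cpu][mode]) / interval_s for mode in t0[cpu]}
        for cpu in t0
    }


def busy_fraction(r: Dict[str, float]) -> float:
    """(user + sys + wait) / (user + sys + wait + idle) for a single CPU."""
    busy = r["user"] + r["sys"] + r["wait"]
    return busy / (busy + r["idle"])


def host_utilization(t0: Sample, t1: Sample, interval_s: float) -> float:
    """Equivalent of avg by (instance)(...): mean of the per-CPU ratios."""
    ratios = [busy_fraction(r) for r in per_second_rates(t0, t1, interval_s).values()]
    return sum(ratios) / len(ratios)


print(round(host_utilization(SAMPLE_T0, SAMPLE_T1, 300), 3))  # prints 0.2
```

Here cpu0 is 30% busy and cpu1 is 10% busy, so the host average is 0.2, even though the two CPUs' modes sum to different tick rates (100/s and 98/s).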
What if I use this to calculate CPU utilization data from an AIX LPAR?
100 - avg by (instance ) (rate(aix_cpu_idle{instance="$aixserver"}[1m]))
Shouldn't this return the CPU utilization metrics for the LPAR?
That works with node_exporter and windows_exporter because the rates of all CPU modes add up to approximately 1 for each core.
With node_exporter_aix, the rates of all CPU modes do not add up to 1 when you sum them, so you can't just do '100 - idle%'. To calculate utilization as a value between 0.0 and 1.0, you need to divide by the sum of the rates of all modes.
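To illustrate why '100 - idle%' goes wrong, here is a small Python sketch using hypothetical `rate()` values for one CPU that is really 30% busy. The dict names and numbers are made up; the point is only that the per-mode rates sum to the CPU's tick rate, which is close to, but not exactly, 100.

```python
# Hypothetical rate() values (ticks per second) for one CPU that is really 30% busy.
# Each dict's modes sum to that CPU's tick rate, which varies a little between systems.
CPU_AT_100_HZ = {"user": 20.0, "sys": 8.0, "wait": 2.0, "idle": 70.0}   # sums to 100
CPU_AT_96_HZ = {"user": 19.2, "sys": 7.68, "wait": 1.92, "idle": 67.2}  # sums to 96


def naive_busy_percent(rates):
    """Mirrors 100 - rate(aix_cpu_idle[...]): assumes exactly 100 ticks/s means 100%."""
    return 100 - rates["idle"]


print(round(naive_busy_percent(CPU_AT_100_HZ), 2))  # 30.0
print(round(naive_busy_percent(CPU_AT_96_HZ), 2))   # 32.8, overstated since ticks/s < 100
```

The naive form is only right when the tick rate happens to be exactly 100 per second; dividing by the sum of all modes removes that assumption.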
But I guess you could do this:
1-
avg by (instance) (
rate(aix_cpu_idle[5m])
/
(rate(aix_cpu_user[5m]) + rate(aix_cpu_sys[5m]) + rate(aix_cpu_wait[5m]) + rate(aix_cpu_idle[5m]))
)
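A quick Python check, on hypothetical per-CPU rates, that this `1 - idle/total` form gives the same ratio as the `(user + sys + wait)/total` form earlier in the thread, whatever the tick rate. Since `1 - avg(x)` equals `avg(1 - x)`, keeping the `1 -` outside `avg by (instance)` does not change the result either. Function names here are illustrative only.

```python
# Compare the two query forms on one CPU's rates (hypothetical ticks per second).
def busy_over_total(r):
    """(user + sys + wait) / (user + sys + wait + idle): the first query's ratio."""
    total = r["user"] + r["sys"] + r["wait"] + r["idle"]
    return (r["user"] + r["sys"] + r["wait"]) / total


def one_minus_idle_share(r):
    """1 - idle / (user + sys + wait + idle): this query's ratio."""
    total = r["user"] + r["sys"] + r["wait"] + r["idle"]
    return 1 - r["idle"] / total


for tick_rate in (100.0, 96.0, 103.0):
    # Same 30% busy split, scaled to a different tick rate each time.
    r = {"user": 0.20 * tick_rate, "sys": 0.08 * tick_rate,
         "wait": 0.02 * tick_rate, "idle": 0.70 * tick_rate}
    # Both forms print 0.3 for every tick rate.
    print(tick_rate, round(busy_over_total(r), 6), round(one_minus_idle_share(r), 6))
```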
I am not able to get accurate utilization data from node_exporter_aix.
Here is how the metric values are returned for idle and spurr. Looking at the returned metrics, the two look very different, and I cannot tell what unit the values after the epoch timestamp are in.
@{__name__=aix_cpu_idle; cpu=cpu0; instance=aixnode1.domain.com} {1719224818.354, 1521390000}
@{__name__=aix_cpu_idle; cpu=cpu1; instance=aixnode1.domain.com} {1719224818.354, 1590790000}
@{__name__=aix_cpu_idle; cpu=cpu2; instance=aixnode1.domain.com} {1719224818.354, 1595270000}
@{__name__=aix_cpu_idle; cpu=cpu3; instance=aixnode1.domain.com} {1719224818.354, 1206380000}
@{__name__=aix_cpu_idle; cpu=cpu4; instance=aixnode1.domain.com} {1719224818.354, 1315650000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu0; instance=aixnode1.domain.com} {1719224818.354, 35610400000000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu1; instance=aixnode1.domain.com} {1719224818.354, 758626000000000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu2; instance=aixnode1.domain.com} {1719224818.354, 85319900000000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu3; instance=aixnode1.domain.com} {1719224818.354, 116369000000000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu4; instance=aixnode1.domain.com} {1719224818.354, 17225100000000}
Any help?