node_exporter_aix

CPU use calculations

Open acdmail opened this issue 2 years ago • 4 comments

Hi, could you please suggest a Prometheus expression, based on the metrics provided by node_exporter_aix, that would give the actual CPU usage of an AIX host?

Thanks in advance

acdmail avatar Jul 20 '22 15:07 acdmail

According to this page: https://github.com/thorhs/node_exporter_aix/blob/master/data_sources/cpu.multiple it looks like the CPU metrics are counters of clock ticks. The number of clock ticks per second is around 100 on most systems that I tested, but it varies a little, so you cannot calculate a rate over them and assume that a rate of 100 per second means 100%.

I think you have to calculate the ratio of the busy modes (user, sys and wait) to the sum of all the CPU modes (idle, user, sys and wait).

This query seems to give me the correct average CPU usage per host, as a value between 0.0 and 1.0:

avg by (instance) (
    (rate(aix_cpu_user[5m]) + rate(aix_cpu_sys[5m]) + rate(aix_cpu_wait[5m]))
    /
    (rate(aix_cpu_user[5m]) + rate(aix_cpu_sys[5m]) + rate(aix_cpu_wait[5m]) + rate(aix_cpu_idle[5m]))
)
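To illustrate why the ratio does not depend on the exact tick rate, here is a minimal Python sketch of the same arithmetic for a single CPU. The sample values and the `cpu_utilization` helper are hypothetical and only stand in for what `rate()` and the division do in the query above; they are not taken from the exporter.

```python
MODES = ("user", "sys", "wait", "idle")

def cpu_utilization(prev, curr):
    """Busy fraction (0.0-1.0) between two samples of the per-mode tick counters."""
    deltas = {mode: curr[mode] - prev[mode] for mode in MODES}
    total = sum(deltas.values())
    if total <= 0:
        return None  # no ticks accumulated between the two samples
    busy = deltas["user"] + deltas["sys"] + deltas["wait"]
    return busy / total

# Hypothetical counters sampled 60 s apart on a host ticking 100 times per second.
prev = {"user": 1000, "sys": 500, "wait": 100, "idle": 8400}
at_100hz = {"user": 2500, "sys": 1100, "wait": 200, "idle": 12200}

# The same workload on a host ticking 97 times per second (every delta scaled by 0.97).
at_97hz = {"user": 2455, "sys": 1082, "wait": 197, "idle": 12086}

print(cpu_utilization(prev, at_100hz))
print(cpu_utilization(prev, at_97hz))
```

Both calls give the same fraction, because the tick rate multiplies the numerator and the denominator alike and cancels out.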

adeverteuil avatar Oct 06 '22 17:10 adeverteuil

What if I use this to calculate CPU utilization for an AIX LPAR?

100 - avg by (instance ) (rate(aix_cpu_idle{instance="$aixserver"}[1m]))

Should this not return the CPU utilization of the LPAR?

Prakash82x avatar Jun 20 '24 13:06 Prakash82x

That works with node_exporter and windows_exporter because the rates of all CPU modes add up to approximately 1 for each core.

With node_exporter_aix, the rates of all CPU modes do not add up to 1, so you can't just do '100 - idle%'. To calculate the utilization as a value between 0.0 and 1.0, you need to divide by the sum of the rates of all modes.

But I guess you could do this:

1-
avg by (instance) (
    rate(aix_cpu_idle[5m])
    /
    (rate(aix_cpu_user[5m]) + rate(aix_cpu_sys[5m]) + rate(aix_cpu_wait[5m]) + rate(aix_cpu_idle[5m]))
)
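As a rough check of the reasoning, the Python sketch below compares the two forms for one CPU and shows what a plain '100 - idle rate' yields when the tick rate is not exactly 100 per second. All numbers are hypothetical (a 300 s window at an assumed 97 ticks per second).

```python
WINDOW_SECONDS = 300

# Hypothetical tick deltas for one CPU over the window: 97 ticks/s in total, 70% idle.
deltas = {"user": 5820, "sys": 2328, "wait": 582, "idle": 20370}

# Per-second rates, analogous to rate(aix_cpu_<mode>[5m]).
rates = {mode: delta / WINDOW_SECONDS for mode, delta in deltas.items()}
total_rate = sum(rates.values())

# Form 1: busy modes divided by all modes.
busy_ratio = (rates["user"] + rates["sys"] + rates["wait"]) / total_rate

# Form 2: one minus the idle share.
one_minus_idle = 1 - rates["idle"] / total_rate

# Naive form: treats the idle rate as if it were already a percentage.
naive = 100 - rates["idle"]

print(busy_ratio, one_minus_idle, naive)
```

The first two values agree at 0.3 (up to floating-point rounding), while the naive form gives about 32.1 instead of 30, and drifts further as the tick rate moves away from 100. Since `1 - avg(x)` equals `avg(1 - x)`, averaging over CPUs does not change the equivalence of the two forms.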

adeverteuil avatar Jun 23 '24 02:06 adeverteuil

I am not able to get accurate utilization data from the AIX node_exporter.

Here is how the metric values are returned for idle and spurr. The two metrics look very different, and I cannot tell what unit the values (the number after the epoch timestamp) are in.

@{__name__=aix_cpu_idle; cpu=cpu0; instance=aixnode1.domain.com} {1719224818.354, 1521390000}
@{__name__=aix_cpu_idle; cpu=cpu1; instance=aixnode1.domain.com} {1719224818.354, 1590790000}
@{__name__=aix_cpu_idle; cpu=cpu2; instance=aixnode1.domain.com} {1719224818.354, 1595270000}
@{__name__=aix_cpu_idle; cpu=cpu3; instance=aixnode1.domain.com} {1719224818.354, 1206380000}
@{__name__=aix_cpu_idle; cpu=cpu4; instance=aixnode1.domain.com} {1719224818.354, 1315650000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu0; instance=aixnode1.domain.com} {1719224818.354, 35610400000000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu1; instance=aixnode1.domain.com} {1719224818.354, 758626000000000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu2; instance=aixnode1.domain.com} {1719224818.354, 85319900000000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu3; instance=aixnode1.domain.com} {1719224818.354, 116369000000000}
@{__name__=aix_cpu_pidle_spurr; cpu=cpu4; instance=aixnode1.domain.com} {1719224818.354, 17225100000000}

Any help?

Prakash82x avatar Jun 25 '24 08:06 Prakash82x