kepler_node_core_joules_total=0 on RHEL9/x86_64
### What happened?
Downloaded and installed
https://github.com/sustainable-computing-io/kepler/releases/download/v0.7.9/kepler.rpm.tar.gz
On server running
- 5.14.0-417.kpq1.el9.x86_64
- Red Hat Enterprise Linux 9.4 (Plow)
Ran several CPU-intensive workloads, but the metric remained at 0.
### What did you expect to happen?
Expected the metric reading to increase and track system CPU usage.
### How can we reproduce it (as minimally and precisely as possible)?
Download and install the rpm, then start the service and query the metrics endpoint:
root# systemctl start container-kepler --now
root# curl localhost:8888/metrics | grep
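To make the "metric stays at 0" check repeatable, here is a minimal Python sketch that sums one metric from two scrapes of the endpoint and reports whether it increased. The helper names (`metric_total`, `counter_increased`, `fetch_metrics`) are hypothetical and not part of Kepler; the only details taken from this report are the endpoint `localhost:8888/metrics` and the metric name. The parser assumes simple Prometheus text-format lines with no `}` inside label values.

```python
import re
import urllib.request

# One sample line: metric_name{optional labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)


def metric_total(exposition_text, metric_name):
    """Sum every sample of metric_name found in Prometheus text-format output."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == metric_name:
            total += float(match.group("value"))
    return total


def counter_increased(before_text, after_text, metric_name):
    """Return True if the summed metric is larger in the second scrape."""
    return metric_total(after_text, metric_name) > metric_total(before_text, metric_name)


def fetch_metrics(url="http://localhost:8888/metrics"):
    """Fetch the raw metrics text (URL assumed from the curl command in this report)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

A possible way to use it: call `fetch_metrics()` once, run the CPU-intensive workload, call it again, and pass both results to `counter_increased(before, after, "kepler_node_core_joules_total")`. A `False` result would match the behavior reported here.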
### Anything else we need to know?
No response
### Kepler image tag
### Kubernetes version
### Cloud provider or bare metal
### OS version
<details>
# On Linux:
$ cat /etc/os-release
Red Hat Enterprise Linux 9.4 (Plow)
$ uname -a
Linux perf-intel-28.perf.eng.bos2.dc.redhat.com 5.14.0-417.kpq1.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Feb 2 14:05:04 EST 2024 x86_64 x86_64 x86_64 GNU/Linux
</details>
### Install tools
<details>
# rpm --version
RPM version 4.16.1.3
</details>
### Kepler deployment config
<details>
For standalone:
# put your Kepler command argument here
root# systemctl start container-kepler --now
root# curl localhost:8888/metrics | grep
</details>
### Container runtime (CRI) and version (if applicable)
<details>
</details>
### Related plugins (CNI, CSI, ...) and versions (if applicable)
<details>
</details>
@jharriga can you double check if it is kepler_node_core_joules_total or kepler_node_package_joules_total?
Currently, the Ampere xgene hwmon only reports CPU and I/O power (per doc here); DRAM power is not available. So, to align with the RAPL reporting, Kepler only reports kepler_node_core_joules_total (per code here).
This was originally reported on x86. Running v0.7.10 on x86, I do see that the metric kepler_node_core_joules_total has a value:
root# curl localhost:8888/metrics | grep kepler_node_core_joules_total
- kepler_node_core_joules_total{instance="nuc7",mode="dynamic",package="0",source="intel_rapl"} 39.07
- kepler_node_core_joules_total{instance="nuc7",mode="idle",package="0",source="intel_rapl"} 61360.029
As for ARM, on an Ampere server running v0.7.10 I see:
- kepler_node_core_joules_total{instance="perf-arm-11.perf.eng.bos2.dc.redhat.com",mode="dynamic",package="0",source="intel_rapl"} 98036.551
- kepler_node_package_joules_total{instance="perf-arm-11.perf.eng.bos2.dc.redhat.com",mode="dynamic",package="0",source="intel_rapl"} 98057.08
Both the kepler_node_core_joules_total and kepler_node_package_joules_total metrics have values, which doesn't seem to align with what you expected in the previous comment.
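For anyone comparing the two series on their own hardware, a small Python sketch like the following can group pasted metric lines by metric name and `mode` label. The function name `summarize_by_mode` is hypothetical, and the parsing assumes the simple `name{labels} value` shape seen in the output above.

```python
import re
from collections import defaultdict

# name{labels} value, as printed by the curl | grep output in this thread
LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def summarize_by_mode(lines):
    """Return {metric_name: {mode: summed_value}} for the given sample lines."""
    summary = defaultdict(dict)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a labeled sample line
        name, labels, value = match.groups()
        mode = dict(LABEL_RE.findall(labels)).get("mode", "")
        summary[name][mode] = summary[name].get(mode, 0.0) + float(value)
    return dict(summary)


# The two ARM samples quoted in this comment
arm_lines = [
    'kepler_node_core_joules_total{instance="perf-arm-11.perf.eng.bos2.dc.redhat.com",mode="dynamic",package="0",source="intel_rapl"} 98036.551',
    'kepler_node_package_joules_total{instance="perf-arm-11.perf.eng.bos2.dc.redhat.com",mode="dynamic",package="0",source="intel_rapl"} 98057.08',
]
arm_summary = summarize_by_mode(arm_lines)
```

With the two ARM lines above, `arm_summary` has one entry per metric name, each with a nonzero `dynamic` value.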
At any rate, I think this issue can be closed, since the originally reported problem on x86 appears to have been resolved.