
kepler_node_core_joules_total=0 on RHEL9/x86_64

Open jharriga opened this issue 1 year ago • 2 comments

What happened?

Downloaded and installed

https://github.com/sustainable-computing-io/kepler/releases/download/v0.7.9/kepler.rpm.tar.gz

On a server running:

  • 5.14.0-417.kpq1.el9.x86_64
  • Red Hat Enterprise Linux 9.4 (Plow)

Ran several CPU-intensive workloads and the kepler_node_core_joules_total metric remained '0'.

What did you expect to happen?

Expected the metric reading to increase and track system CPU usage.
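One way to check this expectation is to save two scrapes of the metrics endpoint, one before and one after a workload, and compare the totals. The helpers below are a minimal sketch, not part of Kepler: the function names are hypothetical, and the parsing assumes Prometheus text format without trailing timestamps.

```shell
# Hypothetical helper, not part of Kepler: sum every sample of one metric
# in a saved scrape (Prometheus text format, no timestamps assumed).
scrape_total() {
    # $1 = metric name, $2 = file holding one scrape of /metrics
    awk -v name="$1" '
        /^#/ { next }    # skip "# HELP" and "# TYPE" lines
        index($0, name "{") == 1 || $1 == name { sum += $NF }
        END { printf "%.3f\n", sum }
    ' "$2"
}

# Print how much the metric grew between two saved scrapes.
scrape_delta() {
    # $1 = metric name, $2 = earlier scrape, $3 = later scrape
    before=$(scrape_total "$1" "$2")
    after=$(scrape_total "$1" "$3")
    awk -v a="$before" -v b="$after" 'BEGIN { printf "%.3f\n", b - a }'
}
```

With the scrapes saved to two files (for example `before.txt` and `after.txt`, names chosen here for illustration), `scrape_delta kepler_node_core_joules_total before.txt after.txt` should print a positive number if the counter is tracking energy use, and `0.000` if it is stuck as described in this issue.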

How can we reproduce it (as minimally and precisely as possible)?

Download and install the rpm, then start the service and query the metrics endpoint:

root# systemctl start container-kepler --now
root# curl localhost:8888/metrics | grep
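To make the symptom easier to confirm, the metrics output can be piped through a small classifier. This is a sketch only: `metric_state` is a hypothetical helper, not part of Kepler, and it assumes the endpoint returns Prometheus text format without trailing timestamps.

```shell
# Hypothetical helper, not part of Kepler. Reads Prometheus text-format
# metrics on stdin and classifies the metric named in $1:
#   prints "zero"    if every sample is 0 (the symptom reported here)
#   prints "nonzero" if at least one sample is non-zero
#   prints "missing" if the metric does not appear at all
metric_state() {
    awk -v name="$1" '
        /^#/ { next }    # skip "# HELP" and "# TYPE" lines
        index($0, name "{") == 1 || $1 == name {
            seen = 1
            if ($NF + 0 != 0) nonzero = 1
        }
        END {
            if (!seen) print "missing"
            else if (nonzero) print "nonzero"
            else print "zero"
        }
    '
}
```

A possible invocation, using the endpoint from this report and the metric from the issue title: `curl -s localhost:8888/metrics | metric_state kepler_node_core_joules_total`.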

Anything else we need to know?

No response

Kepler image tag

v0.7.9

Kubernetes version

NONE

Cloud provider or bare metal

bare metal

OS version

<details>
# On Linux:
$ cat /etc/os-release
Red Hat Enterprise Linux 9.4 (Plow)

$ uname -a
Linux perf-intel-28.perf.eng.bos2.dc.redhat.com 5.14.0-417.kpq1.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Feb 2 14:05:04 EST 2024 x86_64 x86_64 x86_64 GNU/Linux
</details>


### Install tools

<details>
# rpm --version
RPM version 4.16.1.3
</details>


### Kepler deployment config

<details>
For standalone:
# put your Kepler command argument here
root# systemctl start container-kepler --now
root# curl localhost:8888/metrics | grep
</details>


### Container runtime (CRI) and version (if applicable)

<details>

</details>


### Related plugins (CNI, CSI, ...) and versions (if applicable)

<details>

</details>

jharriga, Apr 11 '24 17:04

@jharriga can you double check if it is kepler_node_core_joules_total or kepler_node_package_joules_total?

The current Ampere xgene hwmon only reports CPU and I/O power (per doc here); we cannot get DRAM power. So, to align with the RAPL reporting, Kepler only reports kepler_node_core_total (per code here).
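To see which power readings a hwmon driver actually exposes on a given machine, the standard Linux hwmon sysfs tree can be listed. The function below is a hypothetical helper, not part of Kepler; it relies only on the generic `name` and `power*_input` files under `/sys/class/hwmon` and makes no assumption about which inputs a particular driver provides or what they measure.

```shell
# Hypothetical helper, not part of Kepler: for each hwmon device, print its
# name followed by every power*_input file and that file's raw contents.
list_hwmon_power() {
    root="${1:-/sys/class/hwmon}"    # overridable so the function can be tested
    for dev in "$root"/hwmon*; do
        [ -r "$dev/name" ] || continue
        name=$(cat "$dev/name")
        for f in "$dev"/power*_input; do
            [ -r "$f" ] || continue  # glob did not match: no power sensors
            echo "$name $(basename "$f") $(cat "$f")"
        done
    done
}
```

Running `list_hwmon_power` with no argument reads the live sysfs tree; devices without any power input print nothing.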

rootfs, May 22 '24 20:05

This was originally reported on x86. Running v0.7.10 on x86, I do see that the metric kepler_node_core_joules_total has a value:

root# curl localhost:8888/metrics | grep kepler_node_core_joules_total

  • kepler_node_core_joules_total{instance="nuc7",mode="dynamic",package="0",source="intel_rapl"} 39.07

  • kepler_node_core_joules_total{instance="nuc7",mode="idle",package="0",source="intel_rapl"} 61360.029

As for ARM, on an Ampere server running v0.7.10 I see:

  • kepler_node_core_joules_total{instance="perf-arm-11.perf.eng.bos2.dc.redhat.com",mode="dynamic",package="0",source="intel_rapl"} 98036.551

  • kepler_node_package_joules_total{instance="perf-arm-11.perf.eng.bos2.dc.redhat.com",mode="dynamic",package="0",source="intel_rapl"} 98057.08

Both the kepler_node_core_joules_total and kepler_node_package_joules_total metrics have values. This doesn't seem to align with what you expected in the previous comment.
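To put a number on how close the two metrics are, the core and package samples for one mode can be pulled from a single saved scrape and subtracted. This is a rough sketch with hypothetical helper names, not part of Kepler; it assumes both samples come from the same scrape and that each metric has one sample per mode.

```shell
# Hypothetical helper, not part of Kepler: print the first sample of metric
# $1 whose labels include mode="$2" (Prometheus text format on stdin).
metric_by_mode() {
    awk -v name="$1" -v want="mode=\"$2\"" '
        index($0, name "{") == 1 && index($0, want) { print $NF; exit }
    '
}

# Print package minus core for one mode, reading a saved scrape file.
package_core_gap() {
    # $1 = mode label value (e.g. dynamic), $2 = file holding one scrape
    core=$(metric_by_mode kepler_node_core_joules_total "$1" < "$2")
    pkg=$(metric_by_mode kepler_node_package_joules_total "$1" < "$2")
    awk -v c="$core" -v p="$pkg" 'BEGIN { printf "%.3f\n", p - c }'
}
```

For the two Ampere samples quoted above, the gap works out to 20.529 joules, so the two counters are nearly identical on that machine.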

At any rate I think this Issue can be CLOSED since the originally reported problem on x86 appears to have been resolved.

jharriga, Jun 03 '24 18:06