Cannot get pod energy information
Hi, I'm trying to use Kepler in my k8s cluster. It was deployed on one node (node1) together with Prometheus and Grafana. There are many pods running on this node. I was expecting energy for all of these pods to be displayed in the Grafana dashboard, however I can only see one record of "pod_energy_stat" in Prometheus, with pod_name="system_processes" and pod_namespace="system", and this pod/namespace doesn't even exist in my cluster. Do you have any clue what the issue is about?
Thank you for reporting this issue. From your description, it looks like Kepler failed to resolve the pods and attributed all energy usage to system processes (i.e. using system_processes as the pod name and system as the pod namespace). When Kepler cannot find the pods, the root cause is likely the container runtime or the kubelet port in your cluster.
Can you check the following and share your cluster info?
- do you use cgroup v1 or v2? v2 is the assumed default. If you use v1, please turn this option on
- what is the container runtime? We've tested cri-o. If your container runtime is different, there might be a different sysfs path for the pod, which would make Kepler unable to resolve the pod name
- which port does kubelet use? Kepler assumes the default port 10250. If your kubelet runs on a different port, Kepler may not be able to reach it; the non-default port has to be set in the env var
- do you have the Kepler log available, i.e. the output of the following?
kubectl logs -n monitoring daemonset/kepler-exporter
cc @marceloamaral
Thanks for the prompt response. Please see my answers inline.
- do you use cgroup v1 or v2? v2 is the assumed default.
- I think my kernel supports both v1 and v2:
$ mount | grep '^cgroup' | awk '{print $1}' | uniq
cgroup2
cgroup
- what is the container runtime?
- It's containerd://1.5.9-0ubuntu3
- which port does kubelet use?
- It should be the default port.
- do you have the kepler log available?
- The log is like the following, all about "system_processes" in namespace "system":
2022/08/16 03:20:48 energy count: core 126086.00 dram: 6008.00 time 0.000000 cycles 28257170458 instructions 30679042784 misses 33231048 node memory 7492431872.000
2022/08/16 03:20:48 energy from pod: name: system_processes namespace: system eCore: 183834(48858366737) eDram: 132(9223372036862737727) eOther: 0(0) eGPU: 0(0) CPUTime: 0.00 (NaN) cycles: 28257170458 (1.0000) instructions: 30679042784 (1.0000) DiskReadBytes: 0 (0) DiskWriteBytes: 0 (0) misses: 33231048 (1.0000) ResidentMemRatio: 0.1929 avgCPUFreq: 1707.8185 MHZ pid: 1002212 comm: containerd-shim cgroupfs: map[]
Thanks, I'll check a setup using containerd
Turns out containerd on my setup has a different path pattern:
# kubectl describe pod nginx-7cd588b686-mkpzs |grep "Container ID"
Container ID: containerd://286b15051ec43375190802e1f40562536980a8fd97e75bb89c7f2eec6f995f17
# find /sys/fs/cgroup/systemd/ -iname "*286b15051ec43375190802e1f40562536980a8fd97e75bb89c7f2eec6f995f17"
/sys/fs/cgroup/systemd/system.slice/containerd.service/kubepods-burstable-poda3b200c9_db51_40b4_9d2d_53f8fdf80d7f.slice:cri-containerd:286b15051ec43375190802e1f40562536980a8fd97e75bb89c7f2eec6f995f17
while the regex used to parse the container path doesn't capture this pattern
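For illustration only, here is a minimal Go sketch (not Kepler's actual regex; the runtime prefixes are assumptions) of a pattern that would capture the 64-character hex container ID from both scope-style paths and the cri-containerd:<id> suffix shown above:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical pattern: match a known runtime prefix followed by a
// 64-character hex container ID, covering the "cri-containerd:<id>" form
// seen in the cgroup path above as well as "docker-<id>" / "crio-<id>".
var containerIDRe = regexp.MustCompile(`(?:docker-|crio-|cri-containerd[-:])([0-9a-f]{64})`)

func main() {
	path := "/sys/fs/cgroup/systemd/system.slice/containerd.service/" +
		"kubepods-burstable-poda3b200c9_db51_40b4_9d2d_53f8fdf80d7f.slice:" +
		"cri-containerd:286b15051ec43375190802e1f40562536980a8fd97e75bb89c7f2eec6f995f17"
	if m := containerIDRe.FindStringSubmatch(path); m != nil {
		fmt.Println("container ID:", m[1])
	}
}
```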
cc @marceloamaral
this was tested on Ubuntu 20.04 with containerd version
# containerd -v
containerd github.com/containerd/containerd 1.5.9-0ubuntu1~20.04.4
@ruomengh would you like to provide a fix for this? We can go through the development process to help you get started.
Sounds good. I'd like to give it a try but I may need some guidance.
I've emailed the development process; please let me know if you have any issues there. Looking forward to your contribution!
@ruomengh can you try the latest kepler container image? Deleting the kepler deployment and recreating it will do.
I just hit the same issue on RHEL 8 with cri-o, but it seems #108 fixed it
cc @sunya-ch
Re-deployed Kepler with the latest image and the issue remains.
@ruomengh What is the detected kernel version? Could you post the head part of the Kepler log?
> kubectl logs -n $(kubectl get po -A|grep kepler-exporter|awk '{print $1,$2}')|head
2022/08/18 01:50:57 InitSliceHandler: &{map[] /sys/fs/cgroup/system.slice /sys/fs/cgroup/system.slice /sys/fs/cgroup/system.slice}
use sysfs to obtain power
config EnabledEBPFCgroupID enabled: true
config getKernelVersion: 4.18
config set EnabledEBPFCgroupID to true
2022/08/18 01:33:03 InitSliceHandler: &{map[] /sys/fs/cgroup/cpu /sys/fs/cgroup/memory /sys/fs/cgroup/blkio}
use sysfs to obtain power
config EnabledEBPFCgroupID enabled: true
config getKernelVersion: 5.15
config set EnabledEBPFCgroupID to false
According to this log, the only condition that causes EnabledEBPFCgroupID to be disabled is the cgroup version check.
The current implementation checks whether this static path exists: /sys/fs/cgroup/cgroup.controllers.
Could you confirm that this path exists on your host?
If it exists on the host, double check that the Kepler deployment manifest mounts the /sys path (this should be defined in the provided manifest). If it does not exist, I think we need to fix the method that detects the cgroup version to cover this.
related source code: https://github.com/sustainable-computing-io/kepler/blob/ef763b68e9a11e956936de06b8a4e8af94458f58/pkg/config/config.go#L69
The path doesn't exist - ls: cannot access '/sys/fs/cgroup/cgroup.controllers': No such file or directory
I see. It seems your system does not have cgroup v2 set up that way, so the file isn't created where we expect it, as in https://github.com/sustainable-computing-io/kepler/issues/29.
I think we should change the cgroup v2 detection to use the mount point on the host, as you did, or to check the /proc/filesystems file.
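To make that proposal concrete, here is a rough Go sketch of the alternative detection (assumed logic, not Kepler's current code): first look for the static /sys/fs/cgroup/cgroup.controllers file, and if it is absent fall back to scanning the host's mount table for a cgroup2 entry:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// isCgroupV2 reports whether a cgroup v2 (unified) hierarchy is available.
// It assumes the host's /sys and /proc are visible to the process.
func isCgroupV2() bool {
	// Fast path: this file only exists when the unified hierarchy is mounted at /sys/fs/cgroup.
	if _, err := os.Stat("/sys/fs/cgroup/cgroup.controllers"); err == nil {
		return true
	}
	// Fallback: look for a cgroup2 mount entry, similar to `mount | grep '^cgroup'`.
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return false
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// /proc/mounts format: device mountpoint fstype options dump pass
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 3 && fields[2] == "cgroup2" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("cgroup v2 detected:", isCgroupV2())
}
```

Whether a hybrid setup (both cgroup and cgroup2 mounted, as in the mount output earlier in this thread) should count as v2 is exactly the judgement call being discussed here.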
It seems cgroup v2 is disabled on the node to avoid another issue. I tried the cgroup v1 option @rootfs suggested (https://github.com/sustainable-computing-io/kepler/blob/main/manifests/kubernetes/deployment.yaml#L71) but it doesn't work either.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Since we support cgroup v1 now, @ruomengh do we still have this issue?
@jiere I assume the issue is gone in our environment. Please help confirm. Thanks.
Using quay.io/sustainable_computing_io/kepler:release-0.5 on Fedora Core 33 with cgroup v1 (systemd.unified_cgroup_hierarchy=0 in grub) does not work as expected.
See below an extract of the logs (verbose=5), where every container has the same counters:
I0623 15:45:09.166734 1 metric_collector.go:137] energy from pod/container (0 active processes): name: csi-cinder-nodeplugin-4jx6c/cinder-csi-plugin namespace: kube-system
cgrouppid: 0 pid: [] comm:
Dyn ePkg (mJ): 9512 (4461128) (eCore: 9512 (4461128) eDram: 150 (70350) eUncore: 0 (0)) eGPU (mJ): 0 (0) eOther (mJ): 0 (0)
Idle ePkg (mJ): 0 (0) (eCore: 0 (0) eDram: 0 (0) eUncore: 0 (0)) eGPU (mJ): 0 (0) eOther (mJ): 0 (0)
CPUTime: 0 (0)
NetTX IRQ: 0 (0)
NetRX IRQ: 0 (0)
Block IRQ: 0 (0)
counters: map[cache_miss:0 (0) cpu_cycles:0 (0) cpu_instr:0 (0) cpu_ref_cycles:0 (0)]
cgroupfs: map[block_devices_used:0 (0) cgroupfs_cpu_usage_us:0 (0) cgroupfs_ioread_bytes:0 (0) cgroupfs_iowrite_bytes:0 (0) cgroupfs_kernel_memory_usage_bytes:0 (0) cgroupfs_memory_usage_bytes:0 (0) cgroupfs_system_cpu_usage_us:0 (0) cgroupfs_tcp_memory_usage_bytes:0 (0) cgroupfs_user_cpu_usage_us:0 (0)]
kubelets: map[container_cpu_usage_seconds_total:0 (1503) container_memory_working_set_bytes:0 (13774848)]
I0623 15:45:09.166748 1 metric_collector.go:137] energy from pod/container (0 active processes): name: csi-cinder-nodeplugin-4jx6c/node-driver-registrar namespace: kube-system
cgrouppid: 0 pid: [] comm:
Dyn ePkg (mJ): 9512 (4461128) (eCore: 9512 (4461128) eDram: 150 (70350) eUncore: 0 (0)) eGPU (mJ): 0 (0) eOther (mJ): 0 (0)
Idle ePkg (mJ): 0 (0) (eCore: 0 (0) eDram: 0 (0) eUncore: 0 (0)) eGPU (mJ): 0 (0) eOther (mJ): 0 (0)
CPUTime: 0 (0)
NetTX IRQ: 0 (0)
NetRX IRQ: 0 (0)
Block IRQ: 0 (0)
counters: map[cache_miss:0 (0) cpu_cycles:0 (0) cpu_instr:0 (0) cpu_ref_cycles:0 (0)]
cgroupfs: map[block_devices_used:0 (0) cgroupfs_cpu_usage_us:0 (0) cgroupfs_ioread_bytes:0 (0) cgroupfs_iowrite_bytes:0 (0) cgroupfs_kernel_memory_usage_bytes:0 (0) cgroupfs_memory_usage_bytes:0 (0) cgroupfs_system_cpu_usage_us:0 (0) cgroupfs_tcp_memory_usage_bytes:0 (0) cgroupfs_user_cpu_usage_us:0 (0)]
kubelets: map[container_cpu_usage_seconds_total:0 (13) container_memory_working_set_bytes:0 (3063808)]
@marceloamaral
@rootfs the problem is the BPF code.
With the apiserver we can now identify the containers, but the apiserver does not give us the PIDs inside the containers. You can see in the logs that all the PID lists are empty. The cgroup metrics can only be extracted if we have the PID of a process in the cgroup.
The PID information comes from the eBPF code, which is probably not working.
What is this environment? Is it bare-metal or a VM?
How was k8s deployed? Mini-clusters like kind and minikube do not expose the host /proc folder by default.
So, when we create a kind cluster, we mount the host folder so that Kepler can access the PID information.
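As a small illustration of why the host /proc matters (this is not Kepler's code, and the paths are assumptions): mapping a PID reported by eBPF back to a container means reading /proc/<pid>/cgroup on the host, which only works if that filesystem is mounted into the exporter pod.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// cgroupOfPID reads /proc/<pid>/cgroup, which holds the cgroup path the
// container runtime placed the process in; this is the link between a PID
// seen by eBPF and a concrete container.
func cgroupOfPID(procRoot string, pid int) (string, error) {
	data, err := os.ReadFile(procRoot + "/" + strconv.Itoa(pid) + "/cgroup")
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	// procRoot is the host's /proc as seen inside the pod; if the host /proc
	// is not exposed (the default in kind/minikube), this lookup fails.
	out, err := cgroupOfPID("/proc", os.Getpid())
	if err != nil {
		fmt.Println("cannot read cgroup info:", err)
		return
	}
	fmt.Print(out)
}
```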
This Kubernetes environment has been deployed using Openstack Magnum on VMs.
@fadam-csgroup
Can you check if you have visibility of the PIDs?
The command oc exec -it -n kepler service/kepler-exporter -- ls /proc
should show a lot of PIDs.
@marceloamaral
Yes, the command gives me lots of PIDs:
$ kubectl -n monitoring exec -it ds/kepler -- ls /proc -1 | grep -E '^[0-9]+$' | wc -l
553
@fadam-csgroup can you please share the Kepler logs? Put them in https://pastebin.com/
@marceloamaral Here are the logs: https://pastebin.com/2i7RX0Y8
I0629 12:29:27.877410 1 process_metric.go:147] cannot extract: container_cpu_usage_seconds_total
I0629 12:29:27.877413 1 process_metric.go:147] cannot extract: container_memory_working_set_bytes
I0629 12:29:27.877418 1 process_metric.go:147] cannot extract: cgroupfs_memory_usage_bytes
I0629 12:29:27.877421 1 process_metric.go:147] cannot extract: cgroupfs_kernel_memory_usage_bytes
I0629 12:29:27.877424 1 process_metric.go:147] cannot extract: cgroupfs_tcp_memory_usage_bytes
I0629 12:29:27.877427 1 process_metric.go:147] cannot extract: cgroupfs_cpu_usage_us
I0629 12:29:27.877431 1 process_metric.go:147] cannot extract: cgroupfs_system_cpu_usage_us
I0629 12:29:27.877434 1 process_metric.go:147] cannot extract: cgroupfs_user_cpu_usage_us
this is probably the same issue as https://github.com/sustainable-computing-io/kepler/discussions/750#discussioncomment-6265641
@mcalman
@fadam-csgroup on your setup, are CPU and memory accounting turned on? The following is from my setup, where CPU and memory accounting are turned on for kubelet:
# sudo systemctl show kubelet |grep -i accounting
CPUAccounting=yes
IOAccounting=no
BlockIOAccounting=yes
MemoryAccounting=yes
TasksAccounting=yes
IPAccounting=no
@rootfs
It seems that memory accounting is enabled but CPU accounting is not:
# systemctl show kubelet |grep -i accounting
CPUAccounting=no
IOAccounting=no
BlockIOAccounting=no
MemoryAccounting=yes
TasksAccounting=yes
IPAccounting=no