scaphandre
missing kubernetes pods metric labels
Hi! I am testing scaphandre by deploying it in a Kubernetes cluster with the --containers flag, and the metrics provided are not showing any of the extra labels for Kubernetes pods mentioned in the docs (kubernetes_node_name, kubernetes_pod_name, etc.).
Example of the metrics I am getting:
scaph_process_power_consumption_microwatts{cmdline="nginx: worker process",pid="2152",exe="nginx"} 0
scaph_process_power_consumption_microwatts{cmdline="nginx: worker process",pid="2151",exe="nginx"} 0
scaph_process_power_consumption_microwatts{pid="2150",exe="nginx",cmdline="nginx: worker process"} 0
scaph_process_power_consumption_microwatts{pid="2149",exe="nginx",cmdline="nginx: worker process"} 0
scaph_process_power_consumption_microwatts{exe="nginx",pid="2148",cmdline="nginx: worker process"} 0
I am running scaphandre inside a Kind multi-node cluster deployed with the following config file:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30000
    hostPort: 30000
    protocol: TCP
  - containerPort: 31000
    hostPort: 31000
    protocol: TCP
  - containerPort: 32000
    hostPort: 32000
    protocol: TCP
  extraMounts:
  - hostPath: /var/run/docker.sock
    containerPath: /var/run/docker.sock
- role: worker
  extraMounts:
  - hostPath: /var/run/docker.sock
    containerPath: /var/run/docker.sock
- role: worker
  extraMounts:
  - hostPath: /var/run/docker.sock
    containerPath: /var/run/docker.sock
Note that I am mounting the host's docker.sock as a volume so that scaphandre has access to it.
Then I install scaphandre using the helm chart:
$ helm install scaphandre helm/scaphandre
Scaphandre logs:
Scaphandre prometheus exporter
Sending ⚡ metrics
scaphandre::exporters::prometheus: 2023-09-14T13:22:37: Starting Prometheus exporter
Press CTRL-C to stop scaphandre
scaphandre::exporters::prometheus: 2023-09-14T13:23:42: Refresh topology
scaphandre::sensors: Before refresh procs init.
scaphandre::exporters::prometheus: 2023-09-14T13:23:42: Refresh data
scaphandre::exporters: 2023-09-14T13:23:42: Get self metrics
scaphandre::exporters: 2023-09-14T13:23:42: Get host metrics
scaphandre::exporters: 2023-09-14T13:23:42: Get socket metrics
scaphandre::exporters: 2023-09-14T13:23:42: Get system metrics
scaphandre::exporters: 2023-09-14T13:23:42: Get process metrics
scaphandre::exporters: First check done on pods.
scaphandre::exporters::prometheus: 2023-09-14T13:23:53: Refresh topology
scaphandre::sensors: Before refresh procs init.
scaphandre::exporters::prometheus: 2023-09-14T13:23:53: Refresh data
scaphandre::exporters: 2023-09-14T13:23:53: Get self metrics
scaphandre::exporters: 2023-09-14T13:23:53: Get host metrics
scaphandre::exporters: 2023-09-14T13:23:53: Get socket metrics
scaphandre::exporters: 2023-09-14T13:23:53: Get system metrics
scaphandre::exporters: 2023-09-14T13:23:53: Get process metrics
scaphandre::exporters::prometheus: 2023-09-14T13:24:35: Refresh topology
scaphandre::sensors: Before refresh procs init.
scaphandre::exporters::prometheus: 2023-09-14T13:24:35: Refresh data
scaphandre::exporters: 2023-09-14T13:24:35: Get self metrics
scaphandre::exporters: 2023-09-14T13:24:35: Get host metrics
scaphandre::exporters: 2023-09-14T13:24:35: Get socket metrics
scaphandre::exporters: 2023-09-14T13:24:35: Get system metrics
scaphandre::exporters: 2023-09-14T13:24:35: Get process metrics
scaphandre::exporters: Just refreshed pod list ! last: 1694697822 now: 1694697875, diff: 53
I would like to know if this feature is working as of today and, if so, what the Kubernetes deployment requirements are for it to work, or the recommended environment, meaning:
- type of deployment (using kubeadm, microk8s, k3s, kind...)
- container runtime (containerd, cri-docker,...)
- kubernetes version
- host operating system version
- anything else??
Hi, looking into the scaphandre code I have seen that, when looking for the container name, a regex is used to match against the cgroups. The regex is: Regex::new(r"^/kubepods.*$").unwrap();
So this is looking for something starting with /kubepods, while the cgroups of my container processes look like this: /kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/
So it is not able to get the container ID and thus not able to resolve the pod name.
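To make the mismatch concrete, here is a minimal standalone sketch (using the regex crate directly, not Scaphandre's actual code path, and with a made-up container path for the "standard" case):

use regex::Regex;

fn main() {
    // Pattern currently used by Scaphandre to detect Kubernetes cgroups.
    let re = Regex::new(r"^/kubepods.*$").unwrap();

    // Path shaped like a "standard" kubelet cgroup (made-up IDs): matches.
    let standard = "/kubepods/besteffort/pod1234/abcd";
    // Path as seen on my Kind nodes: does not match, since it does not
    // start with /kubepods.
    let kind = "/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/";

    println!("standard path matches: {}", re.is_match(standard)); // true
    println!("kind path matches: {}", re.is_match(kind));         // false
}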
I think this naming difference in the cgroup hierarchy might be related to Kind. Any idea if this is the case? And if so, is Kind going to be supported in the future?
Also, I have seen that cgroups v2 has been introduced in newer Linux distro versions; is that something that could potentially affect the proper functioning of scaphandre? If so, is it going to be supported in the future as well?
I would also like to ask again about your recommended Kubernetes cluster requirements in order to test all of scaphandre's features properly.
Kind regards and thanks for your awesome project
Hello, I've faced the same issue in my context. It seems to be related to this line, as you stated: https://github.com/hubblo-org/scaphandre/blob/main/src/sensors/utils.rs#L422
Scaphandre seems to use the /proc/{PID}/cgroup file to figure out the per-process extra information it provides to the Prometheus exporter as labels.
The documentation (https://man7.org/linux/man-pages/man7/cgroups.7.html) states:
...
/proc files
/proc/cgroups (since Linux 2.6.24)
...
/proc/pid/cgroup (since Linux 2.6.24)
This file describes control groups to which the process
with the corresponding PID belongs. The displayed
information differs for cgroups version 1 and version 2
hierarchies.
For each cgroup hierarchy of which the process is a
member, there is one entry containing three colon-
separated fields:
hierarchy-ID:controller-list:cgroup-path
For example:
5:cpuacct,cpu,cpuset:/daemons
The colon-separated fields are, from left to right:
[1] For cgroups version 1 hierarchies, this field
contains a unique hierarchy ID number that can be
matched to a hierarchy ID in /proc/cgroups. For the
cgroups version 2 hierarchy, this field contains the
value 0.
[2] For cgroups version 1 hierarchies, this field
contains a comma-separated list of the controllers
bound to the hierarchy. For the cgroups version 2
hierarchy, this field is empty.
[3] This field contains the pathname of the control group
in the hierarchy to which the process belongs. This
pathname is relative to the mount point of the
hierarchy.
In my case, the cgroup was something like this: 0::/kubepods/burstable/pod348e8c15-e2a8-41d4-ae41-64dd1b6248df/d8e314cffefd00e08ab729a482563d237b27a74f524aa6df936b5bc50a8fde50
Scaphandre grabs the ID from this content in this line: https://github.com/hubblo-org/scaphandre/blob/dev/src/sensors/utils.rs#L421 (it seems to only take the last value of a "/" split before going on to the next steps). I think the ID is then used to look up the namespace and other information, but I haven't dug further.
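To illustrate that parsing (a minimal sketch of the split described above, not Scaphandre's actual implementation), using the cgroup line from my node:

fn main() {
    // /proc/{PID}/cgroup line as found on my node (cgroups v2: the hierarchy
    // ID is 0 and the controller list is empty, hence the leading "0::").
    let line = "0::/kubepods/burstable/pod348e8c15-e2a8-41d4-ae41-64dd1b6248df/d8e314cffefd00e08ab729a482563d237b27a74f524aa6df936b5bc50a8fde50";

    // The three colon-separated fields: hierarchy-ID, controller list, cgroup path.
    let mut fields = line.splitn(3, ':');
    let hierarchy_id = fields.next().unwrap_or_default();
    let controllers = fields.next().unwrap_or_default();
    let cgroup_path = fields.next().unwrap_or_default();

    // Taking the last "/" segment of the path gives the container ID, which is
    // presumably what gets resolved to a pod name afterwards.
    let container_id = cgroup_path.rsplit('/').next().unwrap_or_default();

    println!("hierarchy: {hierarchy_id}, controllers: '{controllers}'");
    println!("container id: {container_id}");
}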
For us, the issue was that --containers does not parse /proc/{PID}/cgroup reliably enough to match in all the valid cases.
In short, we recompiled Scaphandre with this modification to the regex:
- before: "^/kubepods.*$"
- after: "/kubepods.*$"
and it worked, because the relaxed pattern ignores the leading 0:: where the current version does not. It's a dirty quick fix, though.
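As a quick standalone check of that difference (assuming, as above, that the regex is applied to the raw line content including the leading 0::):

use regex::Regex;

fn main() {
    // Raw cgroup line, including the cgroups v2 "0::" prefix.
    let line = "0::/kubepods/burstable/pod348e8c15-e2a8-41d4-ae41-64dd1b6248df/d8e314cffefd00e08ab729a482563d237b27a74f524aa6df936b5bc50a8fde50";

    // Original, anchored pattern: requires the string to start with /kubepods,
    // so the leading "0::" makes it fail.
    let anchored = Regex::new(r"^/kubepods.*$").unwrap();
    // Relaxed pattern: matches /kubepods anywhere in the string, so the
    // leading "0::" is effectively ignored.
    let relaxed = Regex::new(r"/kubepods.*$").unwrap();

    println!("anchored matches: {}", anchored.is_match(line)); // false
    println!("relaxed matches: {}", relaxed.is_match(line));   // true
}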
Maybe you could check your own /proc/{PID}/cgroup to see if your cgroup file content is formatted the same way. You could take a PID from ps -faux on a node of your cluster corresponding to a process running inside a container of your Kubernetes cluster.