kube-state-metrics
kube_pod_annotations reports incorrect nodes (all metrics for the same node)
What happened:
I have 50+ Kubernetes nodes running similar app pods with annotations. All kube_pod_annotations series report the same node, for example:
count(kube_pod_annotations{annotation_jenkins_template!="",node=~"k8s-node55.*"}) = 329 (so 329 pods appear to be assigned to k8s-node55 and none to the other nodes)
count(kube_pod_annotations{annotation_jenkins_template!="",node=~"k8s-node55.*"} and on (pod) kube_pod_info{node=~"k8s-node55.*"}) = 4 (so only 4 running pods on that node actually have the annotation, which exactly matches the kubectl get pods output)
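A quick way to confirm the symptom is to group the annotation series by their node label; if every series really carries the same node, this returns a single result (a sketch, reusing the annotation filter from above):
# A single (node="k8s-node55...") group here confirms that every series carries the same node.
count by (node) (kube_pod_annotations{annotation_jenkins_template!=""})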
What you expected to happen:
I expect kube_pod_annotations to report the correct node for each pod.
How to reproduce it (as minimally and precisely as possible):
I have a typical configuration; no metrics are being changed or adjusted at this point. Install KSM with Helm, add an annotation to any pod, and fetch the metrics:
{
"__name__": "kube_pod_annotations",
"annotation_build_url": "...",
"annotation_jenkins_template": "...",
"app_kubernetes_io_component": "metrics",
"app_kubernetes_io_instance": "prometheus",
"app_kubernetes_io_managed_by": "Helm",
"app_kubernetes_io_name": "kube-state-metrics",
"app_kubernetes_io_part_of": "kube-state-metrics",
"app_kubernetes_io_version": "2.5.0",
"environment": "...",
"helm_sh_chart": "kube-state-metrics-4.13.0",
"instance": "...:8080",
"job": "kubernetes-service-endpoints",
"namespace": "...",
"node": "k8s-node55...",
"pod": "pod-xxx",
"service": "prometheus-kube-state-metrics",
"uid": "c8ec8299-85c2-48d5-b531-6f50acde9071"
}
Anything else we need to know?:
Everything works fine with other pod metrics, e.g. kube_pod_info or kube_pod_labels.
Environment:
- kube-state-metrics version: 2.5.0
- Kubernetes version (use kubectl version): v1.22.5
- Cloud provider or hardware configuration: bare metal servers
- Other info:
/triage accepted
/assign @rexagod
@vyakovlev-hw Maybe I'm missing something, but I'm not sure how you're able to see the node label in kube_pod_annotations, since we are not exposing one as of v2.5.0, or even now.
@rexagod That's weird then: we haven't changed anything and that's what I have for kube_pod_annotations{job="kubernetes-service-endpoints"} (some labels were removed):
[
{
"metric": {
"__name__": "kube_pod_annotations",
"annotation_run_url": "job/xxx/job/yyy/job/bla/51/",
"environment": "k8s-xxx",
"instance": "10.xxx.yyy.zz7:8080",
"namespace": "xxx-xxx",
"node": "k8s-nodeXX.something.com",
"pod": "aa-bb-cc-dd",
"uid": "3c7e7ec1-b825-4272-ace9-3d800134d446"
},
"value": [
1671193424.902,
"1"
],
"group": 1
},
{
"metric": {
"__name__": "kube_pod_annotations",
"annotation_run_url": "job/xxx/job/yyy/job/bla/51/",
"environment": "k8s-xxx",
"instance": "10.xxx.yyy.zz7:8080",
"job": "kubernetes-service-endpoints",
"namespace": "xxx-xxx",
"node": "k8s-nodeXX.something.com",
"pod": "aa-bb-cc-dd",
"uid": "fd6a5712-c277-49c2-bc6a-a6dbe539e138"
},
"value": [
1671193424.902,
"1"
],
"group": 1
},
{
"metric": {
"__name__": "kube_pod_annotations",
"annotation_jenkins_template": "xxx-yyy",
"annotation_run_url": "job/xxx/job/yyy/job/bla/7/",
"environment": "k8s-xxx",
"instance": "10.xxx.yyy.zz7:8080",
"job": "kubernetes-service-endpoints",
"namespace": "xxx-xxx",
"node": "k8s-nodeXX.something.com",
"pod": "aa-bb-cc-dd",
"uid": "e7d6b497-e133-4214-9c6b-d0c09f425592"
},
"value": [
1671193424.902,
"1"
],
"group": 1
}
]
I have checked all our VictoriaMetrics rules to make sure we don't add this label anywhere ourselves - we don't.
@rexagod Hello, this bug affects not only kube_pod_annotations but also kube_pod_labels.
Environment:
- kube-state-metrics version: 2.9.2
- kube-state-metrics Helm chart version: 5.11.0
- Kubernetes version (use kubectl version): v1.24.17
- Cloud provider or hardware configuration: bare metal
We have a test cluster with multiple nodes (each named clusternX).
Looking at our Prometheus config, I see the following section under relabel_configs of each of the jobs kubernetes-service-endpoints, kubernetes-service-endpoints-slow, kubernetes-pods, kubernetes-pods-slow:
- source_labels: [__meta_kubernetes_pod_node_name]
  separator: ;
  regex: (.*)
  target_label: node
  replacement: $1
  action: replace
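Assuming the kubernetes-service-endpoints job uses the endpoints role, the discovered pod behind the kube-state-metrics target is the kube-state-metrics pod itself, so __meta_kubernetes_pod_node_name resolves to the node running kube-state-metrics, and this rule stamps that value onto every series scraped from it as node - not the node of the pod a given series describes. A rough way to see this from the query side (two separate queries; the pod name pattern is an assumption based on the Helm release name):
# Every kube_pod_labels series should come back with the same single node value...
count by (node) (kube_pod_labels)
# ...and that value should match the node the kube-state-metrics pod itself runs on.
kube_pod_info{pod=~"prometheus-kube-state-metrics-.*"}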
I have just created a plain alpine pod in the default namespace using this yaml file:
apiVersion: v1
kind: Pod
metadata:
  name: alpine
spec:
  containers:
  - image: alpine:latest
    command:
    - /bin/sh
    - "-c"
    - "sleep 60m"
    imagePullPolicy: IfNotPresent
    name: alpine
And I can see it is running on node clustern2:
$ kubectl create -f alpine.yaml
$ kubectl describe po alpine | grep Node
Node: clustern2/192.168.70.102
Node-Selectors: <none>
This is what I see when querying the different metrics:
kube_pod_info{pod="alpine"}
kube_pod_info{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.9.2", helm_sh_chart="kube-state-metrics-5.11.0", host_ip="192.168.70.102", host_network="false", instance="10.0.1.80:8080", job="kubernetes-service-endpoints", namespace="default", node="clustern2", pod="alpine", pod_ip="10.0.12.60", service="prometheus-kube-state-metrics", uid="5f6720c6-933e-49de-80fd-e301e6aa1367"}
kube_pod_labels{pod="alpine"}
kube_pod_labels{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.9.2", helm_sh_chart="kube-state-metrics-5.11.0", instance="10.0.1.80:8080", job="kubernetes-service-endpoints", namespace="default", node="clustern1", pod="alpine", service="prometheus-kube-state-metrics", uid="5f6720c6-933e-49de-80fd-e301e6aa1367"}
kube_pod_annotations{pod="alpine"}
kube_pod_annotations{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="prometheus", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.9.2", helm_sh_chart="kube-state-metrics-5.11.0", instance="10.0.1.80:8080", job="kubernetes-service-endpoints", namespace="default", node="clustern1", pod="alpine", service="prometheus-kube-state-metrics", uid="5f6720c6-933e-49de-80fd-e301e6aa1367"}
The alpine pod has been scheduled on, and is running on, the clustern2 node only; it has never had anything to do with the clustern1 node. Our assumption is that we get the wrong node label because this is the node where the kube-state-metrics pod is running:
$ kubectl describe po prometheus-kube-state-metrics-76d96875dc-9qhl2 -n monitoring | grep Node
Node: clustern1/192.168.72.101
Node-Selectors: <none>
After looking into it more, it seems to me that only for kube_pod_info is the node label set correctly (via the relabel config mechanism mentioned above). The node label is also wrong for metrics like kube_pod_container_info, and it even appears on metrics like kube_deployment_labels, where it is not relevant at all.
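A plausible reason kube_pod_info is the exception is that it is the only one of these metrics that exposes a node label of its own, which (assuming honor_labels is set on the job) takes precedence over the target label added by the relabel rule; kube_pod_labels, kube_pod_annotations and the rest expose no node label, so they inherit the node of the kube-state-metrics pod. Until the scrape config is adjusted, a possible query-level workaround (a sketch, not an official recommendation) is to ignore the stamped node label and join the real one in from kube_pod_info:
# Drop the node label attached at scrape time, then pull the correct node
# from kube_pod_info, matching on namespace and pod.
sum without (node) (kube_pod_annotations)
  * on (namespace, pod) group_left (node)
    max by (namespace, pod, node) (kube_pod_info)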