kube-state-metrics
deleted pods still reporting metrics
What happened:
It seems that sometimes metrics don't get deleted alongside the pod; the stale series only go away once we churn all the kube-state-metrics pods (see the restart sketch below).
Stranger still, it isn't all of the pod's metrics that incorrectly persist; for one deleted pod, kube_pod_container_status_waiting_reason was still being reported, but kube_pod_container_resource_requests was not.
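For context, "churning" here just means restarting every kube-state-metrics pod so its internal store is rebuilt from the apiserver. A minimal sketch, assuming kube-state-metrics runs as a Deployment named kube-state-metrics in kube-system (adjust names to your setup):
# Restart all kube-state-metrics pods; the new pods re-list state from the apiserver.
kubectl -n kube-system rollout restart deployment/kube-state-metrics
kubectl -n kube-system rollout status deployment/kube-state-metrics
Once the replacement pods are up, the stale series stop being exposed.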
What you expected to happen:
When a pod gets deleted, all metrics associated with that pod should also be deleted.
How to reproduce it (as minimally and precisely as possible):
It's unclear how this happens - whenever we try to reproduce it by manually deleting a pod and querying for all of its metrics ({pod="my_pod"}), everything works as expected, i.e. the metrics all disappear.
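For reference, the manual check looks roughly like this; my_pod and my-namespace are placeholders:
# Delete a pod, then confirm its series are gone after the next scrape.
kubectl -n my-namespace delete pod my_pod
# Query in Prometheus once kube-state-metrics has been re-scraped;
# it should return nothing:
#   count({pod="my_pod"})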
Anything else we need to know?:
Environment:
- kube-state-metrics version: 2.2.0 (though we were experiencing this on 1.5.0 as well)
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:09:48Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: self-hosted Kubernetes on AWS
- Other info:
/kube-state-metrics
--port=9102
--telemetry-port=8081
--resources=configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,jobs,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets
--use-apiserver-cache
--metric-labels-allowlist=daemonsets=[*],deployments=[*],jobs=[*],nodes=[*],pods=[*],secrets=[*]
--pod=$(POD_NAME)
--pod-namespace=$(POD_NAMESPACE)
This could be related to https://github.com/kubernetes/kube-state-metrics/issues/694
Have you checked via kubectl that the pods in this state are actually deleted, and not in some non-running state such as Completed or Evicted?
@fredr Yes, they are definitely deleted.
Same thing happening to me on EKS
Seeing another instance of this. These two series existed at the same time for the pod named taskmanager-0
... the IP addresses differ because one is the old IP and the other is the current one.
kube_pod_labels{
  host="1.1.147.202",
  instance="1.1.147.202:9102",
  job="kubernetes-pods-k8s-production",
  kubernetes_namespace="kube-system",
  kubernetes_pod_name="kube-state-metrics-4",
  pod="taskmanager-0",
  ...
}
kube_pod_labels{
  host="1.1.188.37",
  instance="1.1.188.37:9102",
  job="kubernetes-pods-k8s-production",
  kubernetes_namespace="kube-system",
  kubernetes_pod_name="kube-state-metrics-8",
  pod="taskmanager-0",
  ...
}
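Not from the comment above, but a hedged way to spot this condition: count the kube_pod_labels series per pod across kube-state-metrics instances; anything above 1 means two scrape targets are still exporting the same pod. The namespace label used here is kube-state-metrics' standard label for the target pod's namespace and is assumed to be among the elided labels above:
# Pods exported by more than one kube-state-metrics instance at once:
count by (namespace, pod) (kube_pod_labels) > 1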
Happens to me with kube_pod_container_resource_requests and "Terminated" pods (terminated but not yet removed by the pod garbage collector). KSM version: kube-state-metrics/kube-state-metrics:v2.4.1. I would expect kube_pod_container_resource_requests not to return terminated pods, or at least to label them so that I can filter them out.
This case is expected since KSM exposes everything from the apiserver. If you are not interested in terminated pods, you can drop the series using relabeling.
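A sketch of what that can look like in practice, not taken from this thread: either filter at query time by joining against kube_pod_status_phase (kube_pod_container_resource_requests itself carries no phase label), or drop the unwanted series with metric_relabel_configs. The query-time variant:
# Keep resource requests only for pods currently Pending or Running:
kube_pod_container_resource_requests
  * on (namespace, pod) group_left()
    max by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Running"} == 1)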
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.