kube-state-metrics
kube_pod_completion_time is not returned for some pods
What happened:
kube_pod_completion_time is not returned for some pods.
What you expected to happen:
Specifically, I want to get the kube_pod_completion_time metric.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
The metrics kube_pod_start_time and kube_pod_created collect data correctly.
The Prometheus scrape interval for kube-state-metrics is 5s.
Environment:
- kube-state-metrics version: 2.6.0
- Kubernetes version (use kubectl version): 1.22.2
- Cloud provider or hardware configuration:
- Other info:
/assign @dgrisonnet
/triage accepted
Could it perhaps be because these pods are running and as such are not completed yet?
In my test I created a pod, then deleted it and waited for the deletion to complete before querying the metrics again.
We hit the same issue. I think the reason is that the pod completes too quickly for Prometheus to scrape the metric.
In our case, the Prometheus scrape interval is 30s. If the pod completes and is deleted within that interval, the metric is never scraped.
Our workaround: delay deleting the metric by 60 seconds.
// Delete deletes an existing entry in the MetricsStore.
func (s *MetricsStore) Delete(obj interface{}) error {
	o, err := meta.Accessor(obj)
	if err != nil {
		return err
	}
	// Keep the metrics around for 60 seconds so that slower scrapers still
	// see the final state of the object before the entry disappears.
	go func(uuid types.UID) {
		time.Sleep(60 * time.Second)
		s.mutex.Lock()
		defer s.mutex.Unlock()
		delete(s.metrics, uuid)
	}(o.GetUID())
	return nil
}
N.B. if many pods are deleted in a short period, this spawns one sleeping goroutine per pod.
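One way to avoid the goroutine-per-pod pattern would be a single background worker draining a queue of delayed deletions. The following is only a rough sketch of that idea, not kube-state-metrics code: the deferredDeletion type, the pending channel, deferDelete, startDeletionWorker, and the simplified MetricsStore fields are all assumptions; the 60-second retention matches the workaround above.

package metricsstore

import (
	"sync"
	"time"

	"k8s.io/apimachinery/pkg/types"
)

type deferredDeletion struct {
	uid      types.UID
	deleteAt time.Time
}

type MetricsStore struct {
	mutex   sync.RWMutex
	metrics map[types.UID][]byte // simplified; the real store keeps rendered metric families

	// pending should be buffered so callers do not block while the worker sleeps.
	pending chan deferredDeletion
}

// startDeletionWorker runs a single goroutine that drains the queue in FIFO order.
// Entries are enqueued with non-decreasing deleteAt times, so sleeping until the
// head entry is due is sufficient.
func (s *MetricsStore) startDeletionWorker() {
	go func() {
		for d := range s.pending {
			if wait := time.Until(d.deleteAt); wait > 0 {
				time.Sleep(wait)
			}
			s.mutex.Lock()
			delete(s.metrics, d.uid)
			s.mutex.Unlock()
		}
	}()
}

// deferDelete schedules the entry for deletion 60 seconds from now instead of
// spawning a goroutine per object. The retention period is an assumption.
func (s *MetricsStore) deferDelete(uid types.UID) {
	s.pending <- deferredDeletion{uid: uid, deleteAt: time.Now().Add(60 * time.Second)}
}

Because the channel preserves arrival order and every entry is due 60 seconds after it was enqueued, a single sleep per entry is enough; the trade-off is that the channel needs a reasonable buffer and a shutdown path, which the sketch leaves out.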
I have the same issue. In my case I would like to know how long each pod ran, so that I can calculate the max, min, and mean. I'm trying to use a query like kube_pod_completion_time - kube_pod_created, but it doesn't work, because kube_pod_completion_time only has data points for a single pod, the one that ends up in the Completed status.
I suspect this metric is only reported for pods that end in the Completed status and not for those that end in the Terminating status, or perhaps the pods are not around long enough in the Terminating status for kube-state-metrics to expose a data point before they are removed.
My scrape interval is 15s.
Can you please help with this?
The same issue happened to me. We can't get the kube_pod_completion_time or kube_pod_created metrics for very short-lived pods (jobs run through Argo Workflow).
We normally always get pod completion times for pods launched by Jobs. The Jobs have ttlSecondsAfterFinished set to roughly a day, so the pods linger in the Completed status for at least several hours.
However, in other cases, such as with Deployments, the ReplicaSet controller immediately removes completed or terminated pods, so they are not present long enough for KSM to register the completion time and/or for Prometheus to record it - not sure which.
For a strictly correct solution, I have the feeling that Deployments/ReplicaSets and other controllers would need some configuration to let finished pods linger in the Completed state for a certain period. Not sure whether finalizers or anything else could be used as a workaround.