kube-state-metrics

Directly emit container ready time metric

Open · bbdouglas opened this issue 2 years ago · 7 comments

What would you like to be added:

It would be great to have a metric for the container ready time, in seconds, emitted directly. There is currently a boolean gauge, kube_pod_container_status_ready, which reports whether the container is ready, but recovering the moment the container flipped to the ready state requires extra computation. I want to learn how much time passed between when the container started and when it became ready, and that would be simpler and more efficient to measure if kube-state-metrics emitted the ready time directly.

There was a similar metric added at the pod level (#1465), but this would be at the container level. In the pods that I am tracking, there are many containers with wildly varying ready times, so it is helpful for debugging and optimization purposes to know how long each container takes to get ready.

Why is this needed:

Similar to the pod-level ready time metric (#1465), I'd like to measure the ready time of each individual container within my pod. This is helpful for tracking startup times at a finer level of granularity than the whole pod, especially when a pod has many containers.

It is possible to use the existing boolean kube_pod_container_status_ready to calculate this by looking at a series of data points and choosing the first point in time when that flag flips from false to true, but in practice that can be very resource-intensive for Prometheus to calculate if there are a large number of pods/containers.
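A minimal sketch of that calculation for a single series, assuming the samples of the readiness gauge are already available in time order as (Unix seconds, 0/1) pairs. The `sample` type and `firstReadyFlip` function are hypothetical names for illustration; Prometheus does not expose this function.

```go
package main

import "fmt"

// sample is one scraped value of a readiness gauge such as
// kube_pod_container_status_ready: a Unix timestamp in seconds and a 0/1 value.
type sample struct {
	TS    int64
	Value float64
}

// firstReadyFlip scans a time-ordered series and returns the timestamp of the
// first sample where the value changes from 0 (not ready) to 1 (ready).
// It returns false if no such transition is present in the series.
func firstReadyFlip(series []sample) (int64, bool) {
	for i := 1; i < len(series); i++ {
		if series[i-1].Value == 0 && series[i].Value == 1 {
			return series[i].TS, true
		}
	}
	return 0, false
}

func main() {
	series := []sample{{100, 0}, {130, 0}, {160, 1}, {190, 1}}
	if ts, ok := firstReadyFlip(series); ok {
		fmt.Println("became ready at", ts) // became ready at 160
	}
}
```

Two limitations follow from this approach: the result is only accurate to the scrape interval, and a series that is already 1 at its first sample yields no flip at all. It also has to scan every sample of every container series, which is the cost the issue is trying to avoid.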

Describe the solution you'd like

I would ideally like to see a new metric analogous to kube_pod_status_ready_time emitted at the container granularity.

Additional context

I'm not that familiar with the internals of the Kubernetes API, but unfortunately it does not look like ContainerStatus has the same breadth of information as PodCondition, which includes a LastTransitionTime. So this might not be a simple addition.

bbdouglas, Jul 18 '23 20:07

/triage accepted
/assign @dgrisonnet

dashpole, Jul 27 '23 16:07

The container level metric should already be available: https://github.com/kubernetes/kube-state-metrics/blob/02417fbc99f3adec84834fc59d5f89cf676ce006/internal/store/pod.go#L1342

dgrisonnet, Jul 28 '23 16:07

Hi @dgrisonnet, thanks for looking into this.

Unfortunately, I believe the metric you pointed to is actually at the pod level, representing the time at which all containers are ready (ContainersReady). From the comments in the API:

// ContainersReady indicates whether all containers in the pod are ready.

bbdouglas, Jul 28 '23 21:07

Correct, the name got me.

We should probably base kube_pod_status_container_ready_time on ContainerStatus rather than on the pod status.

dgrisonnet, Jul 31 '23 10:07

It is possible to use the existing boolean kube_pod_container_status_ready to calculate this by looking at a series of data points and choosing the first point in time when that flag flips from false to true

@bbdouglas I am curious how you currently calculate this with PromQL?

abhiraut, Jan 09 '24 23:01

@abhiraut Here is the query I came up with. Since it's looking back, you have to manually set the maximum age that you expect a pod to be up. Here I have assumed no pod lives for more than 1 day.

min_over_time(timestamp(kube_pod_container_status_ready{container="mycontainer", pod_phase="Running"} == 1)[1d:])
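The query keeps only samples where the gauge equals 1 and takes the smallest timestamp among them within the lookback window. The sketch below reproduces that idea for a single series in Go. It is an approximation: it ignores subquery step alignment and the instant-vector lookback, and `sample` and `earliestReadyInWindow` are hypothetical names.

```go
package main

import "fmt"

// sample is one scraped value of the readiness gauge: Unix seconds and 0/1.
type sample struct {
	TS    int64
	Value float64
}

// earliestReadyInWindow roughly mimics the query for one series: among samples
// in the window (now-window, now], keep those equal to 1 and return the
// smallest timestamp. It returns false if no ready sample is in the window.
func earliestReadyInWindow(series []sample, now, window int64) (int64, bool) {
	found := false
	var earliest int64
	for _, s := range series {
		if s.TS <= now-window || s.TS > now || s.Value != 1 {
			continue
		}
		if !found || s.TS < earliest {
			earliest, found = s.TS, true
		}
	}
	return earliest, found
}

func main() {
	series := []sample{{100, 0}, {130, 0}, {160, 1}, {190, 1}}
	ts, _ := earliestReadyInWindow(series, 200, 86400)
	fmt.Println(ts) // 160
	ts, _ = earliestReadyInWindow(series, 200, 20)
	fmt.Println(ts) // 190: the real flip at 160 is outside the window
}
```

The second call shows why the window has to be set to the maximum expected pod age: if the container became ready before the window starts, the result is the first ready sample inside the window rather than the actual transition.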

bbdouglas, Jan 10 '24 06:01

Thanks! @dgrisonnet, do you think we can emit the ready time directly? I think it would be helpful and consistent with how readiness is emitted at the pod level.

abhiraut, Jan 10 '24 18:01