Feature Request: Add terminationGracePeriodSeconds field in liveness probe
/area API /kind good-first-issue /kind spec
Describe the feature
The kubernetes pod spec allows us to mention terminationGracePeriodSeconds under livenessProbe for a specific container. This helps restarting a failed container. Current the knative service spec ignores this field.
docs for reference: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#probe-level-terminationgraceperiodseconds
My use case:
The liveness probe for the container fails and the container receives a sigterm. The container code went into a deadlock and doesnt respect gracefull shutdown. Now the container waits for pod terminationGracePeriodSeconds which is set by knative. This is taking around 300 second for the pod to restart. I dont want this to setting because of request timeout, but want the container to restart as soon as liveness probe fails.
This seems like a duplicate of https://github.com/knative/serving/issues/15599
Hi @FloMedja this is not a duplicate of #15599. 15599 talks about deleting deployment and it talks about terminationGracePeriodSeconds in pod spec.
This issue is about adding terminationGracePeriodSeconds in livenessProbe. This is need if the container get sigterm, it should restart as soon as possible. The pod wont be terminated.
My bad. I get it.
@flomedja what do you think of this feature?
@skonto @dprotaso Was there a specific reason for ignoring the terminationGracePeriodSeconds at the container level ?
So Knative by default sets the terminationGracePeriod to the revision's timeoutseconds. This handles the scenario where a request comes in around the time the pod is being terminated - thus we want to handle that request and not fail.
The kubernetes pod spec allows us to mention terminationGracePeriodSeconds under livenessProbe for a specific container.
I'm reading the K8s docs and I really don't see the motivation for this feature. Do you have a KEP link handy?
The container code went into a deadlock and doesnt respect gracefull shutdown
Which container? the queue proxy or the user's app?
Ok - I found the KEP and it explained the use
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2238-liveness-probe-grace-period#summary
Yeah I think this makes sense to support.
From the KEP the TerminationGracePeriod in the probe is used for abnormal termination when the liveness probes fail.
The top-level TerminationGracePeriod is for normal shutdown.
I think it should be on the user's app (and sidecar if defined) and not the queue-proxy. Do you see a use case to make it configurable for the queue-proxy too ?
@gane5hvarma was that what you have in mind ?
Do you see a use case to make it configurable for the queue-proxy too ?
I don't believe we set a liveness probe on the queue proxy so we can skip it here
Sounds good to me, I can take care of the issue 🙂.
Just for my understanding, is there a specific reason why we don't set a liveness probe on the queue-proxy?
@dprotaso from what I understand the #15599 is talking about the terminationGracePeriodSeconds on the pod level and this one on the container level in the livenessProbe.
@flomedja yeah
I think it should be on the user's app (and sidecar if defined) and not the queue-proxy. Do you see a use case to make it configurable for the queue-proxy too ?
@gane5hvarma was that what you have in mind ?
My use case is for the user's app and not the queue-proxy. I dont think its required for queue-proxy since it handles graceful termination
@gane5hvarma can I work on this issue?
I am working on the issue :) . I Should have a PR ready soon :)
@dprotaso I am pretty much ready for pushing my MR. I asked you a little question on CNCF slack about startup probes validation for sidecar containers . After getting an answer, I will be ready to push the MR.
@flomedja are you still working on this issue? There is a commit referenced but not sure how complete it is.
I would otherwise assign this issue myself to finalize it for the current milestone (we already missed 1.19 and 1.20).
@linkvt would it be okay if I take over? I want to contribute to knative and this was the only good-first-issue tagged issue there is.
@linkvt Sorry for the delay. I was waiting for an information from @dprotaso about the task and get sidetracked by some work afterwards. I will have a MR ready during this week.
@linkvt I pushed the MR. As requested by @dprotaso I add an issue https://github.com/knative/serving/issues/16256 for adding validation for startup probes.