kubernetes-graceful-shutdown-example

Failing readiness probe on shutdown not needed

Open · florianmutter opened this issue 7 years ago · 1 comment

According to https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-liveness-or-readiness-probes a failing readiness probe is not needed for shutting down a pod. It makes no difference if it fails or not.

Note that if you just want to be able to drain requests when the Pod is deleted, you do not necessarily need a readiness probe; on deletion, the Pod automatically puts itself into an unready state regardless of whether the readiness probe exists. The Pod remains in the unready state while it waits for the Containers in the Pod to stop.

florianmutter · Aug 23 '18 14:08

That confuses me too. What doesn't add up is that, according to the ab example in the blog post for this repo, the logs show requests still being routed to a shutting-down pod after SIGTERM (Request after sigterm: /). According to the documentation:

At the same time as the kubelet is starting graceful shutdown, the control plane removes that shutting-down Pod from Endpoints (and, if enabled, EndpointSlice) objects where these represent a Service with a configured selector. ReplicaSets and other workload resources no longer treat the shutting-down Pod as a valid, in-service replica. Pods that shut down slowly cannot continue to serve traffic as load balancers (like the service proxy) remove the Pod from the list of endpoints as soon as the termination grace period begins.

Maybe the "at the same time" part is what lets some last requests slip through?
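If that guess is right, one common workaround is to keep serving for a short while after SIGTERM instead of closing the listener immediately, so requests that arrive before the endpoint removal has propagated still get an answer. The following is only a minimal sketch of that idea, not this repo's implementation: the names `make_sigterm_handler` and `DRAIN_DELAY_SECONDS`, the port, and the 5-second delay are all assumptions chosen for illustration.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Assumption: how long to keep serving after SIGTERM. It must stay well
# below the Pod's terminationGracePeriodSeconds.
DRAIN_DELAY_SECONDS = 5.0


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Requests that still arrive after SIGTERM are answered normally.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")

    def log_message(self, format, *args):
        # Keep the sketch quiet; a real app would log here.
        pass


def make_sigterm_handler(server, delay=DRAIN_DELAY_SECONDS):
    """Return a signal handler that stops `server` after `delay` seconds."""

    def on_sigterm(signum, frame):
        # server.shutdown() blocks until serve_forever() returns, so it is
        # scheduled on a timer thread instead of being called from the
        # thread that runs the serve loop.
        threading.Timer(delay, server.shutdown).start()

    return on_sigterm


def main():
    server = ThreadingHTTPServer(("0.0.0.0", 8080), Handler)
    signal.signal(signal.SIGTERM, make_sigterm_handler(server))
    server.serve_forever()  # returns once the delayed shutdown() has run
    server.server_close()
```

Calling `main()` starts the server. With this in place, a request arriving one or two seconds after SIGTERM would still be served, which would match the "Request after sigterm" log lines without requiring a failing readiness probe.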

Another thing that confuses me is this paragraph from this project's readme:

In our case Kubernetes livenessProbe won't kill the app before graceful shutdown because needs to wait (failureThreshold * periodSecond) to do it, so livenessProve threshold should be larger than readinessProbe threshold (graceful stop happens around 4s, force kill would happen 30s after SIGTERM)

Isn't the livenessProbe irrelevant during the graceful shutdown period? I have yet to try this, but it would be weird if the livenessProbe killed the pod during shutdown, before the grace period has ended.
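For reference, the readme's arithmetic can be written out as below. This is a rough sketch only: the probe values are assumed (they are the Kubernetes defaults, periodSeconds 10 and failureThreshold 3, not values taken from this repo's manifests), the function names are made up for illustration, and probe timeouts are ignored. It does not settle whether the kubelet still acts on liveness failures once termination has begun.

```python
def seconds_until_liveness_restart(failure_threshold, period_seconds):
    """Approximate time from the first failed liveness probe to a restart."""
    return failure_threshold * period_seconds


def liveness_fires_before_graceful_stop(
    failure_threshold, period_seconds, graceful_stop_seconds
):
    """True if the liveness probe would give up before the app has stopped."""
    restart_after = seconds_until_liveness_restart(
        failure_threshold, period_seconds
    )
    return restart_after < graceful_stop_seconds


# Assumed probe settings (Kubernetes defaults), with the ~4s graceful stop
# mentioned in the readme.
FAILURE_THRESHOLD = 3
PERIOD_SECONDS = 10
GRACEFUL_STOP_SECONDS = 4

restart_after = seconds_until_liveness_restart(FAILURE_THRESHOLD, PERIOD_SECONDS)
fires_early = liveness_fires_before_graceful_stop(
    FAILURE_THRESHOLD, PERIOD_SECONDS, GRACEFUL_STOP_SECONDS
)
print(restart_after, fires_early)  # prints: 30 False
```

Under these assumed values the liveness probe would need about 30s of consecutive failures, far longer than the roughly 4s graceful stop, so it could not interfere even if it were still evaluated during shutdown.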

fabb · Sep 16 '20 21:09