
Thick plugin graceful termination

Open • dougbtv opened this issue 5 months ago • 1 comment

This PR introduces graceful shutdown functionality to the Multus daemon by adding a /readyz endpoint alongside the existing /healthz. The /readyz endpoint starts returning 500 once a SIGTERM is received, indicating the daemon is in shutdown mode. During this time, CNI requests can still be processed for a short window. The daemonset configs have been updated to increase terminationGracePeriodSeconds from 10 to 30 seconds, ensuring we have a bit more time for these clean shutdowns.

This addresses a race condition during pod transitions where the readiness check might return true, but a subsequent CNI request could fail if the daemon shuts down too quickly. By introducing the /readyz endpoint and delaying the shutdown, we can handle ongoing CNI requests more gracefully, reducing the risk of disruptions during critical transitions.

Major thanks to @deads2k for the find, identification, fix, and of course, the explanations. Appreciate it.

dougbtv • Sep 19 '24 18:09