k8s-node-termination-handler

What if containers take more than 30 secs to start?

Open itsmesuniljacob opened this issue 3 years ago • 2 comments

Hi Team,

Many of our containers may take more than 30 seconds to start. In other words, 30 seconds will probably not be enough for new replicas to start when VMs receive a preemption signal.

Is it possible to change the draining_timeout_when_node_expired_ms value to 45 seconds to solve this problem?

itsmesuniljacob, Feb 09 '21 05:02

Hi, I have the same issue. According to the documentation (https://cloud.google.com/compute/docs/instances/preemptible#preemption-process):

"Compute Engine sends a preemption notice to the instance in the form of an ACPI G2 Soft Off signal. You can use a shutdown script to handle the preemption notice and complete cleanup actions before the instance stops. If the instance does not stop after 30 seconds, Compute Engine sends an ACPI G3 Mechanical Off signal to the operating system."

I think there is no way to override this 30-second deadline after the ACPI G2 signal.

I reduce downtime by running multiple replicas with pod anti-affinity, so that a single preempted node does not take down every replica at once.
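For illustration, here is a minimal sketch of a Deployment manifest that spreads replicas across nodes with a preferred pod anti-affinity rule. It is built as a Python dict so it can be dumped to JSON and passed to `kubectl apply -f -`. The function name `build_deployment`, the app name `web`, the image, and the replica count are hypothetical placeholders, not values from this thread; only the Kubernetes field names are standard.

```python
import json


def build_deployment(name: str, image: str, replicas: int) -> dict:
    """Return an apps/v1 Deployment that prefers one replica per node.

    Hypothetical helper for illustration; adjust names and values to
    your own workload.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # More than one replica, so losing a node leaves others serving.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            # "preferred" is a soft rule: the scheduler tries
                            # to avoid co-locating replicas but still schedules
                            # them if there are too few nodes.
                            "preferredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "weight": 100,
                                    "podAffinityTerm": {
                                        "labelSelector": {"matchLabels": labels},
                                        # One topology domain per node.
                                        "topologyKey": "kubernetes.io/hostname",
                                    },
                                }
                            ]
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values, assumed for the example only.
    print(json.dumps(build_deployment("web", "nginx:1.25", 3), indent=2))
```

A soft (`preferred`) rule is used here on the assumption that the cluster may temporarily have fewer nodes than replicas after a preemption; a hard (`required`) rule would leave pods unschedulable in that case.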

nrx-ops, Feb 11 '21 17:02

Sure

itsmesuniljacob, Mar 22 '21 11:03