helm-charts

Graceful Termination Of Kubernetes Ingress Pods

Open: smartinov opened this issue 1 year ago • 1 comment

Hello everyone,

I'm having a bit of an issue with the ingress controller graceful termination in k8s.

The thing is that when shutting down the ingress controller, it doesn't manage to de-register itself from the k8s service load balancer correctly so the service is still sending traffic to the pod that is being shut down.

That means that whenever a cluster upgrade or node termination happens, a few connections are dropped.

This could be solved by simply adding:

  1. Readyz health check to indicate that the service is no longer ready to accept traffic (https://kubernetes.io/docs/reference/using-api/health-checks/). This could be done when the SIGTERM signal is received and before any connection shutdown.
  2. Waiting until the readyz check puts it to "unhealthy" and then gracefully finishing all the leftover connections.
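The two steps above can be sketched roughly as follows. This is only an illustration of the proposed ordering, not how the ingress controller is implemented: `ReadinessGate`, `shutdown_sequence`, and the two callbacks are hypothetical names, and the actual waiting and draining logic is left to the caller.

```python
import threading


class ReadinessGate:
    """Tracks whether the process should report itself as ready (hypothetical helper)."""

    def __init__(self) -> None:
        self._terminating = threading.Event()

    def begin_shutdown(self) -> None:
        # Meant to be called from the SIGTERM handler, before any connection is closed.
        self._terminating.set()

    def readyz_status(self) -> int:
        # HTTP status code a /readyz handler would return.
        return 503 if self._terminating.is_set() else 200


def shutdown_sequence(gate, wait_for_unready, drain_connections):
    """Run the proposed steps in order and return the names of the completed steps."""
    steps = []
    gate.begin_shutdown()   # step 1: start failing the readiness check
    steps.append("readyz-failing")
    wait_for_unready()      # step 2a: give the prober time to notice
    steps.append("waited")
    drain_connections()     # step 2b: finish the leftover connections
    steps.append("drained")
    return steps
```

In a real process, `gate.begin_shutdown` would be wired to SIGTERM through `signal.signal`, and `wait_for_unready` could be as simple as a sleep longer than the probe period multiplied by its failure threshold.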

In essence, the Service sends traffic to the pod being shut down and doesn't de-register it right away until it's either dead or deemed unhealthy.

I've checked the lifecycle parameter, but this doesn't do the trick since the Service still sends some traffic to the pod that has been shut down.
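For reference, the lifecycle setting I mean is a `preStop` hook that delays SIGTERM. Below is a small sketch of such a pod spec fragment built as a plain dict. The container name and the durations are made up for illustration, and it assumes a `sleep` binary exists in the image. Since the termination grace period also covers the time spent in the `preStop` hook, the sketch sets it to the sleep plus the time allowed for draining.

```python
def pod_shutdown_settings(pre_stop_sleep_s: int, drain_s: int) -> dict:
    """Build the shutdown-related part of a pod spec as a plain dict."""
    return {
        # The grace period starts counting before the preStop hook runs,
        # so it has to cover both the sleep and the drain time.
        "terminationGracePeriodSeconds": pre_stop_sleep_s + drain_s,
        "containers": [
            {
                "name": "ingress-controller",  # hypothetical container name
                "lifecycle": {
                    "preStop": {
                        # Delay SIGTERM; assumes `sleep` is available in the image.
                        "exec": {"command": ["sleep", str(pre_stop_sleep_s)]}
                    }
                },
            }
        ],
    }


# Example values only: 15 s of preStop sleep, 30 s to drain.
settings = pod_shutdown_settings(pre_stop_sleep_s=15, drain_s=30)
```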

smartinov, Feb 16 '24 13:02

Hi @smartinov,

> The thing is that when shutting down the ingress controller, it doesn't manage to de-register itself from the k8s service load balancer correctly so the service is still sending traffic to the pod that is being shut down.

The ingress controller does perform graceful termination in k8s, so I'm not sure what you mean. We handle signals properly and we shut down gracefully. Probes are available and can be used. It seems that this load balancer is not reading k8s events properly.

> Readyz health check to indicate that the service is no longer ready to accept traffic

Yes, we already have probes set and they are used as they should be. When we receive a termination signal, HAProxy stops accepting new connections, and that includes the probes.

> Waiting until the readyz check puts it to "unhealthy" and then gracefully finishing all the leftover connections.

It's unhealthy as soon as we receive the signal: both the ingress controller and HAProxy stop accepting new traffic, but existing connections are still processed.

> In essence, the Service sends traffic to the pod being shut down

That is not our problem but a problem of that service: if the pod is marked for termination, why does the service still send traffic to it?

Just as a note: handling termination requires additional care. I know this because the ingress controller treats this case specially: as soon as a pod is in the terminating state, we immediately stop sending traffic to it.
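To illustrate the idea of skipping terminating pods (this is not the controller's actual code, just a sketch), a consumer of EndpointSlice data could filter on the `ready` and `terminating` conditions before routing. The function name and the sample addresses are made up. The input is assumed to be the `endpoints` list of an EndpointSlice parsed into plain dicts.

```python
def routable_endpoints(endpoints):
    """Return the addresses of endpoints that are ready and not terminating."""
    addresses = []
    for ep in endpoints:
        cond = ep.get("conditions") or {}
        # A missing or null "terminating" condition is treated as not terminating.
        if cond.get("terminating"):
            continue
        # A missing or null "ready" condition is treated as ready;
        # only an explicit False excludes the endpoint.
        if cond.get("ready", True) is False:
            continue
        addresses.extend(ep.get("addresses", []))
    return addresses


# Sample data only: one healthy pod, one terminating pod, one not-ready pod.
sample = [
    {"addresses": ["10.0.0.1"], "conditions": {"ready": True}},
    {"addresses": ["10.0.0.2"], "conditions": {"ready": False, "terminating": True}},
    {"addresses": ["10.0.0.3"], "conditions": {"ready": False}},
]
backends = routable_endpoints(sample)
```

With the sample above, only the first address remains as a backend. A load balancer that ignores these conditions would keep sending traffic to the terminating pod.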

oktalz, Feb 19 '24 13:02