
GCE load balancer health check does not match k8s pod health

scarby opened this issue 3 years ago • 13 comments

Issue

It would appear that there is zero connection between kubernetes' concept of when a pod is healthy and the GCE load balancer's concept of the same.

As such when a deployment is updating:

  • kubernetes spins up new pods,
  • new pods pass their health-checks and kubernetes considers them to be ready
  • At this point the GCE load balancer is not guaranteed to have passed its health check,
  • k8s will then potentially terminate the old pods before new pods are considered healthy by the GCE load balancer (and they are instantly dropped from the NEG)

The only 'solution' we have found to this is to add a significant initial delay on the kubernetes health checks. Not only is this hacky, but it also doesn't guarantee that there are actually pods able to serve traffic when the old pods are removed (we're just hoping).
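For reference, the workaround is just an initialDelaySeconds on the readiness probe, sized by guesswork to cover the time the load balancer usually needs to program and health-check the new endpoint; the values, path, and port below are illustrative placeholders, not a recommendation:

```yaml
# Illustrative only: delay the Kubernetes readiness probe so that, with luck,
# the GCE load balancer has finished programming and health-checking the new
# endpoint before the old pods are taken out of rotation.
readinessProbe:
  httpGet:
    path: /healthz          # placeholder path
    port: 8080              # placeholder port
  initialDelaySeconds: 90   # guessed to cover LB programming time; no guarantee
  periodSeconds: 10
  failureThreshold: 3
```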

Expected behaviour

I would expect k8s not to terminate the old pod until the load balancer has a new pod ready to replace it.

Is there any way to tie these two together so we avoid a situation where there are no pods available?

scarby avatar Jan 21 '22 18:01 scarby

When NEG is enabled, LB health check results are fed back into pod readiness: https://cloud.google.com/kubernetes-engine/docs/concepts/container-native-load-balancing#pod_readiness

To configure a custom LB health check, use a BackendConfig.
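For example, a minimal BackendConfig with a custom health check looks roughly like this; the name, request path, and port are placeholders for your own workload:

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backendconfig        # placeholder name
spec:
  healthCheck:
    type: HTTP
    requestPath: /healthz       # placeholder: point this at the app's readiness path
    port: 8080                  # placeholder: the serving container port
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
```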

freehan avatar Jan 25 '22 17:01 freehan

Ok, I was mistaken about there being no connection. However, I'm not sure this is fit for purpose.

GKE sets the value of cloud.google.com/load-balancer-neg-ready for a Pod to True if any of the following conditions are met:

  • One or more of the Pod's IP addresses are endpoints in a GCE_VM_IP_PORT NEG managed by the GKE control plane. The NEG is attached to a backend service. The load balancer health check for the backend service times out.

Which is likely what is happening in my case. If my health check times out, I clearly don't want my pod to be considered ready.

So going back to my original point there appears to be no way to ensure there is actually a pod ready to serve traffic.

scarby avatar Jan 26 '22 17:01 scarby

/kind support

kundan2707 avatar Jan 27 '22 12:01 kundan2707

It appears that the GCP load balancer creates the health check once, when the ingress is created, and then never updates it. At least that is what I have observed. From there on there is no connection between the pod state and the GCP load balancer. I have different health checks for startup and liveness, and I don't want the GCP load balancer to be hitting the startup health check, as it's quite heavy.

dry4ng avatar Feb 23 '22 16:02 dry4ng

Does this controller intentionally not update backend health checks? Changing readiness probes doesn't seem to change health checks on the backend.

jmcarp avatar Apr 25 '22 20:04 jmcarp

The ingress controller waits until the load balancer considers a newly started pod healthy before flipping the readiness gate on the pod. If after 15 minutes the load balancer still does not consider the pod healthy, the readiness gate is set to ready anyway. The idea is to only let Kubernetes consider the pod ready once the load balancer considers the pod ready. If you require a different health check for the load balancer, that can be specified using the BackendConfig CRD.
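For context, this is the readiness gate that GKE injects into pods served through NEGs; the status values below are illustrative:

```yaml
# Fragment of a Pod served through a NEG (status values are illustrative).
spec:
  readinessGates:
  - conditionType: cloud.google.com/load-balancer-neg-ready
status:
  conditions:
  - type: cloud.google.com/load-balancer-neg-ready
    status: "True"   # set by the NEG controller once the LB health check passes,
                     # or after the ~15 minute timeout described above
```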

Are your pods taking longer than 15 minutes to pass the load balancer health check?

swetharepakula avatar May 20 '22 21:05 swetharepakula

The pods in our deployment can take up to 90s to fully initialize and pass the readiness probe (yay Java!). The load balancer health check is just hitting the Tomcat listener. THIS ALONE passes before pod readiness passes, and marks the NEGs as ready. It seems that the load balancer backend shouldn't forward traffic to a pod unless both the backend health check and the pod readiness are in a good state.

goobysnack avatar Jun 01 '22 22:06 goobysnack

I opened a Google case; their response was "by design", and they offered to open a feature request. That seems more like a bug than a feature.

My response:

This seems like a bug to fix, not a feature request. Why would you bypass k8s readiness probes just because the ingress check passes? That makes zero sense and undermines the purpose of readiness probes.

goobysnack avatar Jun 01 '22 22:06 goobysnack

@goobysnack same story here, the GCP support ended up opening this feature request: https://issuetracker.google.com/230729446 for us.

After reading the code, issues, and design docs for readiness gates and this ingress-gce, I believe it's a non-trivial issue to fix, because the whole design of the Readiness Gates relies on transmitting the GCLB programming success state to other components via the Pod Readiness condition. We are at a deadlock:

  • for proper rolling updates, Deployments & co use the Pod Readiness condition to know when the new pods actually receive traffic from GCLB: gce-ingress-controller marks the Pod Ready (via the readiness gates) after it has successfully added the pod to the GCLB; that's the whole goal of the Readiness Gate feature.
  • We would like the gce-ingress-controller to ignore Pods that are not Ready (yet?)

Maybe a way forward would be for the gce-ingress-controller to use the Pod Ready condition minus its own gclb-readiness-gate; but that's not information that is exposed in Endpoints/EndpointSlices (we only have ready).

In the meantime, a possible solution would be to have a sidecar container which computes that value by self-inspection (probably asking the k8s API for its own pod status to get the individual container conditions; doesn't seem ideal though) and exposes it as an HTTP endpoint, to be configured as the GCLB health check for that Pod/Service. I am not aware of any existing implementation of this idea, though.
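A rough sketch of how that sidecar could be wired in; everything here is hypothetical (the readiness-proxy image and its /lb-ready endpoint are made up for illustration), and the sidecar would also need RBAC permission to read its own Pod:

```yaml
# Hypothetical wiring for the sidecar idea above.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: example.com/my-app:latest     # placeholder application image
    ports:
    - containerPort: 8080
  - name: readiness-proxy                # hypothetical sidecar: would ask the API
    image: example.com/readiness-proxy   # server for this Pod's container statuses,
    env:                                 # ignore the NEG readiness gate, and serve
    - name: POD_NAME                     # the aggregated result on /lb-ready
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    ports:
    - containerPort: 9090                # point the GCLB health check here via
                                         # BackendConfig healthCheck.port/requestPath
```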

For now, we have forced the old Instance Group mode everywhere (vs NEG), where the traffic actually goes through Kubernetes Services, which respect the Pod Ready condition, and accepted all the limitations of this old way.

thomas-riccardi avatar Jun 06 '22 10:06 thomas-riccardi

Thanks @thomas-riccardi for the great explanation!

Currently the load balancer health checks are the only signal we can provide to the load balancer that the pod is ready to receive traffic. We do not have a solution at this time for making the load balancer Kubernetes aware.

For those affected by this now, my recommendation is to make sure that the health check on the application only passes once the application is ready to accept traffic.
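Concretely, that means having the load balancer health check target the same endpoint as the readiness probe, and making that endpoint return success only once the application can actually serve; the path and port below are placeholders:

```yaml
# Fragment of the Deployment's pod template. Assumption: /ready only returns
# 200 once the application is fully initialized and able to serve traffic.
containers:
- name: app
  image: example.com/my-app:latest   # placeholder image
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /ready                   # same path as the BackendConfig requestPath
      port: 8080
# In the BackendConfig (see the earlier example):
#   healthCheck:
#     requestPath: /ready
#     port: 8080
```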

swetharepakula avatar Jun 14 '22 23:06 swetharepakula

Thanks @swetharepakula

Are there plans to improve the situation in GKE? Discussions in upstream Kubernetes, like there were for the introduction of the Readiness Gates? Because otherwise it seems we will be stuck with the old Ingress+IG, also losing the much-awaited Gateway API with all its new features (plus #33, #109, ...).

thomas-riccardi avatar Jun 15 '22 09:06 thomas-riccardi

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 13 '22 10:09 k8s-triage-robot

I learned that the backend-config isn't assigned via an annotation on the Ingress; it's assigned to the workload Service. Once you do that, it all works like magic. That was the fine print in the documentation I missed.
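For anyone else who missed the same fine print, the BackendConfig is attached with an annotation on the Service that the Ingress routes to, roughly like this (names are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service                   # the Service referenced by the Ingress
  annotations:
    cloud.google.com/neg: '{"ingress": true}'                           # container-native LB (NEGs)
    cloud.google.com/backend-config: '{"default": "my-backendconfig"}'  # the BackendConfig above
spec:
  selector:
    app: my-app                      # placeholder selector
  ports:
  - port: 80
    targetPort: 8080
```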

goobysnack avatar Sep 13 '22 15:09 goobysnack

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Oct 13 '22 15:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Nov 12 '22 16:11 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this: the triage robot's /close not-planned comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 12 '22 16:11 k8s-ci-robot