
Premature readiness probe success due to race condition in check for backends initialization, causing 404s

Open bnu0 opened this issue 2 years ago • 19 comments

NGINX Ingress controller version

bash-5.1$ /nginx-ingress-controller --version  
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.1.2
  Build:         bab0fbab0c1a7c3641bd379f27857113d574d904
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.9

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version): 1.21

Environment:

  • Cloud provider or hardware configuration: Bare metal (on-premises)
  • OS (e.g. from /etc/os-release): Debian 10 (buster)
  • Kernel (e.g. uname -a): 5.10.0-13-amd64
  • Install tools: N/A
  • Basic cluster related info: N/A
  • How was the ingress-nginx-controller installed: Manifests. There are multiple instances of ingress-nginx, each using a different ingressClassName and IngressClass .spec.controller.
  • Current state of ingress object, if applicable: There are thousands of them; this is probably what triggers the issue :)

What happened:

One of our ingress classes has ~3k associated ingress objects. When a new ingress pod for this class starts up, it returns 404s for backends for a brief period of time, even after passing the readiness probe. We have increased the readinessProbe initialDelaySeconds to 40, which helps, but feels like a band-aid.

What you expected to happen:

The readiness probe should not pass until the upstreams are fully synchronized.

How to reproduce it:

I am working on a reproducer, but I think the actual issue is here:

  1. The readiness probe checks that the backends data has been set, via configuration.get_backends_data().
  2. When the backends are POSTed by the controller, this variable is set directly, but an asynchronous synchronization loop only later applies these backends to the underlying nginx proxy upstreams (see the sketch after this list).
  3. This sync runs every second. But with 3000+ ingresses, many with multiple hosts (multiple server blocks in the resulting nginx config), I am not actually sure how long a single sync pass takes (I guess it could be many seconds?).
  4. During the gap between these two, the pod reports ready but serves 404s. This adds the pod to the service endpoints, and advertises it with BGP in our datacenter. Clients get 404s 😭.
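
For illustration, here is a simplified sketch of the two code paths as I read them. This is paraphrased pseudo-Lua, not the exact configuration.lua/balancer.lua source; the variable and helper names are stand-ins for the real ones.

-- Simplified sketch (paraphrased, not the exact upstream source; the cjson use
-- and the balancers table approximate what configuration.lua/balancer.lua do).
local cjson = require("cjson.safe")

local backends_raw          -- in the real code this lives in an nginx shared dict
local balancers = {}        -- backend name -> balancer object used by the proxy

-- configuration.lua side: the controller POSTs the serialized backends here.
local function handle_post_backends(body)
  backends_raw = body       -- set immediately when the POST lands
end

local function get_backends_data()
  return backends_raw
end

-- /is-dynamic-lb-initialized (readiness): only checks that the data was received.
local function is_dynamic_lb_initialized()
  return get_backends_data() ~= nil
end

-- balancer.lua side: the data is only applied by a periodic timer (every 1s),
-- and a single pass over thousands of backends can take a while.
local function sync_backends()
  local raw = get_backends_data()
  if not raw then return end
  local backends = cjson.decode(raw) or {}
  for _, backend in ipairs(backends) do
    balancers[backend.name] = backend   -- real code builds resty balancer objects
  end
end

-- In nginx this runs as something like ngx.timer.every(sync_interval, sync_backends).
-- Between "POST received" (probe starts passing) and "sync_backends() finished"
-- (balancers populated), requests fall through to the 404 default backend.

If that reading is right, the probe and the sync loop are simply checking two different things, which would explain the window we see.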

bnu0 avatar Apr 08 '22 16:04 bnu0

I was able to reproduce this easily in kind by creating 3000 ingresses pointing to 3000 services, then looping over one of the ingress hosts with curl while doing a kubectl rollout restart on the ingress controller deployment. The new pod returns 404 for a period of time after reporting ready.

bnu0 avatar Apr 16 '22 02:04 bnu0

Ah ok, at 3000 ingress objects and 3000 services, it's likely you are experiencing a real problem. Was that kind on a laptop, or kind on a host with 8+ cores and 32+ GB RAM? Assuming a single-node kind cluster here.

longwuyuan avatar Apr 16 '22 02:04 longwuyuan

It was a 3-worker kind cluster on a host with 8 cores / 16 threads and 64 GB of memory, not a laptop but nothing crazy. I am not sure I really need 3000 Ingresses to reproduce, but that is how many we have in production, so it is the number I started with.

I am planning to try changing the is-dynamic-lb-initialized probe to return false until sync_backends has run at least once after backends are POSTed by the controller (rough sketch below). But if someone is running this controller and has zero Ingresses, I am worried it will never report ready 😧
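
Roughly the idea, as an untested sketch: backends_applied_at_least_once is a name I made up, and get_backends_data/apply_backends stand in for the existing configuration.lua/balancer.lua functions.

-- Untested sketch of the idea; backends_applied_at_least_once is a made-up flag,
-- and get_backends_data/apply_backends are stand-ins for the existing functions.
local backends_applied_at_least_once = false

local function get_backends_data()
  return nil   -- stand-in: the real code reads the data POSTed by the controller
end

local function apply_backends(raw)
  -- stand-in: the real code decodes `raw` and rebuilds the balancer objects
end

local function sync_backends()
  local raw = get_backends_data()
  if not raw then return end
  apply_backends(raw)
  backends_applied_at_least_once = true   -- flip only after a full pass completes
end

-- is-dynamic-lb-initialized: require a completed sync, not just received data.
local function is_dynamic_lb_initialized()
  return get_backends_data() ~= nil and backends_applied_at_least_once
end

Whether the flag ever flips with zero Ingresses depends on whether the controller still POSTs a (possibly empty or default-only) backend list in that case; that is exactly the scenario I am worried about above.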

bnu0 avatar Apr 16 '22 02:04 bnu0

I think I will need to know more details, but if it is possible for you to simulate 300 ingress objects, then you could explore make dev-env (https://kubernetes.github.io/ingress-nginx/developer-guide/getting-started/#local-build), which produces a controller build just for your test server environment.

longwuyuan avatar Apr 16 '22 03:04 longwuyuan

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 15 '22 03:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 14 '22 04:08 k8s-triage-robot

/remove-lifecycle rotten

amirschw avatar Aug 17 '22 12:08 amirschw

We're experiencing the exact same issue with "just" ~200 ingresses in our clusters.

amirschw avatar Aug 21 '22 13:08 amirschw

How many replicas of the ingress-nginx-controller pod, how many instances of the ingress-nginx-controller, and how many nodes in the cluster?

longwuyuan avatar Aug 21 '22 14:08 longwuyuan

It mostly happens in our busier clusters. In one of the latest examples that I checked, there were 600 replicas of the ingress controller and 900 nodes in the cluster.

amirschw avatar Aug 21 '22 14:08 amirschw

Is it 600 replicas of one single instance of the controller? How many instances of the controller are in this 900-node cluster?

longwuyuan avatar Aug 21 '22 14:08 longwuyuan

What do you mean by instance? IngressClass? If so, then the answer is yes: 600 replicas of one instance.

amirschw avatar Aug 21 '22 15:08 amirschw

One instance is one installation of the ingress-nginx-controller, so thanks, yes: one ingressClass would imply one installation of the ingress-nginx-controller.

This has been reported before, and there is highest-priority work in progress that includes attention to this, alongside security. But the release of the new design is likely to emerge only at the end of the current stabilization work in progress.

longwuyuan avatar Aug 21 '22 15:08 longwuyuan

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 19 '22 15:11 k8s-triage-robot

/remove-lifecycle stale

walthowd avatar Nov 19 '22 16:11 walthowd

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 19 '22 16:12 k8s-triage-robot

/remove-lifecycle rotten

amirschw avatar Dec 19 '22 16:12 amirschw

We are also affected by this. We run NGINX-Ingress with HPA and it happens regularly on scale-up. We currently have around 700 ingress objects.

@longwuyuan any update on the design work?

jan-kantert avatar Nov 24 '23 09:11 jan-kantert

The design is basically a new approach that splits the control plane from the data plane. Much progress has been made, and the dev work has reached a testing stage. You can search for the PR in progress (about the cp/dp split).

/triage accepted
/priority important-longterm

longwuyuan avatar May 01 '24 01:05 longwuyuan