ingress-nginx
Premature readiness probe success due to race condition in check for backends initialization, causing 404s
NGINX Ingress controller version
bash-5.1$ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.1.2
Build: bab0fbab0c1a7c3641bd379f27857113d574d904
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.9
-------------------------------------------------------------------------------
Kubernetes version (use kubectl version): 1.21
Environment:
- Cloud provider or hardware configuration: Bare metal (on-premises)
- OS (e.g. from /etc/os-release): Debian 10 (buster)
- Kernel (e.g. uname -a): 5.10.0-13-amd64
- Install tools: N/A
- Basic cluster related info: N/A
- How was the ingress-nginx-controller installed: Manifests. There are multiple instances of ingress-nginx using different ingressClassName and IngressClass .spec.controller.
- Current state of ingress object, if applicable: There are thousands of them, this is probably triggering the issue :)
What happened:
One of our ingress classes has ~3k associated ingress objects. When a new ingress pod for this class starts up, it returns 404s for backends for a brief period of time, even after passing the readiness probe. We have increased the readinessProbe initialDelaySeconds to 40, which helps, but feels like a band-aid.
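For reference, the band-aid is roughly the patch below. The namespace and deployment name are the ones from the default install and are assumptions here, since our installation uses custom manifests:

```bash
# Band-aid: delay the readiness probe so the sync loop has time to apply the
# backends before the pod is marked Ready. Assumes the probe already sets
# initialDelaySeconds; use "add" instead of "replace" if it does not.
kubectl -n ingress-nginx patch deployment ingress-nginx-controller \
  --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds", "value": 40}]'
```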
What you expected to happen:
The readiness probe should not pass until the upstreams are fully synchronized.
How to reproduce it:
I am working on a reproducer, but I think the actual issue is here:
- The readiness probe checks that the backends data is set via configuration.get_backends_data().
- When the backends are POSTed by the controller, this variable is set directly, but there is an asynchronous synchronization loop that later applies these backends to the underlying nginx proxy upstreams.
- This sync runs every second, but with 3000+ ingresses, many with multiple hosts (multiple server blocks in the resulting nginx config), I am not actually sure how long a single sync takes (I guess it could be many seconds?).
- During the gap between these two, the pod reports ready but serves 404s. This adds the pod to the service endpoints and advertises it with BGP in our datacenter. Clients get 404s 😭.
I was able to reproduce this easily in kind by making 3000 ingresses pointing to 3000 services, and looping over one of the ingress hosts with curl while doing a kubectl rollout restart on the ingress controller deployment. The new pod returns 404s for a period of time after reporting ready.
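Roughly, the reproduction looks like the sketch below. The object counts, hostnames, and ingress class are illustrative, and the controller address is a placeholder for whatever your environment exposes:

```bash
# Create many services and ingresses to inflate the generated nginx config.
# 3000 matches our production count; fewer may be enough to see the gap.
# The services do not need real endpoints; we only care about 404 vs non-404.
for i in $(seq 1 3000); do
  kubectl create service clusterip "svc-$i" --tcp=80:80
  kubectl create ingress "ing-$i" --class=nginx \
    --rule="host-$i.example.com/*=svc-$i:80"
done

# Terminal 1: poll one host through the controller and watch for 404s.
CONTROLLER_ADDR="controller.example.internal"  # placeholder: controller service or node address
while true; do
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H 'Host: host-1.example.com' "http://${CONTROLLER_ADDR}/"
  sleep 0.2
done

# Terminal 2: restart the controller and watch the new pod become Ready.
kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller
kubectl -n ingress-nginx rollout status deployment ingress-nginx-controller
```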
Ah ok, at 3000 ingress objects and 3000 services, it's likely you are experiencing a real problem. Was that kind on a laptop, or kind on a host with 8+ cores and 32+ GB of RAM? I am assuming a single-node kind cluster here.
It was a 3-worker kind cluster on a host with 8 cores / 16 threads and 64 GB of memory, not a laptop but nothing crazy. I am not sure I really need 3000 Ingresses to reproduce, but that is how many we have in production, so it is the number I started with.
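For completeness, a kind cluster matching that description can be created with something like this (the cluster name is illustrative):

```bash
# 1 control-plane node + 3 workers, roughly matching the setup described above.
cat > kind-config.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
EOF
kind create cluster --name ingress-repro --config kind-config.yaml
```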
I am planning to try changing the is-dynamic-lb-initialized probe to return false until sync_backends has run at least once after the backends are POSTed by the controller. But if someone is running this controller with zero Ingresses, I am worried it will never report ready 😧
I think I will need to know more details, but if it is possible for you to simulate 300 ingress objects, then you could explore make dev-env (https://kubernetes.github.io/ingress-nginx/developer-guide/getting-started/#local-build), which produces a controller just for your test server environment.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
We're experiencing the exact same issue with "just" ~200 ingresses in our clusters.
How many replicas of the ingress-nginx-controller pod, how many instances of the ingress-nginx-controller, and how many nodes are in the cluster?
It mostly happens in our busier clusters. In one of the latest examples I checked, there were 600 replicas of the ingress controller and 900 nodes in the cluster.
Is it 600 replicas of one single instance of the controller? How many instances of the controller are in this 900-node cluster?
What do you mean by instance? IngressClass? If so, then the answer is yes - 600 replicas of one instance.
One instance is one installation of the ingress-nginx-controller, so thanks, yes: one ingressClass would imply one installation of the ingress-nginx-controller.
This has been reported before, and there is high-priority work in progress that addresses this, alongside the security work. But the release of the new design is likely to emerge only at the end of the current stabilization work in progress.
/lifecycle stale
/remove-lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
We are also affected by this. We run NGINX-Ingress with HPA, and it happens regularly on scale-up. We currently have around 700 ingress objects.
@longwuyuan any update on the design work?
The design is basically a new approach that splits the control plane from the data plane. Much progress has been made, and the development work has reached the testing stage. You can search for the in-progress PR (about the cp/dp split).
/triage accepted
/priority important-longterm