ingress-nginx
`Client.Timeout exceeded` (30s) on validation webhooks when updating Ingress objects
What happened:
We continue to hit the (max) timeout on our validation webhook when applying ingress manifests.
failed to call webhook: Post "https://ingress-nginx-controller-admission.ingress.svc:443/networking/v1/ingresses?timeout=30s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Validation time is consistently high, around the 20s mark, and general load or several ingress applies in quick succession can push it to 30s, at which point deploy pipelines start to fail.
The image above shows a graph of validation time, using the metric exposed by the controller itself, over 24 hours earlier this week.
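For reference, the same admission numbers can be pulled straight from the controller's metrics endpoint (port 10254, see the services further down); a rough sketch of how to check them, with exact metric names varying by controller version:
# grep broadly for "admission"; curl being present in the controller image is an assumption
$ kubectl -n ingress exec deploy/nginx-ingress-controller -- sh -c 'curl -s http://127.0.0.1:10254/metrics | grep -i admission'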
This is me adding a label, to illustrate one simple update:
torvald@surdeig ~ $ time kubectl patch ing <ingress> --type='json' -p='[{"op": "add", "path": "/metadata/labels/testing", "value": "testing"}]'
ingress.networking.k8s.io/<ingress> patched
real 0m17.724s
user 0m0.396s
sys 0m0.057s
This is in a medium-sized cluster:
- ~130 nodes
- 270 ingresses
- 3 nginx pods with 8 GB RAM (request/limit) and 5 CPUs (request) each
- ~1000 rps at peak (see graph below)
- 9.9 MB nginx config file (296k lines, 187 server_names, 4778 locations)
Request rate
Over the same time period as above.
Performance of pods
To comment on this: it looks and feels quite bearable. Spikes in CPU are assumed to be nginx reloads and validation runs. Same time period as above.
90-day trends:
The image above shows the number of ingresses over the last 90 days.
The image above shows the validation webhook duration over the last 90 days. This mostly supports organic growth of sorts, except for the quick change marked in the picture above; this has been tracked down to 10 ingresses (serving the same host) that went from 1 host to 3, so the collection of ~60 paths over 1 host became ~180 over 3 hosts.
See an example of such an ingress post-change.
What you expected to happen:
I've seen people mention far better performance than 20-30s on their validation webhook in other issues around here, and that with larger clusters and larger nginx config files. So my expectation would be in the 1-5s range.
This PR will probably help us in cases where multiple ingresses get applied at the same time, but one or a few single applies should probably not take 20s?
NGINX Ingress controller version
nginx/1.21.6, release v1.9.5
torvald@surdeig ~ $ kubectl exec -it nginx-ingress-controller-5d66477fb7-jttwl -- /nginx-ingress-controller --version
Defaulted container "nginx-ingress-controller" out of: nginx-ingress-controller, opentelemetry (init), sysctl (init), geoip-database-download (init)
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.9.5
Build: f503c4bb5fa7d857ad29e94970eb550c2bc00b7c
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.21.6
-------------------------------------------------------------------------------
Kubernetes version (use `kubectl version`):
torvald@surdeig ~ $ kubectl version --short
Client Version: v1.25.0
Kustomize Version: v4.5.7
Server Version: v1.27.10-gke.1055000
Environment:
- Cloud provider or hardware configuration: GCP, managed GKE; e2-custom-16-32768
- OS (e.g. from /etc/os-release): Container-Optimized OS with containerd (cos_containerd)
- Kernel (e.g. `uname -a`): 5.15.133+
- How was the ingress-nginx-controller installed: It probably originated from a Helm chart once, but everything has evolved in our own git repo since then. I'll attach the relevant files.
- cat nginx-ingress-deployment-controller.yaml
- cat configmaps.yaml
- cat validatingwebhookconfiguration.yaml
- kubectl get -n ingress all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-ingress-controller-5d66477fb7-8qtfs 1/1 Running 0 20h 10.4.117.224 gke-k8s-prod-k8s-prod-standard-v8-611652ec-5lt5 <none> <none>
pod/nginx-ingress-controller-5d66477fb7-jttwl 1/1 Running 0 3h9m 10.4.71.52 gke-k8s-prod-k8s-prod-standard-v8-68b36906-wgdl <none> <none>
pod/nginx-ingress-controller-5d66477fb7-wlw6t 1/1 Running 0 20h 10.4.1.143 gke-k8s-prod-k8s-prod-standard-v8-7c2e0d29-s7vz <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/ingress-nginx-controller-admission ClusterIP 10.6.12.178 <none> 443/TCP 2y198d app=nginx-ingress,component=controller
service/ingress-nginx-controller-collector-metrics ClusterIP 10.6.4.251 <none> 8888/TCP 574d app=nginx-ingress,component=controller
service/ingress-nginx-controller-metrics ClusterIP 10.6.6.244 <none> 10254/TCP 2y198d app=nginx-ingress,component=controller
service/nginx-ingress-controller LoadBalancer 10.6.8.95 <redacted> 80:31151/TCP,443:30321/TCP 2y198d app=nginx-ingress,component=controller
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/nginx-ingress-controller 3/3 3 3 2y198d nginx-ingress-controller registry.k8s.io/ingress-nginx/controller:v1.9.5 app=nginx-ingress,component=controller
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/nginx-ingress-controller-5d66477fb7 3 3 3 20h nginx-ingress-controller registry.k8s.io/ingress-nginx/controller:v1.9.5 app=nginx-ingress,component=controller,pod-template-hash=5d66477fb7
- Current state of ingress object, if applicable:
See an example of an ingress, the same as mentioned above in the «What happened» section.
How to reproduce this issue:
I think reproducing this would be infeasible, but I'm happy to assist with more details.
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Is it possible for you to get logs and events from those 20-30 seconds?
/remove-kind bug /triage needs-information
torvald@surdeig ~ $ date; time kubectl patch ing tech-radar-auth --type='json' -p='[{"op": "add", "path": "/metadata/labels/testing", "value": "testing"}]'; date;
Mon Apr 15 09:29:26 AM CEST 2024
ingress.networking.k8s.io/tech-radar-auth patched
real 0m19.327s
user 0m0.350s
sys 0m0.067s
Mon Apr 15 09:29:45 AM CEST 2024
Excluding lines containing:
- "remote_addr" (pattern not to include pure request logs and client IP addresses)
- "Endpoint"
- "body is buffered"
- "Error getting SSL certificate"
- "Unexpected error validating SSL certificate"
- "does not contain a Common Name"
I'm left with these for the time period mentioned above + 15 sec (09:29:26 - 09:29:59):
2024-04-15 09:29:45.477 admission.go:149] processed ingress via admission controller {testedIngressLength:269 testedIngressTime:18.855s renderingIngressLength:269 renderingIngressTime:0.002s admissionTime:9.5MBs testedConfigurationSize:18.857}
2024-04-15 09:29:45.478 main.go:107] "successfully validated configuration, accepting" ingress="infra/tech-radar-auth"
2024-04-15 09:29:45.486 event.go:298] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"infra", Name:"tech-radar-auth", UID:"e635a88c-e72e-401f-8a9a-afa536b06c11", APIVersion:"networking.k8s.io/v1", ResourceVersion:"3624067364", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
2024-04-15 09:29:45.486 event.go:298] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"infra", Name:"tech-radar-auth", UID:"e635a88c-e72e-401f-8a9a-afa536b06c11", APIVersion:"networking.k8s.io/v1", ResourceVersion:"3624067364", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
2024-04-15 09:29:45.487 event.go:298] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"infra", Name:"tech-radar-auth", UID:"e635a88c-e72e-401f-8a9a-afa536b06c11", APIVersion:"networking.k8s.io/v1", ResourceVersion:"3624067364", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
But for completeness sake, here is the full dump (redacted for business info, and all request logs are excluded): nginx-logs-ing-update-2024-04-15 09 33 02.414.txt
- Does the node on which the controller pods are scheduled see any resource crunch?
- Have you configured limits and resources on the pods?
There are lots of SSL errors, and if you are submitting lots of new ingress JSON to the API server while the controller pod has to reconcile & sync under a resource crunch, then that could explain these stats.
- Does the node on which the controller pods are scheduled see any resource crunch?
No, we have fairly low utilization on all nodes. These are the three nodes at the same point in time (marked with the arrows) as the previously mentioned test. CPU to the left, memory to the right.
- Have you configured limits and resources on the pods?
Yes, full deployment manifest here.
resources:
requests:
cpu: 5000m
memory: 8000Mi
limits:
memory: 8000Mi
There are lots of SSL errors, and if you are submitting lots of new ingress JSON to the API server while the controller pod has to reconcile & sync under a resource crunch, then that could explain these stats.
Events that make nginx sync data can be seen here; my test from earlier is marked, during a period of low «sync volume».
Yes, there is an unfortunate amount of SSL errors. Zooming out, these are rather constant, and I don't feel like they should affect the validation of my config to the point that it takes 20s. Following the same argument, there should be periods of low resource crunch where we would see validation durations of only a few seconds, but we don't.
I can try and eliminate the SSL errors before going further to get rid of this suspicion.
I want to give enough time to this so we can get the root cause. If you ping me on slack we could see how to do that.
We had a video chat and ran through some config and settings to validate the «normal stuff». We SSHed into the node of the master nginx pod to look for clues in the dmesg logs, especially related to connection tracking (conntrack). Nothing of significance, but we'll try to inspect the webhook with tcpdump to get a better grip on where the time is lost.
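For anyone following along, checks of that sort amount to the standard Linux ones, nothing ingress-nginx-specific; roughly:
# look for conntrack complaints in the kernel log, then compare table usage against the limit
$ dmesg | grep -i conntrack
$ cat /proc/sys/net/netfilter/nf_conntrack_count /proc/sys/net/netfilter/nf_conntrack_max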
We have a dev environment with 60% of the same ingresses, and we see high-ish numbers there as well, around the 6-8s mark.
I tried packet dumping on our nginx pods there, and it seems like all the time is spent inside nginx.
torvald@surdeig ~ $ date; time kubectl -n infra patch ing interno-public --type='json' -p='[{"op": "add", "path": "/metadata/labels/testing", "value": "testing"}]'; date
Wed Apr 17 03:45:05 PM CEST 2024
ingress.networking.k8s.io/interno-public patched
real 0m7.567s
user 0m0.413s
sys 0m0.047s
Wed Apr 17 03:45:13 PM CEST 2024
The TCP and TLS handshakes happen immediately, and then there is a 7s hold-up waiting for a response from nginx on the admission webhook port 8443.
This tells me this is not a network problem.
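The capture itself was nothing exotic; something along these lines, run on the node hosting the controller pod, followed by eyeballing the timestamps:
# capture admission webhook traffic on port 8443 for later inspection (e.g. in Wireshark)
$ sudo tcpdump -i any -nn 'tcp port 8443' -w /tmp/webhook-8443.pcap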
Thanks.
This data is proof that when you say "network", you really mean that the network does not have the port blocked, or no IP/TCP blocking via a packet filter.
But as I hinted on the video call, the truly useful information here is knowing whether a test like telnet ingress-nginx-controller-admission 443, or telnet $controller-leader-podname 8443, or the same test with the *nix tool netcat (nc), shows the connection working or jammed/wedged for 7+ seconds. This is a test of a busy/choked network causing delay, not a test of whether connectivity is blocked (for those few seconds).
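A minimal version of that check, run from any pod or node that can reach the service (netcat being available is an assumption):
# connect-only test: if this returns in milliseconds, the network path is fine and the delay is inside the webhook handler
$ time nc -zv ingress-nginx-controller-admission.ingress.svc 443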
Before going deeper down the network route, I came across a scent of what I think is the smoking gun: adding lots of paths seems to correlate well with the validation times we are seeing. More path items within one of our ingresses produce longer validation times. If paths are spread out over multiple ingresses, it seems to be okay, but cramming many paths into one ingress seems to make validation time explode.
Our dev cluster has 180 ingresses configured. Many of them have multiple hosts and paths, so in the nginx.conf there is a total of 1434 nginx location directives:
$ kubectl -n ingress exec nginx-ingress-controller-7686cb664f-5lnnc -- cat /etc/nginx/nginx.conf | grep " location " | wc -l
1434
If I pick one of our ingresses, not even a special one, just one of them, edit a label and apply it, I see the usual 7s wait in our dev cluster (20s for prod, as reported earlier).
$ time kubectl apply -f /tmp/ingress.yaml
ingress.networking.k8s.io/app-api-ingress configured
real 0m6.884s
user 0m0.410s
sys 0m0.156s
Here I go from 2 paths:
$ cat /tmp/ingress.yaml | grep path:
- path: /api/
- path: /api/external
To 21 paths:
$ cat /tmp/ingress-with-many-paths.yaml | grep path:
- path: /api/
- path: /api/external/
- path: /api/internal/
- path: /foo/(ja|nei)(-[^/]+)?/okse1/
- path: /foo/(ja|nei)(-[^/]+)?/okse2/
- path: /foo/(ja|nei)(-[^/]+)?/okse3/
- path: /foo/(ja|nei)(-[^/]+)?/okse/
- path: /foo/(ja|nei)(-[^/]+)?/banan/
- path: /foo/(ja|nei)(-[^/]+)?/sjallabais/
- path: /fisk/api/
- path: /fisk/api/external/
- path: /fisk/api/internal/
- path: /fisk/foo/(ja|nei)(-[^/]+)?/okse1/
- path: /fisk/foo/(ja|nei)(-[^/]+)?/okse2/
- path: /fisk/foo/(ja|nei)(-[^/]+)?/okse3/
- path: /fisk/foo/(ja|nei)(-[^/]+)?/okse/
- path: /fisk/foo/(ja|nei)(-[^/]+)?/banan/
- path: /fisk/foo/(ja|nei)(-[^/]+)?/sjallabais/
- path: /kake/foo/(ja|nei)(-[^/]+)?/okse2/
- path: /kake/foo/(ja|nei)(-[^/]+)?/okse3/
- path: /kake/foo/(ja|nei)(-[^/]+)?/okse/
The validation webhook becomes super slow:
$ time kubectl apply -f /tmp/ingress-with-many-paths.yaml
ingress.networking.k8s.io/app-api-ingress configured
real 0m25.353s
user 0m0.380s
sys 0m0.116s
Adding more paths makes it go over Kubernetes' admission webhook max timeout of 30s, and kubectl fails.
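(That 30s ceiling is the webhook's timeoutSeconds, which is also the maximum the API server allows; it can be inspected with something like:)
$ kubectl get validatingwebhookconfigurations -o custom-columns=NAME:.metadata.name,TIMEOUT:.webhooks[*].timeoutSeconds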
With the object above applied, you can see how other ingress updates to the same nginx server are also affected, presumably because the validation webhook makes nginx render all ingress objects and validate the full config each time (rightfully so):
$ time kubectl -n infra patch ing <random ingress in cluster> --type='json' -p='[{"op": "add", "path": "/metadata/labels/testing", "value": "just-a-small-change"}]'
ingress.networking.k8s.io/<random ingress in cluster> patched
real 0m27.728s
user 0m0.385s
sys 0m0.051s
In total, this does not generate much more config; it is just a 2% increase in nginx location directives:
$ kubectl -n ingress exec nginx-ingress-controller-7686cb664f-5lnnc -- cat /etc/nginx/nginx.conf | grep " location " | wc -l
Defaulted container "nginx-ingress-controller" out of: nginx-ingress-controller, opentelemetry (init), sysctl (init), geoip-database-download (init)
1472
The full ingress manifests, and the diff between them. ingress.yaml.txt ingress-with-many-paths.yaml.txt diff.txt
I can try to put together steps to reproduce this locally.
Awesome data. Hope you get to the bottom of this.
I've figured out where this comes from!
The added latency on the admission controller webhook is triggered by the combination of:
- the annotation nginx.ingress.kubernetes.io/enable-owasp-core-rules: "true", and
- multiple paths in one ingress (10+).
More paths seem to have an exponential effect on processing duration.
In our production cluster we run 270 ingresses; 4 of these had this annotation, and most of them also had a good handful of paths. Removing the annotation mitigated our problem. I've also reproduced this locally, if you are interested in the config, commands and manifests.
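Finding the offending ingresses is easy enough; a sketch of the kind of query I used (jq being available is an assumption):
$ kubectl get ingress -A -o json \
    | jq -r '.items[]
        | select(.metadata.annotations["nginx.ingress.kubernetes.io/enable-owasp-core-rules"] == "true")
        | "\(.metadata.namespace)/\(.metadata.name)"'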
Moving this (and the corresponding modsecurity flag) to the main configmap works fine, though; it does not add any additional latency. We did not do this, however, as we have a WAF in front of our nginx-ingress that does the same job.
enable-modsecurity: "true"
enable-owasp-modsecurity-crs: "true"
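For anyone who does want the global route, something like this against the controller's configmap should do it (the configmap name is a placeholder; use whatever the controller's --configmap flag points at):
$ kubectl -n ingress patch configmap <controller-configmap> --type merge \
    -p '{"data":{"enable-modsecurity":"true","enable-owasp-modsecurity-crs":"true"}}'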
Profit. :taco:
Thank you very much for the update. Awesome work. Love this kind of results :-) cheers
If this is no longer an issue, kindly consider closing it. Once again, awesome, and thanks a lot.
I do think I'd classify this as a regression bug; while more config obviously adds validation time, the amount of validation time we've seen here is unreasonably high compared to the amount of config that actually gets added. I'm thinking I can try to profile where the time is spent, and maybe that can be used as input for some future optimizations. I'll close this if that does not yield any good results, or if no one else picks this up within a reasonable time.
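My rough plan for the profiling is the standard Go pprof route, assuming the controller runs with its --profiling flag enabled and serves the usual /debug/pprof endpoints (the pod name and port below are placeholders; check the controller's flags for where the debug endpoints live):
$ kubectl -n ingress port-forward <controller-pod> 9999:<profiler-port>
# capture a 30s CPU profile while an ingress apply is in flight
$ go tool pprof 'http://127.0.0.1:9999/debug/pprof/profile?seconds=30'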
Thanks for your help and guidance, @longwuyuan.
There is painful & intensive work in progress that relates to these kinds of circumstances. One such effort is a PR trying to split the data plane away from the control plane. That paradigm shift introduces improved security, performance and, hopefully, manageability.
If you provide the data that you hinted at, it can only help the developers' decisions. The action item arising from that data is not something I can comment on, because the complications are just too much to summarise in writing.
Thank you very much for your info
This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will get to your issue ASAP. If you have any questions or want to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.
We see this same issue in our clusters; I assume setting these globally is a decent solution?
This is directly related and is only solved by messing with readiness and liveness probes: https://github.com/kubernetes/ingress-nginx/issues/11689