ingress-nginx
v4.11.1 unexpected error obtaining nginx status info
Seeing issues during NGINX startup; the logs don't reveal much about why the healthcheck endpoint is refusing connections. Controller logs:
I0727 00:08:28.342380 7 nginx.go:317] "Starting NGINX process"
I0727 00:08:28.342455 7 leaderelection.go:250] attempting to acquire leader lease ingress-nginx/ingress-nginx-internal-leader...
I0727 00:08:28.342749 7 nginx.go:337] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
I0727 00:08:28.345201 7 controller.go:193] "Configuration changes detected, backend reload required"
I0727 00:08:28.358021 7 status.go:85] "New leader elected" identity="ingress-nginx-internal-controller-67bfb7fd4b-nzkdt"
2024/07/27 00:08:35 Get "http://127.0.0.1:10246/nginx_status": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:08:35.677958 7 nginx_status.go:171] unexpected error obtaining nginx status info: Get "http://127.0.0.1:10246/nginx_status": dial tcp 127.0.0.1:10246: connect: connection refused
2024/07/27 00:09:05 Get "http://127.0.0.1:10246/nginx_status": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:05.683341 7 nginx_status.go:171] unexpected error obtaining nginx status info: Get "http://127.0.0.1:10246/nginx_status": dial tcp 127.0.0.1:10246: connect: connection refused
I0727 00:09:07.380630 7 controller.go:213] "Backend successfully reloaded"
I0727 00:09:07.380716 7 controller.go:224] "Initial sync, sleeping for 1 second"
I0727 00:09:07.380802 7 event.go:377] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-internal-controller-dbcc4dc9c-29mpv", UID:"4ee6bf1d-df1f-4bb4-8e37-04d6978dfd6d", APIVersion:"v1", ResourceVersion:"214163955", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0727 00:09:08.382382 7 controller.go:244] Dynamic reconfiguration failed (retrying; 15 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:09.394353 7 controller.go:244] Dynamic reconfiguration failed (retrying; 14 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:10.797697 7 controller.go:244] Dynamic reconfiguration failed (retrying; 13 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:12.616922 7 controller.go:244] Dynamic reconfiguration failed (retrying; 12 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:14.913299 7 controller.go:244] Dynamic reconfiguration failed (retrying; 11 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
I0727 00:09:16.276657 7 sigterm.go:36] "Received SIGTERM, shutting down"
I0727 00:09:16.276928 7 nginx.go:393] "Shutting down controller queues"
I0727 00:09:16.289355 7 nginx.go:401] "Stopping admission controller"
E0727 00:09:16.289652 7 nginx.go:340] "Error listening for TLS connections" err="http: Server closed"
I0727 00:09:16.289815 7 nginx.go:409] "Stopping NGINX process"
W0727 00:09:17.931239 7 controller.go:244] Dynamic reconfiguration failed (retrying; 10 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:21.837363 7 controller.go:244] Dynamic reconfiguration failed (retrying; 9 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:26.847362 7 controller.go:244] Dynamic reconfiguration failed (retrying; 8 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:33.648965 7 controller.go:244] Dynamic reconfiguration failed (retrying; 7 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
2024/07/27 00:09:16 [notice] 2486#2486: ModSecurity-nginx v1.0.3 (rules loaded inline/local/remote: 0/14418/0)
2024/07/27 00:09:16 [notice] 2486#2486: signal process started
W0727 00:09:41.869474 7 controller.go:244] Dynamic reconfiguration failed (retrying; 6 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W0727 00:09:53.470106 7 controller.go:244] Dynamic reconfiguration failed (retrying; 5 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
I0727 00:09:59.244212 7 nginx.go:422] "NGINX process has stopped"
I0727 00:09:59.244234 7 sigterm.go:44] Handled quit, delaying controller exit for 10 seconds
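For context when reading the errors above: port 10246 is the controller's internal NGINX status/configuration endpoint, so "connection refused" means nothing was listening on that socket yet. A quick way to probe it from inside a running controller pod (pod name taken from the log above; curl availability in the controller image is assumed):

kubectl exec -n ingress-nginx ingress-nginx-internal-controller-dbcc4dc9c-29mpv -- \
  curl -sS http://127.0.0.1:10246/nginx_status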
What happened:
Upgraded the Helm chart from v4.10.0 to v4.11.1.
What you expected to happen:
All pods are replaced and continue working without issue.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
NGINX Ingress controller
Release: v1.11.1
Build: 7c44f992012555ff7f4e47c08d7c542ca9b4b1f7
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.5
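(For completeness, the output above can be reproduced with a command along these lines; the deployment name follows from the fullnameOverride used in the values below, and the namespace is an assumption:)

kubectl exec -it -n ingress-nginx deploy/ingress-nginx-internal-controller -- /nginx-ingress-controller --version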
Kubernetes version (use kubectl version):
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-eks-036c24b
Environment: AWS EKS
- How was the ingress-nginx-controller installed:
values: |
  fullnameOverride: ingress-nginx-internal
  controller:
    replicaCount: 3
    autoscaling:
      enabled: true
      minReplicas: 3
      targetCPUUtilizationPercentage: 80
      targetMemoryUtilizationPercentage: 80
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
    ingressClassResource:
      name: "nginx-internal"
      controllerValue: "k8s.io/ingress-nginx-internal"
      enabled: true
      default: true
    opentelemetry:
      enabled: true
    admissionWebhooks:
      timeoutSeconds: 30
    config:
      allow-snippet-annotations: "true"
      otlp-collector-host: "opentelemetry-collector.monitoring.svc"
      otlp-collector-port: "4317"
      enable-opentelemetry: "true"
      otel-sampler: "AlwaysOn"
      otel-sampler-ratio: "1.0"
      enable-underscores-in-headers: "true"
      opentelemetry-config: "/etc/nginx/opentelemetry.toml"
      opentelemetry-operation-name: "HTTP $request_method $service_name $uri"
      opentelemetry-trust-incoming-span: "false"
      otel-sampler-parent-based: "false"
      otel-max-queuesize: "2048"
      otel-schedule-delay-millis: "5000"
      otel-max-export-batch-size: "512"
      server-snippet: |
        opentelemetry_attribute "ingress.namespace" "$namespace";
        opentelemetry_attribute "ingress.service_name" "$service_name";
        opentelemetry_attribute "ingress.name" "$ingress_name";
        opentelemetry_attribute "ingress.upstream" "$proxy_upstream_name";
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
    service:
      public: false
      subdomain: "ingress-internal"
      external:
        enabled: false
      internal:
        enabled: true
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
          service.beta.kubernetes.io/aws-load-balancer-scheme: internal
          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
          service.beta.kubernetes.io/aws-load-balancer-attributes: deletion_protection.enabled=true
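(The ArgoCD tracking labels further down indicate this is deployed via Argo CD; for a plain-Helm reproduction, something roughly equivalent would be the following. Release name, namespace, and values file name are assumptions:)

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm upgrade --install ingress-nginx-internal ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --version 4.11.1 \
  --values values.yaml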
- Current State of the controller:
Name:         nginx-internal
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx-internal
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.11.1
              argocd.argoproj.io/instance=ingress-nginx-internal
              helm.sh/chart=ingress-nginx-4.11.1
Annotations:  argocd.argoproj.io/tracking-id: ingress-nginx-internal:networking.k8s.io/IngressClass:ingress-nginx/nginx-internal
              ingressclass.kubernetes.io/is-default-class: true
Controller:   k8s.io/ingress-nginx-internal
Events:       <none>
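(For reference, the IngressClass state above corresponds to the output of:)

kubectl describe ingressclass nginx-internal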
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/remove-kind bug
/kind support
Please try adding the AWS-documented annotations related to security groups. It could be that required ports are being blocked, so check that the required ports are open inside the cluster (look at the port fields on the controller pod for the port numbers).
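(Illustratively, such security-group annotations would go under controller.service.internal.annotations in the values above; the exact keys and behaviour depend on the AWS Load Balancer Controller version, and the security group ID here is a placeholder:)

  controller:
    service:
      internal:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-security-groups: "sg-0123456789abcdef0"
          service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "true"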
You have not answered the questions asked in the new-issue template, so there is nothing to debug or analyze here. Please answer those questions to help move this forward.
/triage needs-information
This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any questions or a request to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.
We are currently experiencing the same issue after upgrading from 4.10 to 4.11.3. What kind of information could we provide to help debug this, @longwuyuan? We have rolled back to 4.10.5 for the time being.
We are on AWS, using EKS (same as OP), running Kubernetes version 1.30.4 on most of our worker nodes (in contrast to OP's 1.29).
We have a staging cluster where we can reproduce the issue so I can provide any information that might be useful without impacting our day-to-day operations.
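(The kind of data the new-issue template asks for can be gathered with commands along these lines; pod names are placeholders and the namespace is assumed to match the install above:)

kubectl -n ingress-nginx get pods -o wide
kubectl -n ingress-nginx describe pod <failing-controller-pod>
kubectl -n ingress-nginx logs <failing-controller-pod> --previous
kubectl -n ingress-nginx get events --sort-by=.lastTimestamp
kubectl -n ingress-nginx exec <failing-controller-pod> -- cat /etc/nginx/nginx.conf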
I updated the docs with some AWS-related annotations specific to health checks: https://kubernetes.github.io/ingress-nginx/deploy/ . @naanselmo, you can check whether they apply to your setup.
One thing is clear from past issues: the error message in the controller log is precise about the root cause. Whether that root cause is a blocked port or a temporary failure to establish the healthcheck connection can only be determined from data supplied by the cluster user.
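(For illustration only, AWS load-balancer healthcheck annotations of the kind mentioned above take roughly this shape under controller.service.internal.annotations; the canonical list and recommended values are in the linked deploy docs, and the port/path here simply reflect the controller's default /healthz endpoint on 10254 rather than a verified recommendation:)

  controller:
    service:
      internal:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: "http"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "10254"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/healthz"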
This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any questions or a request to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.
Seeing a similar issue and investigating right now, but it seems to be linked to having nginx.ingress.kubernetes.io/enable-modsecurity: "true" on many Ingresses: https://github.com/kubernetes/ingress-nginx/issues/12927
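(For readers unfamiliar with the annotation mentioned above, it is set per Ingress like this; the Ingress name, host, and backend are placeholders:)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  annotations:
    nginx.ingress.kubernetes.io/enable-modsecurity: "true"
spec:
  ingressClassName: nginx-internal
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80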