ingress-nginx
Load balancer IP cleared from all ingresses when upgrading nginx-ingress-controller
What happened:
When updating the nginx-ingress-controller Helm chart to a new version (in this case: 4.9.1 to 4.10.0), the current leader pod logs these messages:
I0305 11:08:37.122872 7 nginx.go:379] "Shutting down controller queues"
I0305 11:08:37.137382 7 status.go:135] "removing value from ingress status" address=[{"ip":"10.1.2.3"}]
I0305 11:08:37.146479 7 status.go:304] "updating Ingress status" namespace="service-a-rc" ingress="service-a-rc" currentValue=[{"ip":"10.1.2.3"}] newValue=[]
I0305 11:08:37.146522 7 status.go:304] "updating Ingress status" namespace="service-b-rc" ingress="service-b-rc" currentValue=[{"ip":"10.1.2.3"}] newValue=[]
I0305 11:08:37.146593 7 status.go:304] "updating Ingress status" namespace="kube-system" ingress="kube-oidc-proxy" currentValue=[{"ip":"10.1.2.3"}] newValue=[]
I0305 11:08:37.146648 7 status.go:304] "updating Ingress status" namespace="monitoring" ingress="prometheus-k8s" currentValue=[{"ip":"10.1.2.3"}] newValue=[]
I0305 11:08:37.146907 7 status.go:304] "updating Ingress status" namespace="kube-system" ingress="dex" currentValue=[{"ip":"10.1.2.3"}] newValue=[]
I0305 11:08:37.146954 7 status.go:304] "updating Ingress status" namespace="monitoring" ingress="thanos-sidecar" currentValue=[{"ip":"10.1.2.3"}] newValue=[]
I0305 11:08:37.960259 7 nginx.go:387] "Stopping admission controller"
E0305 11:08:37.960342 7 nginx.go:326] "Error listening for TLS connections" err="http: Server closed"
I0305 11:08:37.960354 7 nginx.go:395] "Stopping NGINX process"
2024/03/05 11:08:37 [notice] 720#720: signal process started
I0305 11:08:39.013339 7 nginx.go:408] "NGINX process has stopped"
I0305 11:08:39.013371 7 sigterm.go:44] Handled quit, delaying controller exit for 10 seconds
I0305 11:08:49.013693 7 sigterm.go:47] "Exiting" code=0
This led to the load balancer IP (10.1.2.3) being removed from the status of all ingresses managed by the ingress controller, which in turn caused external-dns to delete all the DNS records for these ingresses, resulting in an outage.
The newly elected leader from the updated deployment then put the load balancer IP back in the ingress status and external-dns recreated all the records.
This does not happen during normal pod restarts, only during version upgrades (we retroactively found the same logs during our last upgrade from 4.9.0 to 4.9.1).
What you expected to happen:
nginx-ingress-controller should not clear the status of ingresses when it shuts down during a version upgrade.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
NGINX Ingress controller
Release: v1.10.0
Build: 71f78d49f0a496c31d4c19f095469f3f23900f8a
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.3
Kubernetes version (use kubectl version): v1.27.2
Environment:
- Cloud provider or hardware configuration: doesn't matter
- OS (e.g. from /etc/os-release): doesn't matter
- Kernel (e.g. uname -a): doesn't matter
- Install tools: doesn't matter
- Basic cluster related info: doesn't matter
How was the ingress-nginx-controller installed:
$ helm template --repo https://kubernetes.github.io/ingress-nginx ingress-nginx -f values.yaml | kubectl apply -f-
with values.yaml:
fullnameOverride: nginx-ingress-internal
controller:
  kind: Deployment
  containerName: nginx-ingress-controller
  electionID: nginx-ingress-controller-internal-leader
  ingressClass: nginx-internal
  ingressClassResource:
    controllerValue: k8s.goto.com/nginx-ingress-internal
    enabled: true
    ingressClassByName: true
    name: nginx-internal
  admissionWebhooks:
    certManager:
      enabled: true
  allowSnippetAnnotations: false
  enableAnnotationValidations: true
  config:
    strict-validate-path-type: true
    use-proxy-protocol: false
  dnsPolicy: Default
  minReadySeconds: 60
  extraArgs:
    shutdown-grace-period: 60
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      scrapeInterval: 60s
  replicaCount: "2"
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 512Mi
  service:
    type: LoadBalancer
    externalTrafficPolicy: Local
    annotations:
      # cloud-specific annotations
      # ...
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - ingress-nginx
          - key: app.kubernetes.io/instance
            operator: In
            values:
            - nginx-ingress-controller-internal
          - key: app.kubernetes.io/component
            operator: In
            values:
            - controller
        topologyKey: kubernetes.io/hostname
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: nginx-ingress-controller-internal
        app.kubernetes.io/name: ingress-nginx
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
How to reproduce this issue:
- Install some version of nginx-ingress-controller with Helm, using a service of type LoadBalancer.
- Upgrade or downgrade to a different version.
- Observe the log lines above; optionally run kubectl get ingress -A -w in a background terminal to observe the load balancer IP being removed from all the ingresses managed by the controller (see the sketch below).
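For illustration, the reproduction can be scripted along the lines of the install command from this report (a hedged sketch: the chart versions are only examples, and the helm template | kubectl apply flow is the one shown above):
# render and apply one chart version
$ helm template --repo https://kubernetes.github.io/ingress-nginx --version 4.9.1 ingress-nginx -f values.yaml | kubectl apply -f-
# in a separate terminal, watch the ADDRESS column of all ingresses
$ kubectl get ingress -A -w
# roll out a different chart version and watch the address disappear until the new leader restores it
$ helm template --repo https://kubernetes.github.io/ingress-nginx --version 4.10.0 ingress-nginx -f values.yaml | kubectl apply -f-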
Anything else we need to know:
The error message seems to be coming from here: https://github.com/kubernetes/ingress-nginx/blob/9c384c7eb85b9833bad491e14fffcfe3b9c59ee5/internal/ingress/status/status.go#L135
This line is normally not reached when isRunningMultiplePods returns true:
https://github.com/kubernetes/ingress-nginx/blob/9c384c7eb85b9833bad491e14fffcfe3b9c59ee5/internal/ingress/status/status.go#L130-L133
Judging by the code of this function: https://github.com/kubernetes/ingress-nginx/blob/9c384c7eb85b9833bad491e14fffcfe3b9c59ee5/internal/ingress/status/status.go#L238-L252
it tries to find pods with the same labels as the currently running pod and only returns true if it finds at least one such pod. During a version upgrade, this very likely returns false: the Helm chart adds the chart version and the controller version as labels to the pods, so the new pods of the updated deployment no longer carry the same labels as the old, shutting-down pod and are not considered by this function anymore.
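As a quick way to see the label mismatch in practice (a hedged sketch: it assumes the chart's standard helm.sh/chart and app.kubernetes.io/version pod labels; adjust the namespace and selector to your install), compare the version-carrying labels of old and new controller pods during the rollout:
$ kubectl get pods -n ingress-nginx \
    -l app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/component=controller \
    -L helm.sh/chart,app.kubernetes.io/version
# old and new pods show different values in these two columns, so the terminating
# leader no longer finds any other pod matching its own label set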
@alfredkrohmer is this on a cloud provider or on bare metal + MetalLB etc.?
- The template of a new bug report asks questions that may seem small, unrelated or even irrelevant to some people
- But that data is more often than not essential for making the issue actionable
- If you can help out and provide all that info, it would help readers
/remove-kind bug
- Let's add the bug label after a successful reproduce
/triage needs-information
This is on OCI (OKE) with a Network Load Balancer as a service, but as I pointed out with the code references, this seems to be generic behavior that triggers when a new version of the controller is rolled out with a different label set than the old pods.
I saw it happen on minikube with metallb
Just like another feature, this also impacts the external-dns use case.
/triage accepted
I think it will be interesting to see if it happens when you set the LoadBalancer IP https://github.com/kubernetes/ingress-nginx/blob/9c384c7eb85b9833bad491e14fffcfe3b9c59ee5/charts/ingress-nginx/values.yaml#L469 (if OKE/OCI supports that field)
OKE does support that field, but only for public IPs and unfortunately we don't have test clusters with public subnets available 🙁
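For anyone who can test this, the suggestion amounts to pinning a pre-reserved address in the values, e.g. (a hedged sketch: controller.service.loadBalancerIP is the field behind the linked values.yaml line, <static-ip> is a placeholder, and the cloud provider has to honour the field):
$ helm template --repo https://kubernetes.github.io/ingress-nginx ingress-nginx \
    -f values.yaml --set controller.service.loadBalancerIP=<static-ip> | kubectl apply -f-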
Not sure what the right approach to fix this would be. I can think of a couple of options:
- exclude more label keys here, i.e. the labels containing the app and Helm chart versions
  - this would bleed Helm chart specifics into the code base
  - if a label were ever removed from the Helm chart later on, this would either break again during the upgrade or the removed label would need to be added to the code base first (which sounds wrong)
- make the list of excluded label keys configurable
  - the Helm chart would need to supply this list
- provide a configuration flag to skip clearing the ingress status altogether
  - if enabled by default in the Helm chart, uninstalling it wouldn't clear the status
  - if disabled by default in the Helm chart, this would most likely still cause outages for most users on version upgrades if they don't explicitly enable it
Update: We realised we inject ec2-instance-id labels into all pods in our cluster, so the controller pods always think they are the last one and run the cleanup. So our issue is not specifically related to controller version updates.
I can add +1 to this; we see this often, even when not updating, e.g.:
- liveness probe fail, pods are terminated and restarted.
- node is broken or scaled down - needing to reschedule pods to different node, again triggers terminating
- updating deployment (e.g. some config or our custom cluster labels updates)
We see the same symptoms every time the controller pod is terminated:
- the endpoint is cleaned up on the ingress resource
- external-dns removes the DNS record
- a new ingress-controller pod starts
- it updates the endpoint on the ingress resources
- external-dns re-adds the DNS record
We run 2 replicas of the nginx-ingress-controller. We don't use Helm, we use the direct k8s manifest import.
I would expect that if one replica is terminated, as long as there is another replica that becomes leader, it should not update the statuses?
~Update: we also see some errors around leader election during the shutdown, similar to https://github.com/kubernetes/ingress-nginx/issues/8996 - so it may be a compounded issue.~
Update: the leader election errors seem to be irrelevant.
If DNS happens to be queried at exactly the wrong time (when no record exists), the negative cache TTL might extend the issue beyond the actual recovery, which I think makes this worse.
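For reference, the negative-caching window is bounded by the zone's SOA record (per RFC 2308, resolvers use the smaller of the SOA TTL and its minimum field); a quick way to inspect it, with example.com standing in for the affected zone:
# the last field of the SOA answer is the minimum used for negative caching
$ dig +noall +answer example.com SOA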
cc @strongjz @rikatz @tao12345666333
Any idea / opinion on how to fix this?
@alfredkrohmer any chance you can test setting the LoadBalancerIP in values file and update with behaviour https://github.com/kubernetes/ingress-nginx/blob/9c384c7eb85b9833bad491e14fffcfe3b9c59ee5/charts/ingress-nginx/values.yaml#L469
This doesn't really work for us as we don't pick the load balancer IP statically. It's coming from the status of the service for which a backing load balancer in the cloud will be created.
/assign
/kind feature
/priority important-longterm
Any news regarding this issue?
Oh wow, not sure why I haven't seen this before, but there is actually a flag to not remove the ingress status on shutdown :facepalm:
https://github.com/kubernetes/ingress-nginx/blob/886956e5728c318c3fa48792a08b940b7cdba4ff/pkg/flags/flags.go#L138
https://github.com/kubernetes/ingress-nginx/blob/886956e5728c318c3fa48792a08b940b7cdba4ff/internal/ingress/status/status.go#L113
Now I wonder if this should be set to false by default in the Helm chart.
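If that turns out to be the right default, it looks like it can already be opted into today through the chart's controller.extraArgs (a hedged sketch reusing the install command from this report; the flag name is taken from the flags.go link above):
$ helm template --repo https://kubernetes.github.io/ingress-nginx ingress-nginx \
    -f values.yaml --set-string controller.extraArgs.update-status-on-shutdown=false | kubectl apply -f-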