nginx_ingress_controller_orphan_ingress accumulates a very large number of series over time
What happened: See screenshot:
We've observed Prometheus gradually using more and more memory over time. After some inspection, we found that nginx_ingress_controller_orphan_ingress constantly exports a very large number of label combinations, even for namespaces that have not existed for quite a while.
This cluster might be a bit of a special case, as it constantly creates and destroys namespaces with 10-20 ingresses each to run tests.
It's easy to see that this adds up, and the number does not go down (unless the nginx pods are killed):
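The growth can also be quantified without a dashboard by counting the series directly in the scrape output. A minimal sketch (the sample text below stands in for the body of http://<controller-pod>:10254/metrics, which you could fetch via kubectl exec + curl; the label names shown are examples, not verified against the controller):

```python
# Count nginx_ingress_controller_orphan_ingress series in Prometheus
# text-format scrape output. In a real cluster, feed this the body of
# the controller's metrics endpoint instead of the sample below.

def count_orphan_series(metrics_text):
    return sum(
        1
        for line in metrics_text.splitlines()
        if line.startswith("nginx_ingress_controller_orphan_ingress{")
    )


sample = """\
# HELP nginx_ingress_controller_orphan_ingress ...
# TYPE nginx_ingress_controller_orphan_ingress gauge
nginx_ingress_controller_orphan_ingress{namespace="test-1",ing="a",type="no-service"} 1
nginx_ingress_controller_orphan_ingress{namespace="test-2",ing="b",type="no-service"} 1
"""
print(count_orphan_series(sample))  # -> 2
```

Running this periodically against the live endpoint shows the count only ever going up.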
What you expected to happen:
If I understand correctly, labels are usually kept on /metrics, but in this case (and maybe others) it might be worth considering no longer exporting the metric once the ingress has been deleted.
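For illustration, the suspected pattern and the proposed fix can be sketched with a tiny in-memory metric store. This is hypothetical code, not the controller's actual implementation; all names are made up. The point is just that a label-keyed store which only ever adds series leaks one series per deleted ingress until the process restarts, whereas dropping the series on ingress deletion brings the cardinality back down:

```python
# Hypothetical sketch, not the controller's actual code: a metric whose
# series are keyed by (namespace, ingress) labels and never removed.

class OrphanIngressMetric:
    def __init__(self):
        # one series per (namespace, ingress) label combination
        self.series = {}

    def observe(self, namespace, ingress, orphaned):
        # called while the ingress exists; creates the series on first use
        self.series[(namespace, ingress)] = 1 if orphaned else 0

    def delete(self, namespace, ingress):
        # proposed behavior: drop the series when the ingress is deleted,
        # so /metrics no longer exports it
        self.series.pop((namespace, ingress), None)


metric = OrphanIngressMetric()
for i in range(20):  # one test namespace with 20 ingresses
    metric.observe("test-ns", f"ing-{i}", orphaned=False)
assert len(metric.series) == 20

# Without delete(), all 20 series stay on /metrics after the namespace is
# gone. With delete-on-ingress-deletion, the cardinality drops back to 0:
for i in range(20):
    metric.delete("test-ns", f"ing-{i}")
assert len(metric.series) == 0
```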
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.): This is rke2-ingress-nginx as shipped with rke2 v1.25.11+rke2r1.
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: nginx-1.6.4-hardened4
Build: git-90e1717ce
Repository: https://github.com/rancher/ingress-nginx.git
nginx version: nginx/1.21.4
-------------------------------------------------------------------------------
I'm not sure whether this version is a fork or vendored by Rancher, but glancing at the code, it looks like orphans aren't removed in the current mainline 1.8.1 either. I haven't tested that yet, though (sorry).
Kubernetes version (use kubectl version): v1.25.11+rke2r1
Environment: rke2 managed by rancher on vSphere
- Cloud provider or hardware configuration: vSphere
- OS (e.g. from /etc/os-release): Ubuntu 22.04
- Kernel (e.g. uname -a): Linux rke2-ingress-nginx-controller-f7c4b 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: rancher v2.7.4, rke2 all defaults, ServiceMonitors enabled via HelmChartConfig.
Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc.
- Basic cluster related info:
kubectl version:
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"windows/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.11+rke2r1", GitCommit:"8cfcba0b15c343a8dc48567a74c29ec4844e0b9e", GitTreeState:"clean", BuildDate:"2023-06-14T21:31:34Z", GoVersion:"go1.19.10 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
kubectl get nodes -o wide:
NAME                                          STATUS  ROLES                      AGE  VERSION          INTERNAL-IP    EXTERNAL-IP    OS-IMAGE            KERNEL-VERSION     CONTAINER-RUNTIME
k8s-development-mgmt-a532ef00-447zr           Ready   control-plane,etcd,master  12d  v1.25.11+rke2r1  10.240.180.85  10.240.180.85  Ubuntu 22.04.1 LTS  5.15.0-76-generic  containerd://1.7.1-k3s1
k8s-development-mgmt-a532ef00-n9rqx           Ready   control-plane,etcd,master  12d  v1.25.11+rke2r1  10.240.180.84  10.240.180.84  Ubuntu 22.04.1 LTS  5.15.0-76-generic  containerd://1.7.1-k3s1
k8s-development-mgmt-a532ef00-wpxb5           Ready   control-plane,etcd,master  12d  v1.25.11+rke2r1  10.240.180.83  10.240.180.83  Ubuntu 22.04.1 LTS  5.15.0-76-generic  containerd://1.7.1-k3s1
k8s-development-workers-6c24g-a8ecb429-kdphr  Ready   worker                     38d  v1.25.11+rke2r1  10.240.180.77  10.240.180.77  Ubuntu 22.04.2 LTS  5.15.0-76-generic  containerd://1.7.1-k3s1
k8s-development-workers-6c24g-a8ecb429-kfmlp  Ready   worker                     95d  v1.25.11+rke2r1  10.240.180.69  10.240.180.69  Ubuntu 22.04.1 LTS  5.15.0-76-generic  containerd://1.7.1-k3s1
k8s-development-workers-6c24g-a8ecb429-mh9hx  Ready   worker                     25d  v1.25.11+rke2r1  10.240.180.81  10.240.180.81  Ubuntu 22.04.1 LTS  5.15.0-76-generic  containerd://1.7.1-k3s1
k8s-development-workers-6c24g-a8ecb429-t54ww  Ready   worker                     32d  v1.25.11+rke2r1  10.240.180.78  10.240.180.78  Ubuntu 22.04.1 LTS  5.15.0-76-generic  containerd://1.7.1-k3s1
k8s-development-workers-6c24g-a8ecb429-w5xc5  Ready   worker                     23d  v1.25.11+rke2r1  10.240.180.82  10.240.180.82  Ubuntu 22.04.1 LTS  5.15.0-76-generic  containerd://1.7.1-k3s1
k8s-development-workers-6c24g-a8ecb429-zhm6p  Ready   worker                     95d  v1.25.11+rke2r1  10.240.180.70  10.240.180.70  Ubuntu 22.04.1 LTS  5.15.0-76-generic  containerd://1.7.1-k3s1
How was the ingress-nginx-controller installed:
- Output of helm ls -A | grep -i ingress:
rke2-ingress-nginx kube-system 12 2023-07-24 08:42:36.043793919 +0000 UTC deployed rke2-ingress-nginx-4.5.201 1.6.4
- Output of helm -n <ingresscontrollernamespace> get values <helmreleasename>:
controller:
metrics:
enabled: true
serviceMonitor:
enabled: true
global:
clusterCIDR: 10.42.0.0/16
clusterCIDRv4: 10.42.0.0/16
clusterDNS: 10.43.0.10
clusterDomain: cluster.local
rke2DataDir: /var/lib/rancher/rke2
serviceCIDR: 10.43.0.0/16
- Current State of the controller:
kubectl describe ingressclasses
Name: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=rke2-ingress-nginx
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=rke2-ingress-nginx
app.kubernetes.io/part-of=rke2-ingress-nginx
app.kubernetes.io/version=1.6.4
helm.sh/chart=rke2-ingress-nginx-4.5.201
Annotations: meta.helm.sh/release-name: rke2-ingress-nginx
meta.helm.sh/release-namespace: kube-system
Controller: k8s.io/ingress-nginx
Events: <none>
kubectl -n <ingresscontrollernamespace> get all -A -o wide: omitted here, as this command lists all cluster resources (-A) and the output is quite large.
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>: rke2 installs ingress-nginx as a DaemonSet, so describing every pod produces too much output; here's a single one:
Name: rke2-ingress-nginx-controller-tkrnv
Namespace: kube-system
Priority: 0
Service Account: rke2-ingress-nginx
Node: k8s-development-workers-6c24g-a8ecb429-zhm6p/10.240.180.70
Start Time: Mon, 08 May 2023 14:56:51 +0200
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=rke2-ingress-nginx
app.kubernetes.io/name=rke2-ingress-nginx
controller-revision-hash=6844f6f4b8
pod-template-generation=5
Annotations: cni.projectcalico.org/containerID: 2ba3ae616e3360a86663711c9a643fe810c02d5ebb92278eea5f146be969974b
cni.projectcalico.org/podIP: 10.42.166.28/32
cni.projectcalico.org/podIPs: 10.42.166.28/32
Status: Running
IP: 10.42.166.28
IPs:
IP: 10.42.166.28
Controlled By: DaemonSet/rke2-ingress-nginx-controller
Containers:
rke2-ingress-nginx-controller:
Container ID: containerd://90ae21ac8a1c87a5387f45fde869bf498af033157df3b3f28c507767ec5cc38b
Image: rancher/nginx-ingress-controller:nginx-1.6.4-hardened4
Image ID: docker.io/rancher/nginx-ingress-controller@sha256:7804101a5cb8de407b1192e42ea0d6153ac2a71eb1765f63ca4af60a1dbe46f3
Ports: 80/TCP, 443/TCP, 10254/TCP, 8443/TCP
Host Ports: 80/TCP, 443/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--election-id=rke2-ingress-nginx-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/rke2-ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
--watch-ingress-without-class=true
State: Running
Started: Fri, 30 Jun 2023 10:31:24 +0200
Last State: Terminated
Reason: Unknown
Exit Code: 255
Started: Sat, 17 Jun 2023 09:25:31 +0200
Finished: Fri, 30 Jun 2023 10:30:32 +0200
Ready: True
Restart Count: 5
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: rke2-ingress-nginx-controller-tkrnv (v1:metadata.name)
POD_NAMESPACE: kube-system (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lhqr4 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: rke2-ingress-nginx-admission
Optional: false
kube-api-access-lhqr4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal RELOAD 16m (x3612 over 24d) nginx-ingress-controller NGINX reload triggered due to a change in configuration
- kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
- Current state of ingress object, if applicable:
kubectl -n <appnamespace> get all,ing -o wide
kubectl -n <appnamespace> describe ing <ingressname>
- If applicable, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag
- Others:
- Any other related information, like:
- copy/paste of the snippet (if applicable)
- kubectl describe ... of any custom configmap(s) created and in use
- Any other related information that may help
How to reproduce this issue:
1. Create a namespace with a few ingresses (doesn't matter if orphaned or not)
2. Delete the namespace
3. Observe that metrics for those ingresses stay in /metrics, and the orphaned status stays there as well
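The steps above can be scripted. Here's a sketch that generates a manifest for a throwaway namespace with a few ingresses (all names, hosts, and the deliberately missing backend service are arbitrary examples); pipe it to kubectl apply -f -, then delete the namespace and watch the controller's /metrics:

```python
# Sketch: emit a manifest with one namespace and a few ingresses pointing at
# a non-existent service (so they also show up as orphaned). Apply, delete
# the namespace, then check that the old series remain on /metrics.

def repro_manifest(namespace="orphan-test", count=3):
    docs = [
        "apiVersion: v1\n"
        "kind: Namespace\n"
        "metadata:\n"
        f"  name: {namespace}"
    ]
    for i in range(count):
        docs.append(
            "apiVersion: networking.k8s.io/v1\n"
            "kind: Ingress\n"
            "metadata:\n"
            f"  name: ing-{i}\n"
            f"  namespace: {namespace}\n"
            "spec:\n"
            "  ingressClassName: nginx\n"
            "  rules:\n"
            f"  - host: ing-{i}.example.com\n"
            "    http:\n"
            "      paths:\n"
            "      - path: /\n"
            "        pathType: Prefix\n"
            "        backend:\n"
            "          service:\n"
            "            name: missing-svc  # intentionally absent backend\n"
            "            port:\n"
            "              number: 80"
        )
    return "\n---\n".join(docs)


if __name__ == "__main__":
    print(repro_manifest())
    # apply with:    python repro.py | kubectl apply -f -
    # clean up with: kubectl delete namespace orphan-test
```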
Anything else we need to know: