Ingress Pods Fail to Start
I'm seeing very strange behavior where an HAProxy Kubernetes Ingress Controller pod sometimes fails to start and begins to crash loop. The initial kubectl describe output points to the startup probe failing. The issue is sporadic, and the only reliable fix I've found is getting the pod scheduled onto a fresh node that has never run an HAProxy Ingress Controller pod before.
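To catch a failing pod in the act, I watch the controller pods and grab state as soon as one starts crash looping (the label selector below is assumed from the chart defaults; adjust it to match your release):

kubectl -n kube-system get pods -l app.kubernetes.io/name=kubernetes-ingress -w
kubectl -n kube-system describe pod <pod-name>
kubectl -n kube-system logs <pod-name> --previous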
(Initial describe output)
Containers:
kubernetes-ingress-controller:
Container ID: containerd://75f1a475d94cfe97016a17ba8788eae098ec5c55ccf2b13cc98448d9dfe646bc
Image: haproxytech/kubernetes-ingress:1.11.4
Image ID: docker.io/haproxytech/kubernetes-ingress@sha256:c5f8a41ef0d4b177bec10f082da578f2be69af9a54b719a76ea6ce2707f4248e
Ports: 8080/TCP, 8443/TCP, 1024/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
--default-ssl-certificate=kube-system/haproxy-dev-kubernetes-ingress-default-cert
--configmap=kube-system/haproxy-dev-kubernetes-ingress
--http-bind-port=8080
--https-bind-port=8443
--ingress.class=haproxy-dev
--publish-service=kube-system/haproxy-dev-kubernetes-ingress
--log=debug
--prometheus
State: Running
Started: Tue, 04 Jun 2024 14:10:40 -0400
Ready: True
Restart Count: 0
Requests:
cpu: 250m
memory: 400Mi
Liveness: http-get http://:1042/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:1042/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Startup: http-get http://:1042/healthz delay=0s timeout=1s period=1s #success=1 #failure=20
Environment:
POD_NAME: haproxy-dev-kubernetes-ingress-6c7f954ccb-fxrqg (v1:metadata.name)
POD_NAMESPACE: kube-system (v1:metadata.namespace)
POD_IP: (v1:status.podIP)
Mounts:
/run from tmp (rw,path="run")
/tmp from tmp (rw,path="tmp")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6qrp8 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 64Mi
kube-api-access-6qrp8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m40s default-scheduler Successfully assigned kube-system/haproxy-dev-kubernetes-ingress-6c7f954ccb-fxrqg to ip-10-11-52-55.ec2.internal
Normal Pulled 3m39s kubelet Container image "haproxytech/kubernetes-ingress:1.11.4" already present on machine
Normal Created 3m39s kubelet Created container kubernetes-ingress-controller
Normal Started 3m39s kubelet Started container kubernetes-ingress-controller
Warning Unhealthy 3m32s (x7 over 3m38s) kubelet Startup probe failed: Get "http://10.11.42.62:1042/healthz": dial tcp 10.11.42.62:1042: connect: connection refused
But when I adjust the startup probe configuration to allow more time for startup, the pods still crash immediately, now with a 137 exit code and s6-overlay messages in the logs.
(Altered startup probe configuration)
startupProbe:
failureThreshold: 20
httpGet:
path: /healthz
port: 1042
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
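For completeness, the same change can also be applied directly to the Deployment rather than through the chart; a sketch, assuming the Deployment is named haproxy-dev-kubernetes-ingress to match the pod names above:

kubectl -n kube-system patch deployment haproxy-dev-kubernetes-ingress --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/startupProbe", "value": {
    "httpGet": {"path": "/healthz", "port": 1042, "scheme": "HTTP"},
    "initialDelaySeconds": 10, "periodSeconds": 10,
    "failureThreshold": 20, "successThreshold": 1, "timeoutSeconds": 1}}]'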
(kubectl describe output after the probe update)
Containers:
kubernetes-ingress-controller:
Container ID: containerd://7bd89b7cd7c297e261b64e69e8fb1bb1cb3f0c4fbd609775c47336d54fea867e
Image: haproxytech/kubernetes-ingress:1.11.4
Image ID: docker.io/haproxytech/kubernetes-ingress@sha256:c5f8a41ef0d4b177bec10f082da578f2be69af9a54b719a76ea6ce2707f4248e
Ports: 8080/TCP, 8443/TCP, 1024/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
--default-ssl-certificate=kube-system/haproxy-dev-kubernetes-ingress-default-cert
--configmap=kube-system/haproxy-dev-kubernetes-ingress
--http-bind-port=8080
--https-bind-port=8443
--ingress.class=haproxy-dev
--publish-service=kube-system/haproxy-dev-kubernetes-ingress
--log=debug
--prometheus
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Tue, 04 Jun 2024 14:16:12 -0400
Finished: Tue, 04 Jun 2024 14:16:12 -0400
Ready: False
Restart Count: 1
Requests:
cpu: 250m
memory: 400Mi
Liveness: http-get http://:1042/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:1042/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Startup: http-get http://:1042/healthz delay=10s timeout=1s period=10s #success=1 #failure=20
Environment:
POD_NAME: haproxy-dev-kubernetes-ingress-584698c8df-g6snl (v1:metadata.name)
POD_NAMESPACE: kube-system (v1:metadata.namespace)
POD_IP: (v1:status.podIP)
Mounts:
/run from tmp (rw,path="run")
/tmp from tmp (rw,path="tmp")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dqsv4 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 64Mi
kube-api-access-dqsv4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 20s default-scheduler Successfully assigned kube-system/haproxy-dev-kubernetes-ingress-584698c8df-g6snl to ip-10-11-4-1.ec2.internal
Normal Pulled 18s (x2 over 19s) kubelet Container image "haproxytech/kubernetes-ingress:1.11.4" already present on machine
Normal Created 18s (x2 over 19s) kubelet Created container kubernetes-ingress-controller
Normal Started 18s (x2 over 19s) kubelet Started container kubernetes-ingress-controller
Warning BackOff 8s (x4 over 17s) kubelet Back-off restarting failed container kubernetes-ingress-controller in pod haproxy-dev-kubernetes-ingress-584698c8df-g6snl_kube-system(804c7725-4aea-48fc-a903-86557dd4304b)
(Container logs)
s6-overlay-suexec: warning: unable to gain root privileges (is the suid bit set?)
/package/admin/s6-overlay/libexec/preinit: info: read-only root
/package/admin/s6-overlay/libexec/preinit: info: writable /run. Checking for executability.
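Given those preinit messages, one thing worth ruling in or out is the security context, so I compare the effective settings between the working and failing clusters (this assumes the controller is the first container in the pod spec):

kubectl -n kube-system get pod <pod-name> \
  -o jsonpath='{.spec.securityContext}{"\n"}{.spec.containers[0].securityContext}{"\n"}'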
I've tried different versions of the HAProxy Ingress Controller, updating Kubernetes versions, updating node AMIs, altering resource allocations (to address the 137 exit code), removing security contexts, and more, all with no luck. Oddly, I'm only having this issue on one of the EKS clusters I run; the exact same installation works on a different EKS cluster with the same version and configuration.
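Since exit code 137 is a SIGKILL, I also tried to pin down where the kill comes from. Note that describe reports Reason: Error rather than OOMKilled, so the kubelet doesn't think it's a cgroup OOM kill, but for thoroughness these are the checks I run (the node name comes from the scheduling event above, and kubectl debug requires node debugging to be permitted on the cluster):

# Kubelet's view of the last termination (OOMKilled vs. Error):
kubectl -n kube-system get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'

# Kernel log on the node, looking for OOM-killer activity:
kubectl debug node/ip-10-11-4-1.ec2.internal -it --image=busybox -- \
  sh -c 'chroot /host dmesg | grep -i -e oom -e "killed process"'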
Specs
- HAProxy Version: haproxytech/kubernetes-ingress:1.11.4
- Kubernetes Version: EKS/1.30