
Ingress Pods Fail to Start

[Open] briananstett opened this issue 8 months ago • 2 comments

I'm experiencing very strange behavior where an HAProxy Kubernetes Ingress pod sometimes fails to start and begins to crash loop. The initial kubectl describe output points to the startup probe failing. The issue is sporadic, and the only fix I've found is getting the pod scheduled onto a "fresh" node that has never run an HAProxy Ingress Controller pod before.

(Initial describe output)

Containers:
  kubernetes-ingress-controller:
    Container ID:  containerd://75f1a475d94cfe97016a17ba8788eae098ec5c55ccf2b13cc98448d9dfe646bc
    Image:         haproxytech/kubernetes-ingress:1.11.4
    Image ID:      docker.io/haproxytech/kubernetes-ingress@sha256:c5f8a41ef0d4b177bec10f082da578f2be69af9a54b719a76ea6ce2707f4248e
    Ports:         8080/TCP, 8443/TCP, 1024/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      --default-ssl-certificate=kube-system/haproxy-dev-kubernetes-ingress-default-cert
      --configmap=kube-system/haproxy-dev-kubernetes-ingress
      --http-bind-port=8080
      --https-bind-port=8443
      --ingress.class=haproxy-dev
      --publish-service=kube-system/haproxy-dev-kubernetes-ingress
      --log=debug
      --prometheus
    State:          Running
      Started:      Tue, 04 Jun 2024 14:10:40 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      250m
      memory:   400Mi
    Liveness:   http-get http://:1042/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:1042/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Startup:    http-get http://:1042/healthz delay=0s timeout=1s period=1s #success=1 #failure=20
    Environment:
      POD_NAME:       haproxy-dev-kubernetes-ingress-6c7f954ccb-fxrqg (v1:metadata.name)
      POD_NAMESPACE:  kube-system (v1:metadata.namespace)
      POD_IP:          (v1:status.podIP)
    Mounts:
      /run from tmp (rw,path="run")
      /tmp from tmp (rw,path="tmp")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6qrp8 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  64Mi
  kube-api-access-6qrp8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m40s                  default-scheduler  Successfully assigned kube-system/haproxy-dev-kubernetes-ingress-6c7f954ccb-fxrqg to ip-10-11-52-55.ec2.internal
  Normal   Pulled     3m39s                  kubelet            Container image "haproxytech/kubernetes-ingress:1.11.4" already present on machine
  Normal   Created    3m39s                  kubelet            Created container kubernetes-ingress-controller
  Normal   Started    3m39s                  kubelet            Started container kubernetes-ingress-controller
  Warning  Unhealthy  3m32s (x7 over 3m38s)  kubelet            Startup probe failed: Get "http://10.11.42.62:1042/healthz": dial tcp 10.11.42.62:1042: connect: connection refused

But when I adjust the startup probe configuration to allow more startup time, the pods still crash, now immediately, with exit code 137 and s6-overlay error messages in the logs.
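For reference, exit code 137 means the process died from SIGKILL (128 + 9), i.e. it was killed outright rather than exiting on its own. A quick shell demonstration:

```shell
# Exit code 137 = 128 + 9 (SIGKILL): the process was killed by the kernel
# (e.g. OOM killer) or by the container runtime; it did not exit itself.
sh -c 'kill -9 $$'    # child shell kills itself with SIGKILL
echo "exit code: $?"  # prints: exit code: 137
```

Note that in the describe output below, Started and Finished are the same second, so the container dies before the first probe could possibly fire; the probe settings themselves are unlikely to be what is killing it.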

(Altered startup probe configuration)

startupProbe:
  failureThreshold: 20
  httpGet:
    path: /healthz
    port: 1042
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

(kubectl describe output from after probe update)

Containers:
  kubernetes-ingress-controller:
    Container ID:  containerd://7bd89b7cd7c297e261b64e69e8fb1bb1cb3f0c4fbd609775c47336d54fea867e
    Image:         haproxytech/kubernetes-ingress:1.11.4
    Image ID:      docker.io/haproxytech/kubernetes-ingress@sha256:c5f8a41ef0d4b177bec10f082da578f2be69af9a54b719a76ea6ce2707f4248e
    Ports:         8080/TCP, 8443/TCP, 1024/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      --default-ssl-certificate=kube-system/haproxy-dev-kubernetes-ingress-default-cert
      --configmap=kube-system/haproxy-dev-kubernetes-ingress
      --http-bind-port=8080
      --https-bind-port=8443
      --ingress.class=haproxy-dev
      --publish-service=kube-system/haproxy-dev-kubernetes-ingress
      --log=debug
      --prometheus
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 04 Jun 2024 14:16:12 -0400
      Finished:     Tue, 04 Jun 2024 14:16:12 -0400
    Ready:          False
    Restart Count:  1
    Requests:
      cpu:      250m
      memory:   400Mi
    Liveness:   http-get http://:1042/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:1042/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Startup:    http-get http://:1042/healthz delay=10s timeout=1s period=10s #success=1 #failure=20
    Environment:
      POD_NAME:       haproxy-dev-kubernetes-ingress-584698c8df-g6snl (v1:metadata.name)
      POD_NAMESPACE:  kube-system (v1:metadata.namespace)
      POD_IP:          (v1:status.podIP)
    Mounts:
      /run from tmp (rw,path="run")
      /tmp from tmp (rw,path="tmp")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dqsv4 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  64Mi
  kube-api-access-dqsv4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  20s                default-scheduler  Successfully assigned kube-system/haproxy-dev-kubernetes-ingress-584698c8df-g6snl to ip-10-11-4-1.ec2.internal
  Normal   Pulled     18s (x2 over 19s)  kubelet            Container image "haproxytech/kubernetes-ingress:1.11.4" already present on machine
  Normal   Created    18s (x2 over 19s)  kubelet            Created container kubernetes-ingress-controller
  Normal   Started    18s (x2 over 19s)  kubelet            Started container kubernetes-ingress-controller
  Warning  BackOff    8s (x4 over 17s)   kubelet            Back-off restarting failed container kubernetes-ingress-controller in pod haproxy-dev-kubernetes-ingress-584698c8df-g6snl_kube-system(804c7725-4aea-48fc-a903-86557dd4304b)

(container logs)

s6-overlay-suexec: warning: unable to gain root privileges (is the suid bit set?)
/package/admin/s6-overlay/libexec/preinit: info: read-only root
/package/admin/s6-overlay/libexec/preinit: info: writable /run. Checking for executability.
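The last log line shows s6-overlay's preinit verifying that the writable /run mount is also executable. The snippet below is only an illustrative sketch of that kind of check (the paths and file names are mine, not what s6-overlay actually uses): it writes a tiny script into a directory and tries to execute it, which fails on a filesystem mounted with noexec. Since this pod's /run comes from a memory-backed (tmpfs) emptyDir, a node-level noexec tmpfs default is one hypothesis that could explain why only some nodes are affected:

```shell
# Illustrative "is this mount executable?" check, similar in spirit to what
# s6-overlay's preinit does after confirming /run is writable.
dir=$(mktemp -d)                           # stand-in for /run
printf '#!/bin/sh\nexit 0\n' > "$dir/probe"
chmod +x "$dir/probe"
if "$dir/probe" 2>/dev/null; then
  echo "mount is executable"
else
  echo "mount is noexec"                   # init could not proceed past here
fi
rm -rf "$dir"
```

Comparing `mount` output for the pod's tmpfs volumes on a working node versus a failing node would confirm or rule this out.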

I've tried different versions of the HAProxy Ingress Controller, updating Kubernetes versions, updating node AMIs, altering resource allocations (trying to address the 137 exit code), removing security contexts, and more, with no luck. Oddly, I'm only having this issue on one of the EKS clusters I'm running; the exact same installation works on a different EKS cluster with the same version and configuration.

Specs

  • HAProxy Version: haproxytech/kubernetes-ingress:1.11.4
  • Kubernetes Version: EKS/1.30

briananstett · Jun 04 '24 18:06