
calico and metallb controllers getting CrashLoopBackOff error

Open FATIHISILGAN opened this issue 4 months ago • 2 comments

I am getting this status in microk8s.

calico-kube-controllers-5947598c79-z8zcd 0/1 CrashLoopBackOff

Calico version -> docker.io/calico/node:v3.28.1
MicroK8s version -> MicroK8s v1.32.3 revision 8148
Ubuntu version -> Ubuntu 24.04.2 LTS


Problem Description

More than one controller pod is stuck in CrashLoopBackOff.

Pods:

microk8s kubectl get pods -A 
kube-system      calico-kube-controllers-5947598c79-f99vb   0/1     CrashLoopBackOff             32 (2m36s ago)   102m
kube-system      calico-node-2cp8j                          1/1     Running                      1                92m
metallb-system   controller-7ffc454778-65llg                0/1     CrashLoopBackOff             23 (35s ago)     47m
metallb-system   speaker-llgsh                              0/1     CreateContainerConfigError   0                47m
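
To narrow a listing like the one above to the pods that are not ready, a small filter along these lines may help. This is only a sketch: the function name `unhealthy_pods` is hypothetical, and it assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: print namespace, name, READY and STATUS for every row
# whose READY column (x/y) has x != y. Rows with STATUS "Completed" are skipped,
# and the header row is ignored because its READY field is not of the form x/y.
unhealthy_pods() {
  awk '$3 ~ /^[0-9]+\/[0-9]+$/ && $4 != "Completed" {
         split($3, r, "/")
         if (r[1] != r[2]) print $1, $2, $3, $4
       }'
}

# Usage (needs a running MicroK8s node):
# microk8s kubectl get pods -A | unhealthy_pods
```

Fed the four rows above, this should leave only the calico-kube-controllers, MetalLB controller and speaker pods.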

Description check

microk8s kubectl describe pods -A | grep -A 20 -B 5 "Error\|Failed\|CrashLoopBackOff"
   Image:          docker.io/calico/kube-controllers:v3.28.1
    Image ID:       docker.io/calico/kube-controllers@sha256:eadb3a25109a5371b7b2dc30d74b2d9b2083eba58abdc034c81f974a48871330
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 25 Aug 2025 11:52:43 +0300
      Finished:     Mon, 25 Aug 2025 11:53:43 +0300
    Ready:          False
    Restart Count:  40
    Liveness:       exec [/usr/bin/check-status -l] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xqlpn (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
--
    Host Ports:    0/TCP, 0/TCP
    Args:
      --port=7472
      --log-level=info
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 25 Aug 2025 11:52:44 +0300
      Finished:     Mon, 25 Aug 2025 11:53:14 +0300
    Ready:          False
    Restart Count:  29
    Liveness:       http-get http://:monitoring/metrics delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:monitoring/metrics delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      METALLB_ML_SECRET_NAME:  memberlist
      METALLB_DEPLOYMENT:      controller
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dxbg7 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
--
    Host Ports:    7472/TCP, 7946/TCP, 7946/UDP
    Args:
      --port=7472
      --log-level=info
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:monitoring/metrics delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:monitoring/metrics delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      METALLB_NODE_NAME:       (v1:spec.nodeName)
      METALLB_HOST:            (v1:status.hostIP)
      METALLB_ML_BIND_ADDR:    (v1:status.podIP)
      METALLB_ML_LABELS:      app=metallb,component=speaker
      METALLB_ML_SECRET_KEY:  <set to the key 'secretkey' in secret 'memberlist'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8mc8z (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
--
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason  Age                   From     Message
  ----     ------  ----                  ----     -------
  Normal   Pulled  111s (x231 over 51m)  kubelet  Container image "quay.io/metallb/speaker:v0.13.3" already present on machine
  Warning  Failed  111s (x231 over 51m)  kubelet  Error: secret "memberlist" not found
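
The speaker event above names a missing secret. To pull such names out of longer `describe` output, something like the following could be used. The function name `missing_secrets` is hypothetical, and it only matches the exact kubelet wording shown above (`secret "<name>" not found`).

```shell
# Hypothetical helper: list the distinct secret names reported as "not found"
# in text read from stdin (for example the output of `kubectl describe pods`).
missing_secrets() {
  grep -o 'secret "[^"]*" not found' \
    | sed 's/^secret "\(.*\)" not found$/\1/' \
    | sort -u
}

# Usage (needs a running MicroK8s node):
# microk8s kubectl describe pods -n metallb-system | missing_secrets
# microk8s kubectl get secret memberlist -n metallb-system
```

Note that the MetalLB controller is started with METALLB_ML_SECRET_NAME=memberlist (see its environment above), so the missing secret may simply be a consequence of the controller crashing rather than a separate problem. That is an assumption, not something the output confirms.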

Busybox check

microk8s kubectl run -it --rm busybox --image=busybox:1.36 --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # 
/ # 
/ # wget -qO- https://10.152.183.1:443 --no-check-certificate
wget: can't connect to remote host (10.152.183.1): Connection timed out
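
The same reachability test can be repeated from the node itself, to tell a broken service IP apart from an API server that is not listening at all. The sketch below is an assumption-laden helper, not part of MicroK8s: `tcp_check` is a hypothetical name, it relies on bash's `/dev/tcp` redirection and coreutils `timeout` (so it will not work inside the busybox image), and 16443 is assumed to be the API server port, which is the MicroK8s default.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens
# within SECS seconds (default 5), fail otherwise.
# bash is invoked explicitly because /dev/tcp is a bash feature.
tcp_check() {
  host="$1"; port="$2"; secs="${3:-5}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage on the node:
# tcp_check 10.152.183.1 443 && echo "service IP reachable" || echo "service IP unreachable"
# tcp_check 127.0.0.1 16443  && echo "apiserver port open"  || echo "apiserver port closed"
```

If the second check succeeds while the first one times out, the problem is more likely in the service routing (kube-proxy or Calico rules) than in the API server process.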

Logs calico controllers

microk8s kubectl logs calico-kube-controllers-5947598c79-f99vb -n kube-system

2025-08-25 09:18:13.176 [INFO][1] main.go 99: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
2025-08-25 09:18:13.178 [WARNING][1] winutils.go 150: Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2025-08-25 09:18:13.179 [INFO][1] main.go 123: Ensuring Calico datastore is initialized
2025-08-25 09:18:43.182 [ERROR][1] client.go 287: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.152.183.1:443: i/o timeout
2025-08-25 09:18:43.182 [INFO][1] main.go 130: Failed to initialize datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.152.183.1:443: i/o timeout
2025-08-25 09:19:13.191 [ERROR][1] client.go 287: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.152.183.1:443: i/o timeout
2025-08-25 09:19:13.191 [INFO][1] main.go 130: Failed to initialize datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.152.183.1:443: i/o timeout
2025-08-25 09:19:13.191 [FATAL][1] main.go 143: Failed to initialize Calico datastore
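
To see at a glance which addresses the controller fails to reach, the timeouts in a log like the one above can be counted per target. A minimal sketch, assuming the Go-style error text `dial tcp <addr>: i/o timeout` seen in these logs; the function name `timeout_targets` is hypothetical.

```shell
# Hypothetical helper: count "dial tcp <addr>: i/o timeout" errors per address
# in log text read from stdin. Prints "<count> <addr>" per distinct address.
timeout_targets() {
  grep -o 'dial tcp [^ ]*: i/o timeout' \
    | awk '{ sub(/:$/, "", $3); n[$3]++ }
           END { for (t in n) print n[t], t }'
}

# Usage (needs a running MicroK8s node):
# microk8s kubectl logs calico-kube-controllers-5947598c79-f99vb -n kube-system | timeout_targets
```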

Logs metallb controller

microk8s kubectl logs controller-7ffc454778-65llg -n metallb-system

{"branch":"dev","caller":"main.go:141","commit":"dev","goversion":"gc / go1.18.3 / amd64","level":"info","msg":"MetalLB controller starting version 0.13.3 (commit dev, branch dev)","ts":"2025-08-25T09:26:44Z","version":"0.13.3"}

FATIHISILGAN, Aug 25 '25 10:08

Hey @FATIHISILGAN,

Can you share an inspection report for your node? At first glance it looks like the kube-apiserver is struggling or unhealthy. We need more logs to understand the root cause.

berkayoz, Aug 26 '25 06:08

Sure, @berkayoz. This is my microk8s inspect report:

inspection-report-new.zip

FATIHISILGAN, Aug 26 '25 12:08