Calico-kube-controllers CrashLoopBackOff
Environment
- Calico v3.24.5
- Kubernetes 1.25.2
- CentOS 7.7
Problem Description
This is my first deployment of Calico, using https://docs.projectcalico.org/manifests/calico.yaml. Key YAML config:

When the deployment completes, the calico-kube-controllers pod shows CrashLoopBackOff status.

kubectl describe po calico-kube-controllers-767f55c9d8-ddj6n -n kube-system

Then I checked the calico-kube-controllers-767f55c9d8-ddj6n pod logs:
2022-12-21 15:14:41.721 [INFO][1] watchersyncer.go 130: Sending status update Status=in-sync
2022-12-21 15:14:41.721 [INFO][1] syncer.go 86: Node controller syncer status updated: in-sync
2022-12-21 15:14:41.722 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2022-12-21 15:14:41.722 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2022-12-21 15:14:41.730 [INFO][1] hostendpoints.go 173: successfully synced all hostendpoints
I1221 15:14:41.818558 1 shared_informer.go:262] Caches are synced for nodes
I1221 15:14:41.818607 1 shared_informer.go:255] Waiting for caches to sync for pods
I1221 15:14:41.818630 1 shared_informer.go:262] Caches are synced for pods
2022-12-21 15:14:41.818 [INFO][1] ipam.go 253: Will run periodic IPAM sync every 7m30s
2022-12-21 15:14:41.819 [INFO][1] ipam.go 331: Syncer is InSync, kicking sync channel status=in-sync
Checking the node status with calicoctl:

The node status looks normal, but the calico-kube-controllers pod's readinessProbe keeps failing. I don't know what the problem is. Can someone help me?
Looks like a bug was somehow introduced into our status reporting logic. As a workaround, you can remove the readinessProbe from calico-kube-controllers, but obviously that's not a long-term solution. We'll need to figure out why that location isn't writable.
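For anyone wanting to apply that workaround without editing the manifest by hand, a JSON patch can drop the probe from the live Deployment (this is a sketch of the workaround described above; it assumes the probe is on the first container of the standard manifest):

```shell
# Remove the readinessProbe from the calico-kube-controllers Deployment (workaround only).
kubectl patch deployment calico-kube-controllers -n kube-system \
  --type=json \
  -p='[{"op":"remove","path":"/spec/template/spec/containers/0/readinessProbe"}]'
```

Re-applying the upstream manifest later will restore the probe.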
The offending log is:
Failed to write readiness file: open /status/status.json: permission denied
For the future, it's better to use plaintext for issues rather than images - it makes them more easily searchable.
https://github.com/projectcalico/calico/blob/4c638885a88b2b598382b9d757b99ef90a135db2/kube-controllers/pkg/status/status.go#L136
The os.WriteFile permission there is 0644, which looks correct. The denial is more likely caused by the /status directory itself not being writable by the container's user than by the file mode.
You can delete the pod "calico-kube-controllers"; after the Deployment recreates it, it should return to normal.
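For reference, deleting the pod by label and letting the Deployment recreate it looks like this (the `k8s-app` label is assumed from the standard calico.yaml manifest):

```shell
# Delete the controller pod; the Deployment will schedule a replacement.
kubectl delete pod -n kube-system -l k8s-app=calico-kube-controllers
```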
@xiaomizhg did you make any progress on this one?
> You can delete the pod "calico-kube-controllers"; after the Deployment recreates it, it should return to normal.

@pullusers No, that doesn't work.
Same issue on k8s 1.23.6 with rancher/mirrored-calico-kube-controllers:v3.22.0.
2023-01-20 07:07:09.901 [INFO][1] main.go 97: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0120 07:07:09.903717 1 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2023-01-20 07:07:09.904 [INFO][1] main.go 118: Ensuring Calico datastore is initialized
2023-01-20 07:07:09.926 [INFO][1] main.go 159: Getting initial config snapshot from datastore
2023-01-20 07:07:09.943 [INFO][1] main.go 162: Got initial config snapshot
2023-01-20 07:07:09.944 [INFO][1] watchersyncer.go 89: Start called
2023-01-20 07:07:09.944 [INFO][1] main.go 179: Starting status report routine
2023-01-20 07:07:09.944 [INFO][1] main.go 188: Starting Prometheus metrics server on port 9094
2023-01-20 07:07:09.944 [INFO][1] watchersyncer.go 127: Sending status update Status=wait-for-ready
2023-01-20 07:07:09.944 [INFO][1] main.go 463: Starting informer informer=&cache.sharedIndexInformer{indexer:(*cache.cache)(0xc0009e3cb0), controller:cache.Controller(nil), processor:(*cache.sharedProcessor)(0xc000172460), cacheMutationDetector:cache.dummyMutationDetector{}, listerWatcher:(*cache.ListWatch)(0xc0009e3c98), objectType:(*v1.Pod)(0xc000101000), resyncCheckPeriod:0, defaultEventHandlerResyncPeriod:0, clock:(*clock.RealClock)(0x2e2b5b0), started:false, stopped:false, startedLock:sync.Mutex{state:0, sema:0x0}, blockDeltas:sync.Mutex{state:0, sema:0x0}, watchErrorHandler:(cache.WatchErrorHandler)(nil)}
2023-01-20 07:07:09.944 [INFO][1] main.go 463: Starting informer informer=&cache.sharedIndexInformer{indexer:(*cache.cache)(0xc0009e3cf8), controller:cache.Controller(nil), processor:(*cache.sharedProcessor)(0xc0001724d0), cacheMutationDetector:cache.dummyMutationDetector{}, listerWatcher:(*cache.ListWatch)(0xc0009e3ce0), objectType:(*v1.Node)(0xc00031cc00), resyncCheckPeriod:0, defaultEventHandlerResyncPeriod:0, clock:(*clock.RealClock)(0x2e2b5b0), started:false, stopped:false, startedLock:sync.Mutex{state:0, sema:0x0}, blockDeltas:sync.Mutex{state:0, sema:0x0}, watchErrorHandler:(cache.WatchErrorHandler)(nil)}
2023-01-20 07:07:09.944 [INFO][1] syncer.go 78: Node controller syncer status updated: wait-for-ready
2023-01-20 07:07:09.944 [INFO][1] main.go 469: Starting controller ControllerType="Node"
2023-01-20 07:07:09.944 [INFO][1] controller.go 189: Starting Node controller
2023-01-20 07:07:09.944 [INFO][1] watchersyncer.go 147: Starting main event processing loop
2023-01-20 07:07:09.944 [INFO][1] watchercache.go 175: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2023-01-20 07:07:09.944 [INFO][1] watchercache.go 175: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2023-01-20 07:07:09.944 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-01-20 07:07:09.944 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-01-20 07:07:09.944 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-01-20 07:07:09.944 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-01-20 07:07:09.944 [INFO][1] resources.go 350: Main client watcher loop
2023-01-20 07:07:09.946 [INFO][1] watchercache.go 273: Sending synced update ListRoot="/calico/ipam/v2/assignment/"
2023-01-20 07:07:09.947 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2023-01-20 07:07:09.947 [INFO][1] watchersyncer.go 127: Sending status update Status=resync
2023-01-20 07:07:09.947 [INFO][1] syncer.go 78: Node controller syncer status updated: resync
2023-01-20 07:07:09.947 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-01-20 07:07:09.947 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-01-20 07:07:09.949 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-01-20 07:07:09.949 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-01-20 07:07:09.949 [INFO][1] watchercache.go 273: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2023-01-20 07:07:09.949 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2023-01-20 07:07:09.949 [INFO][1] watchersyncer.go 221: All watchers have sync'd data - sending data and final sync
2023-01-20 07:07:09.949 [INFO][1] watchersyncer.go 127: Sending status update Status=in-sync
2023-01-20 07:07:09.949 [INFO][1] syncer.go 78: Node controller syncer status updated: in-sync
2023-01-20 07:07:09.958 [INFO][1] hostendpoints.go 177: successfully synced all hostendpoints
2023-01-20 07:07:10.044 [INFO][1] ipam.go 202: Will run periodic IPAM sync every 7m30s
2023-01-20 07:07:10.044 [INFO][1] ipam.go 280: Syncer is InSync, kicking sync channel status=in-sync
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-fc7fcb565-llgwd 0/1 Running 424 (5m55s ago) 25h
canal-ccjxw 2/2 Running 0 25h
coredns-58d67995c7-2kt8t 1/1 Running 0 25h
coredns-autoscaler-7d5478875b-6r2b2 1/1 Running 0 25h
kube-eventer-66fcb6c6c6-hsv6n 1/1 Running 0 25h
kube-flannel-5pcd2 2/2 Running 0 25h
metrics-server-5c4895ffbd-kggmc 1/1 Running 0 25h
> You can delete the pod "calico-kube-controllers"; after the Deployment recreates it, it should return to normal.

This worked for me.
I had the same problem; deleting the pod and having k8s recreate it did not fix it. It turns out that calico-kube-controllers must run with some sort of privilege to write /status/status.json.

My installation (Rancher 2.7.3, k8s 1.25.7 on Rocky 9, with calico-kube-controllers v3.25.0) uses Gatekeeper 'Assign' resources as part of a mutating webhook that sets things like runAsUser: 65534, runAsGroup: 65534, privileged: false, allowPrivilegeEscalation: false, and others in a securityContext if they do not already exist in a podSpec. If I exempt the calico-system namespace from that mutating webhook, calico-kube-controllers can write the file just fine. I have not had to exempt Kubernetes namespaces such as kube-system from this webhook, which suggests the workloads running there explicitly set these fields (but I have not looked in detail).

It would be nice, especially for system-level components such as Calico, if the manifests explicitly set the privileges they need via a securityContext. https://github.com/projectcalico/calico/blob/master/charts/calico/templates/calico-kube-controllers.yaml does not set any securityContext; if it did, I would have caught what I needed to change a lot sooner. With the advent of Pod Security Standards, along with Kyverno and Gatekeeper, it would be good for system-level components to explicitly declare the privileges they need rather than relying on defaults, to make integration easier.
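To illustrate the point about explicit privileges, here is a sketch (not the upstream manifest) of how a Deployment could declare a securityContext together with a writable volume, so that even a webhook-injected non-root UID can still write /status/status.json. All UID/GID values and the emptyDir mount are illustrative assumptions, not what Calico actually ships:

```yaml
# Sketch only - not the upstream calico-kube-controllers manifest.
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999        # illustrative UID
        fsGroup: 999          # kubelet makes mounted volumes group-writable
      containers:
        - name: calico-kube-controllers
          securityContext:
            allowPrivilegeEscalation: false
          volumeMounts:
            - name: status
              mountPath: /status   # writable home for the readiness file
      volumes:
        - name: status
          emptyDir: {}
```

Declaring this explicitly would also make mutating webhooks like the one above a no-op, since the fields already exist in the podSpec.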
I got the same issue. Any update on it?
I got the same issue with Calico v3.25.1, Kubernetes 1.26.5, Ubuntu 22.04. It was working fine, then it suddenly failed. This seems to be a bug.
calico-kube-controllers went into CrashLoopBackOff with "status.go 137: failed to write status error=open /status/status.json: permission denied" (readiness/liveness probes failing). Pulling the v3.25.2 image instead of v3.25.1 brought the pod back to 1/1 Running. Does anyone know what was changed?
We updated how calico-kube-controllers permissions are handled in Calico v3.26.0 - please try updating to a newer version to see if that resolves the issue. If not, let's open another issue to track it, as the versions of Calico used here are quite old and there are a few different descriptions of the problem.
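For manifest-based installs like the one at the top of this thread, upgrading is usually just re-applying a newer manifest (the URL pattern below assumes the standard projectcalico release layout; adjust the version tag, and use the operator upgrade path instead if you installed via the Tigera operator):

```shell
# Re-apply the Calico manifest at a newer tag; adjust the version as needed.
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
```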
I am also getting a similar error with this status in MicroK8s.
calico-kube-controllers-5947598c79-z8zcd 0/1 CrashLoopBackOff
- Calico: docker.io/calico/node:v3.28.1
- MicroK8s: v1.32.3 (revision 8148)
- Ubuntu: 24.04.2 LTS
Problem Description
More than one controller pod is in CrashLoopBackOff.
Pods:
microk8s kubectl get pods -A
kube-system calico-kube-controllers-5947598c79-f99vb 0/1 CrashLoopBackOff 32 (2m36s ago) 102m
kube-system calico-node-2cp8j 1/1 Running 1 92m
metallb-system controller-7ffc454778-65llg 0/1 CrashLoopBackOff 23 (35s ago) 47m
metallb-system speaker-llgsh 0/1 CreateContainerConfigError 0 47m
Description check
microk8s kubectl describe pods -A | grep -A 20 -B 5 "Error\|Failed\|CrashLoopBackOff"
Image: docker.io/calico/kube-controllers:v3.28.1
Image ID: docker.io/calico/kube-controllers@sha256:eadb3a25109a5371b7b2dc30d74b2d9b2083eba58abdc034c81f974a48871330
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 25 Aug 2025 11:52:43 +0300
Finished: Mon, 25 Aug 2025 11:53:43 +0300
Ready: False
Restart Count: 40
Liveness: exec [/usr/bin/check-status -l] delay=10s timeout=10s period=10s #success=1 #failure=6
Readiness: exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ENABLED_CONTROLLERS: node
DATASTORE_TYPE: kubernetes
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xqlpn (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
--
Host Ports: 0/TCP, 0/TCP
Args:
--port=7472
--log-level=info
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Mon, 25 Aug 2025 11:52:44 +0300
Finished: Mon, 25 Aug 2025 11:53:14 +0300
Ready: False
Restart Count: 29
Liveness: http-get http://:monitoring/metrics delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:monitoring/metrics delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
METALLB_ML_SECRET_NAME: memberlist
METALLB_DEPLOYMENT: controller
Mounts:
/tmp/k8s-webhook-server/serving-certs from cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dxbg7 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
--
Host Ports: 7472/TCP, 7946/TCP, 7946/UDP
Args:
--port=7472
--log-level=info
State: Waiting
Reason: CreateContainerConfigError
Ready: False
Restart Count: 0
Liveness: http-get http://:monitoring/metrics delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:monitoring/metrics delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
METALLB_NODE_NAME: (v1:spec.nodeName)
METALLB_HOST: (v1:status.hostIP)
METALLB_ML_BIND_ADDR: (v1:status.podIP)
METALLB_ML_LABELS: app=metallb,component=speaker
METALLB_ML_SECRET_KEY: <set to the key 'secretkey' in secret 'memberlist'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8mc8z (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
--
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 111s (x231 over 51m) kubelet Container image "quay.io/metallb/speaker:v0.13.3" already present on machine
Warning Failed 111s (x231 over 51m) kubelet Error: secret "memberlist" not found
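Note that the MetalLB speaker failure above ("secret "memberlist" not found") is separate from the Calico problem. For manifest-based MetalLB installs of that era, the memberlist secret is commonly created by hand; the command below is a sketch - verify against the docs for your MetalLB version:

```shell
# Create the memberlist secret the MetalLB speaker expects (manifest installs).
kubectl create secret generic memberlist \
  -n metallb-system \
  --from-literal=secretkey="$(openssl rand -base64 128)"
```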
Busybox check
microk8s kubectl run -it --rm busybox --image=busybox:1.36 --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ #
/ #
/ # wget -qO- https://10.152.183.1:443 --no-check-certificate
wget: can't connect to remote host (10.152.183.1): Connection timed out
Logs calico controllers
microk8s kubectl logs calico-kube-controllers-5947598c79-f99vb -n kube-system
2025-08-25 09:18:13.176 [INFO][1] main.go 99: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
2025-08-25 09:18:13.178 [WARNING][1] winutils.go 150: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2025-08-25 09:18:13.179 [INFO][1] main.go 123: Ensuring Calico datastore is initialized
2025-08-25 09:18:43.182 [ERROR][1] client.go 287: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.152.183.1:443: i/o timeout
2025-08-25 09:18:43.182 [INFO][1] main.go 130: Failed to initialize datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.152.183.1:443: i/o timeout
2025-08-25 09:19:13.191 [ERROR][1] client.go 287: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.152.183.1:443: i/o timeout
2025-08-25 09:19:13.191 [INFO][1] main.go 130: Failed to initialize datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.152.183.1:443: i/o timeout
2025-08-25 09:19:13.191 [FATAL][1] main.go 143: Failed to initialize Calico datastore
Logs metallb controller
microk8s kubectl logs controller-7ffc454778-65llg -n metallb-system
{"branch":"dev","caller":"main.go:141","commit":"dev","goversion":"gc / go1.18.3 / amd64","level":"info","msg":"MetalLB controller starting version 0.13.3 (commit dev, branch dev)","ts":"2025-08-25T09:26:44Z","version":"0.13.3"}
I opened an issue: https://github.com/canonical/microk8s/issues/5197