On Rocky 8.6 calico controller keeps restarting although same behaviour is not seen on Rocky 8.5
Pardon me if this is not the correct place to report this issue. Please move it, or let me know where it should go.
Expected Behavior
The calico kube-controllers pod should not restart. It currently restarts because it appears to lack permission to write the status file that the liveness and readiness probes read.
Possible Solution
Solution 1: If I disable the liveness and readiness probes, the restarts are no longer seen. Related report: permission issues for status.json (#1) · Issues · Iron Bank Containers / Opensource / calico / kube-controllers · GitLab (dso.mil)
Solution 2: Add the following securityContext to the calico kube-controllers container. However, I don't want the container to have root access:
    securityContext:
      runAsGroup: 0
      runAsUser: 0
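For anyone who wants to try Solution 1 without editing the manifest by hand, here is a rough sketch that builds a JSON Patch removing both probes and prints the matching kubectl command. It makes some assumptions that are not confirmed above: that the Deployment is named calico-kube-controllers (inferred from the ReplicaSet name), that it lives in kube-system, and that the controller is the first container in the pod template. The helper names are made up for illustration. This only hides the symptom, so treat it as a temporary workaround.

```python
import json
import shlex

# Assumptions (not confirmed in this issue): Deployment name, namespace,
# and that the controller is container index 0 in the pod template.
DEPLOYMENT = "calico-kube-controllers"
NAMESPACE = "kube-system"
CONTAINER_INDEX = 0


def build_probe_removal_patch(container_index=CONTAINER_INDEX):
    """Return JSON Patch operations that remove both probes from one container."""
    base = f"/spec/template/spec/containers/{container_index}"
    return [
        {"op": "remove", "path": f"{base}/livenessProbe"},
        {"op": "remove", "path": f"{base}/readinessProbe"},
    ]


def build_kubectl_command(patch):
    """Render the kubectl invocation as a shell-safe string; nothing is executed."""
    args = [
        "kubectl", "patch", "deployment", DEPLOYMENT,
        "-n", NAMESPACE,
        "--type=json",
        "-p", json.dumps(patch),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it manually.
    print(build_kubectl_command(build_probe_removal_patch()))
```

The script only prints the command. Running the printed command triggers a new rollout of the Deployment, and a "remove" operation fails if the probe is already absent.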
Your Environment
[user@master1 tmp]$ kgp | grep calico
kube-system calico-kube-controllers-78d6f96c7b-6g6qc 0/1 CrashLoopBackOff 587 35h 10.244.82.192 master1.novalocal <none> <none>
kube-system calico-node-dshfk 1/1 Running 0 35h 11.127.144.123 worker3.novalocal <none> <none>
kube-system calico-node-mjkzw 1/1 Running 0 35h 11.127.144.121 worker1.novalocal <none> <none>
kube-system calico-node-qfsc7 1/1 Running 0 35h 11.127.144.122 worker2.novalocal <none> <none>
kube-system calico-node-zmqx7 1/1 Running 0 35h 11.127.144.111 master1.novalocal <none> <none>
kube-system calico-node-zv9ql 1/1 Running 1 35h 11.127.144.124 worker4.novalocal <none> <none>
[user@master1 tmp]$
[user@master1 tmp]$ kubectl describe pod calico-kube-controllers-78d6f96c7b-6g6qc -n kube-system
Name: calico-kube-controllers-78d6f96c7b-6g6qc
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master1.novalocal/11.127.144.111
Start Time: Tue, 04 Oct 2022 03:31:37 +0000
Labels: k8s-app=calico-kube-controllers
pod-template-hash=78d6f96c7b
Annotations: cni.projectcalico.org/podIP: 10.244.82.192/32
cni.projectcalico.org/podIPs: 10.244.82.192/32
Status: Running
IP: 10.244.82.192
IPs:
IP: 10.244.82.192
Controlled By: ReplicaSet/calico-kube-controllers-78d6f96c7b
Containers:
calico-kube-controllers:
Container ID: docker://9b50cc90befaaef6faec68eea224d51591e93ee1b005c5e8d3d7252cecde69d5
Image: docker.io/calico/kube-controllers:v3.19.1
Image ID: docker://sha256:5d3d5ddc8605ded8f69d76ee488072c7d02c32a8e4e8b34640a884c6eb939c0a
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Wed, 05 Oct 2022 14:48:57 +0000
Finished: Wed, 05 Oct 2022 14:49:57 +0000
Ready: False
Restart Count: 591
Liveness: exec [/usr/bin/check-status -l] delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ENABLED_CONTROLLERS: node
DATASTORE_TYPE: kubernetes
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4wjq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-j4wjq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 46m (x3419 over 35h) kubelet Liveness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input
Warning Unhealthy 11m (x4518 over 35h) kubelet Readiness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input
Warning BackOff 83s (x7191 over 35h) kubelet Back-off restarting failed container
[user@master1 tmp]$
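The probe message "unexpected end of JSON input" is what a JSON parser typically reports for an empty or truncated file, which would fit the controller being unable to write /status/status.json. Below is a minimal sketch for telling those cases apart on a copy of the file. It is not the logic of /usr/bin/check-status; the function name diagnose_status_file and the sample file contents are hypothetical.

```python
import json
import os
import tempfile


def diagnose_status_file(path):
    """Classify a status file as missing, unreadable, empty, invalid-json, or ok."""
    if not os.path.exists(path):
        return "missing"
    try:
        with open(path, "r", encoding="utf-8") as fh:
            raw = fh.read()
    except PermissionError:
        return "unreadable"
    if not raw.strip():
        # An empty file is what a parser sees when nothing was ever written.
        return "empty"
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return "invalid-json"
    return "ok"


if __name__ == "__main__":
    # Demonstrate on a scratch directory instead of the real /status path.
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "status.json")
        print("before creation:", diagnose_status_file(path))
        open(path, "w", encoding="utf-8").close()
        print("after touch:", diagnose_status_file(path))
        with open(path, "w", encoding="utf-8") as fh:
            fh.write('{"example": true}')  # hypothetical content
        print("after write:", diagnose_status_file(path))
        print("writable by this user:", os.access(path, os.W_OK))
```

If the file inside the container turns out to be "empty" and is not writable by the user the container runs as, that points to file permissions rather than to the probes themselves.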
v3.19.1 is quite old - I'd recommend updating to a newer version of Calico that's still in support.
In the latest versions, Calico explicitly sets the permissions on those files in the Dockerfile: https://github.com/projectcalico/calico/blob/master/kube-controllers/Dockerfile.amd64#L26
Closing this issue since there hasn't been any movement on it for a while. Please feel free to shout out and reopen the issue if this is still occurring.