
On Rocky 8.6, calico-kube-controllers keeps restarting; the same behaviour is not seen on Rocky 8.5

Open · sridhar2288 opened this issue 3 years ago · 1 comment

Pardon if this is not the correct place to report this issue; please move it or point me in the right direction.


Expected Behavior

calico-kube-controllers should not restart. It currently restarts because it does not have permission to write the status file that the liveness and readiness probes read.

Possible Solution

Solution 1: If I disable the liveness and readiness probes, the restarts stop. See: permission issues for status.json (#1) · Issues · Iron Bank Containers / Opensource / calico / kube-controllers · GitLab (dso.mil)

Solution 2: If I add a securityContext to calico-kube-controllers, the restarts also stop. However, I don't want to give it root access:

    securityContext:
      runAsGroup: 0
      runAsUser: 0
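For reference, Solution 2 expressed as a Deployment patch (a sketch only, assuming the stock calico-kube-controllers manifest; running as root is exactly the trade-off I'd like to avoid):

```yaml
# Hypothetical strategic-merge patch for the calico-kube-controllers
# Deployment in kube-system: runs the container as root so it can
# write /status/status.json. Field paths assume the upstream manifest.
spec:
  template:
    spec:
      containers:
        - name: calico-kube-controllers
          securityContext:
            runAsUser: 0
            runAsGroup: 0
```

This could be applied with something like `kubectl -n kube-system patch deployment calico-kube-controllers --patch "$(cat patch.yaml)"`.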

Your Environment

[user@master1 tmp]$ kgp | grep calico
kube-system    calico-kube-controllers-78d6f96c7b-6g6qc    0/1     CrashLoopBackOff   587        35h   10.244.82.192    master1.novalocal   <none>           <none>
kube-system    calico-node-dshfk                           1/1     Running            0          35h   11.127.144.123   worker3.novalocal   <none>           <none>
kube-system    calico-node-mjkzw                           1/1     Running            0          35h   11.127.144.121   worker1.novalocal   <none>           <none>
kube-system    calico-node-qfsc7                           1/1     Running            0          35h   11.127.144.122   worker2.novalocal   <none>           <none>
kube-system    calico-node-zmqx7                           1/1     Running            0          35h   11.127.144.111   master1.novalocal   <none>           <none>
kube-system    calico-node-zv9ql                           1/1     Running            1          35h   11.127.144.124   worker4.novalocal   <none>           <none>
[user@master1 tmp]$ 
[user@master1 tmp]$ kubectl describe pod calico-kube-controllers-78d6f96c7b-6g6qc -n kube-system
Name:                 calico-kube-controllers-78d6f96c7b-6g6qc
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 master1.novalocal/11.127.144.111
Start Time:           Tue, 04 Oct 2022 03:31:37 +0000
Labels:               k8s-app=calico-kube-controllers
                      pod-template-hash=78d6f96c7b
Annotations:          cni.projectcalico.org/podIP: 10.244.82.192/32
                      cni.projectcalico.org/podIPs: 10.244.82.192/32
Status:               Running
IP:                   10.244.82.192
IPs:
  IP:           10.244.82.192
Controlled By:  ReplicaSet/calico-kube-controllers-78d6f96c7b
Containers:
  calico-kube-controllers:
    Container ID:   docker://9b50cc90befaaef6faec68eea224d51591e93ee1b005c5e8d3d7252cecde69d5
    Image:          docker.io/calico/kube-controllers:v3.19.1
    Image ID:       docker://sha256:5d3d5ddc8605ded8f69d76ee488072c7d02c32a8e4e8b34640a884c6eb939c0a
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 05 Oct 2022 14:48:57 +0000
      Finished:     Wed, 05 Oct 2022 14:49:57 +0000
    Ready:          False
    Restart Count:  591
    Liveness:       exec [/usr/bin/check-status -l] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4wjq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-j4wjq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  46m (x3419 over 35h)  kubelet  Liveness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input
  Warning  Unhealthy  11m (x4518 over 35h)  kubelet  Readiness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input
  Warning  BackOff    83s (x7191 over 35h)  kubelet  Back-off restarting failed container
[user@master1 tmp]$ 

sridhar2288 avatar Oct 05 '22 15:10 sridhar2288

v3.19.1 is quite old - I'd recommend updating to a newer version of Calico that's still in support.

In the latest versions, Calico explicitly sets permission on those files in the Dockerfile: https://github.com/projectcalico/calico/blob/master/kube-controllers/Dockerfile.amd64#L26
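If upgrading isn't immediately possible, the same idea could in principle be applied by rebuilding the old image (a sketch only; the paths and UID are illustrative and this assumes the base image has a shell, which may not hold for this image):

```dockerfile
# Illustrative only, not the upstream Dockerfile: pre-create the
# status file and make it writable by the non-root user the
# container runs as, mirroring the fix linked above.
FROM docker.io/calico/kube-controllers:v3.19.1
USER root
RUN mkdir -p /status && touch /status/status.json \
    && chmod 0660 /status/status.json
USER 999
```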

caseydavenport avatar Oct 10 '22 21:10 caseydavenport

Closing this issue since there hasn't been any movement on it for a while. Please feel free to shout out and reopen the issue if this is still occurring.

mgleung avatar Nov 29 '22 17:11 mgleung