
1.27 csi-oci-controller pod CrashLoopBackOff

Open shihchang opened this issue 1 year ago • 4 comments

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

CCM Version: 1.27

Environment:

  • Kubernetes version (use kubectl version): Client Version: v1.24.0 Kustomize Version: v4.5.4 Server Version: v1.28.3+1.el8
  • OS (e.g. from /etc/os-release): Oracle Linux Server 8.8
  • Kernel (e.g. uname -a): uname -a Linux k8sol8-api-server-001 5.15.0-105.125.6.2.1.el8uek.x86_64 #2 SMP Thu Sep 14 21:58:59 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

csi-oci-controller pod CrashLoopBackOff, found two issues:

  • image-pull-secret:
  Warning  FailedToRetrieveImagePullSecret  4m20s (x543 over 114m)  kubelet  Unable to retrieve some image pull secrets (image-pull-secret); attempting to pull the image may not succeed.
  • snapshot-controller:
  snapshot-controller:
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
    Ready:          False
    Restart Count:  30
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nt4cl (ro)
      /var/run/shared-tmpfs from shared-tmpfs (rw)
 
Warning  BackOff  13s (x515 over 115m)  kubelet  Back-off restarting failed container snapshot-controller in pod csi-oci-controller-...

What you expected to happen?

csi-oci-controller status should be running

How to reproduce it (as minimally and precisely as possible)?

kubectl apply with https://github.com/oracle/oci-cloud-controller-manager/blob/master/manifests/container-storage-interface/oci-csi-controller-driver.yaml

Anything else we need to know?

shihchang avatar Dec 05 '23 20:12 shihchang

Can you please try to apply the manifest without the following lines in it https://github.com/oracle/oci-cloud-controller-manager/blob/master/manifests/container-storage-interface/oci-csi-controller-driver.yaml#L121-L122

AkarshES avatar Dec 07 '23 06:12 AkarshES
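
For a controller that is already deployed, the suggestion above could also be tried in place rather than by editing the manifest. The following is a minimal sketch, not a confirmed procedure from the maintainers: it assumes the workload is a Deployment named csi-oci-controller in the kube-system namespace (neither is stated in this thread), and the function names are hypothetical helpers.

```shell
# Assumptions (not confirmed in this thread): the controller is a Deployment
# named csi-oci-controller in the kube-system namespace.
NAMESPACE="${NAMESPACE:-kube-system}"
DEPLOYMENT="${DEPLOYMENT:-csi-oci-controller}"

# Check whether the secret named in the kubelet warning exists at all.
check_pull_secret() {
  kubectl -n "$NAMESPACE" get secret image-pull-secret
}

# Remove the imagePullSecrets list from the pod template with a JSON patch.
# The patch is rejected if the field is already absent.
remove_image_pull_secrets() {
  kubectl -n "$NAMESPACE" patch deployment "$DEPLOYMENT" --type=json \
    -p='[{"op":"remove","path":"/spec/template/spec/imagePullSecrets"}]'
}

# Usage:
#   check_pull_secret || remove_image_pull_secrets
```

Patching the live object avoids depending on the manifest line numbers, which can shift as the master branch changes. Re-applying the unmodified manifest later would restore the field.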

> Can you please try to apply the manifest without the following lines in it https://github.com/oracle/oci-cloud-controller-manager/blob/master/manifests/container-storage-interface/oci-csi-controller-driver.yaml#L121-L122

Yes, removing the image-pull-secret does clear up the FailedToRetrieveImagePullSecret warning. And after removing the failed snapshot-controller container, the csi-oci-controller pod did come up.

If the snapshot-controller came from the following, is it a duplicate of the csi-snapshotter? https://kubernetes-csi.github.io/docs/snapshot-controller.html https://github.com/kubernetes-csi/external-snapshotter In the csi-oci-controller pod, while the snapshot-controller container failed, the csi-snapshotter container status is ready. Is it OK to remove the snapshot-controller?

shihchang avatar Dec 07 '23 14:12 shihchang

@YashasG98 PTAL

mrunalpagnis avatar Mar 28 '24 13:03 mrunalpagnis

@shihchang the snapshot-controller and external-snapshotter help take backups of volumes and restore from those backups when needed:

https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengcreatingpersistentvolumeclaim_topic-Provisioning_PVCs_on_BV.htm#contengcreatingpersistentvolumeclaim_topic-Provisioning_PVCs_on_BV-PV_From_Snapshot_CSI

They can be removed if you do not intend to use the snapshot/restore ability.

Could you please share the kubectl logs output for the failing container to help us understand why it is crashing?

YashasG98 avatar Apr 02 '24 04:04 YashasG98
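
One way to collect the requested logs is sketched below. Because the container is in CrashLoopBackOff, the logs of the previous (terminated) instance are usually the useful ones. The kube-system namespace is an assumption, previous_logs is a hypothetical helper name, and the pod name is left as a placeholder because it is truncated in the report.

```shell
# Assumption (not confirmed in this thread): the pod runs in kube-system.
NAMESPACE="${NAMESPACE:-kube-system}"

# Print logs from the previous, crashed instance of the snapshot-controller
# container. $1 is the full pod name as listed by `kubectl get pods`.
previous_logs() {
  kubectl -n "$NAMESPACE" logs "$1" -c snapshot-controller --previous
}

# Usage (replace the placeholder with the real pod name):
#   kubectl -n kube-system get pods
#   previous_logs "<csi-oci-controller pod name>"
```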