oci-cloud-controller-manager
1.27 csi-oci-controller pod CrashLoopBackOff
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
CCM Version: 1.27
Environment:
- Kubernetes version (use kubectl version): Client Version: v1.24.0, Kustomize Version: v4.5.4, Server Version: v1.28.3+1.el8
- OS (e.g. from /etc/os-release): Oracle Linux Server 8.8
- Kernel (e.g. uname -a): Linux k8sol8-api-server-001 5.15.0-105.125.6.2.1.el8uek.x86_64 #2 SMP Thu Sep 14 21:58:59 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux
- Others:
What happened?
The csi-oci-controller pod is in CrashLoopBackOff. I found two issues:
- image-pull-secret:
Warning FailedToRetrieveImagePullSecret 4m20s (x543 over 114m) kubelet Unable to retrieve some image pull secrets (image-pull-secret); attempting to pull the image may not succeed.
- snapshot-controller:
  snapshot-controller:
    State: Waiting
      Reason: CrashLoopBackOff
    Last State: Terminated
      Reason: Error
      Exit Code: 1
    Ready: False
    Restart Count: 30
    Environment: <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nt4cl (ro)
      /var/run/shared-tmpfs from shared-tmpfs (rw)
Warning BackOff 13s (x515 over 115m) kubelet Back-off restarting failed container snapshot-controller in pod csi-oci-controller-...
What you expected to happen?
The csi-oci-controller pod should be in the Running state.
How to reproduce it (as minimally and precisely as possible)?
Run kubectl apply with the manifest at https://github.com/oracle/oci-cloud-controller-manager/blob/master/manifests/container-storage-interface/oci-csi-controller-driver.yaml
Anything else we need to know?
Can you please try applying the manifest without the following lines: https://github.com/oracle/oci-cloud-controller-manager/blob/master/manifests/container-storage-interface/oci-csi-controller-driver.yaml#L121-L122
Yes, removing the image-pull-secret does clear the FailedToRetrieveImagePullSecret warning. And after removing the failing snapshot-controller container, the csi-oci-controller pod did come up.
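For anyone repeating this workaround, here is a minimal sketch of stripping the pull-secret entry from a local copy of the manifest before applying it. It assumes the two linked lines are an imagePullSecrets key followed by a single "- name: image-pull-secret" list item; the sample file, its serviceAccountName value, and the file names are hypothetical stand-ins, not the real manifest.

```shell
# Hypothetical fragment mimicking the assumed shape of the pod spec;
# the real manifest is much longer and may differ.
cat > sample-driver.yaml <<'EOF'
    spec:
      serviceAccountName: example-sa
      imagePullSecrets:
        - name: image-pull-secret
      containers:
        - name: csi-snapshotter
EOF

# Drop the imagePullSecrets key and the single list item directly after it.
# Any other line resets the skip flag and is printed unchanged.
awk '/^[[:space:]]*imagePullSecrets:/ { skip = 1; next }
     skip && /^[[:space:]]*- name:/   { skip = 0; next }
     { skip = 0; print }' sample-driver.yaml > sample-driver.stripped.yaml

cat sample-driver.stripped.yaml
```

The same awk filter could be pointed at a downloaded copy of oci-csi-controller-driver.yaml, and the result applied with kubectl apply -f. It only handles a one-item secret list, so check the output by eye before applying.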
If the snapshot-controller came from the following, is it a duplicate of the csi-snapshotter? https://kubernetes-csi.github.io/docs/snapshot-controller.html https://github.com/kubernetes-csi/external-snapshotter In the csi-oci-controller pod, the snapshot-controller container failed while the csi-snapshotter container's status was ready. Is it OK to remove the snapshot-controller?
@YashasG98 PTAL
@shihchang the snapshot-controller and external-snapshotter help take backups of volumes and restore from the backups when needed
https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengcreatingpersistentvolumeclaim_topic-Provisioning_PVCs_on_BV.htm#contengcreatingpersistentvolumeclaim_topic-Provisioning_PVCs_on_BV-PV_From_Snapshot_CSI
They can be removed if you do not intend to use the snapshot/restore ability.
Could you please share the kubectl logs output for the failing container to help us understand why it is crashing?
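A rough sketch of how those logs could be collected. The kube-system namespace is an assumption (adjust it if the driver was deployed elsewhere), and the helper name crashed_container_logs is made up for illustration. The --previous flag asks kubectl for the logs of the last terminated run, which is usually the useful one for a container in CrashLoopBackOff.

```shell
# Assumed namespace; override by exporting NAMESPACE before running.
NAMESPACE="${NAMESPACE:-kube-system}"
# KUBECTL can be set to "echo" to preview the command without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: fetch logs from the previous (crashed) run of a container.
#   $1 = pod name, $2 = container name (defaults to snapshot-controller)
crashed_container_logs() {
  pod="$1"
  container="${2:-snapshot-controller}"
  "$KUBECTL" logs "$pod" -n "$NAMESPACE" -c "$container" --previous
}

# Usage: take the full pod name from `kubectl get pods -n kube-system`, then:
#   crashed_container_logs <csi-oci-controller pod name>
```

If the container has not restarted yet, dropping --previous returns the logs of the current run instead.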