ceph-csi icon indicating copy to clipboard operation
ceph-csi copied to clipboard

PVC always be pending (failed) after startup few days.

Open jpsn123 opened this issue 2 years ago • 3 comments

Describe the bug

PVC always be pending (failed) after startup few days. if i restart the csi-rbd-provisioner pod, the blocked PVC will be readly, and every thing will be ok, but just several days later, PVC will be pending again and need to restart csi-rbd-provisioner pod manually.

Environment details

  • Image/version of Ceph CSI driver : 3.6.2
  • Helm chart version : 3.7.0
  • Kernel version : 5.4.x
  • Mounter used for mounting PVC (for cephFS its fuse or kernel. for rbd its krbd or rbd-nbd) : kernel
  • Kubernetes cluster version : 1.23.5
  • Ceph cluster version : 16.2.7

Logs

log from rbdplugin

I0805 05:23:45.466532       1 utils.go:191] ID: 11302 GRPC call: /csi.v1.Identity/Probe
I0805 05:23:45.467168       1 utils.go:195] ID: 11302 GRPC request: {}
I0805 05:23:45.467314       1 utils.go:202] ID: 11302 GRPC response: {}
I0805 05:24:45.435601       1 utils.go:191] ID: 11303 GRPC call: /csi.v1.Identity/Probe
I0805 05:24:45.435771       1 utils.go:195] ID: 11303 GRPC request: {}
I0805 05:24:45.435821       1 utils.go:202] ID: 11303 GRPC response: {}
I0805 05:25:45.454311       1 utils.go:191] ID: 11304 GRPC call: /csi.v1.Identity/Probe
I0805 05:25:45.454520       1 utils.go:195] ID: 11304 GRPC request: {}
I0805 05:25:45.454576       1 utils.go:202] ID: 11304 GRPC response: {}
I0805 05:26:45.409803       1 utils.go:191] ID: 11305 GRPC call: /csi.v1.Identity/Probe
I0805 05:26:45.410217       1 utils.go:195] ID: 11305 GRPC request: {}
I0805 05:26:45.410385       1 utils.go:202] ID: 11305 GRPC response: {}
I0805 05:27:45.406491       1 utils.go:191] ID: 11306 GRPC call: /csi.v1.Identity/Probe
I0805 05:27:45.406875       1 utils.go:195] ID: 11306 GRPC request: {}
I0805 05:27:45.407157       1 utils.go:202] ID: 11306 GRPC response: {}
I0805 05:28:10.839023       1 utils.go:191] ID: 11307 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d GRPC call: /csi.v1.Controller/CreateVolume
I0805 05:28:10.840290       1 utils.go:195] ID: 11307 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d GRPC request: {"capacity_range":{"required_bytes":8589934592},"name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","parameters":{"clusterID":"ceph-pub2","csi.storage.k8s.io/pv/name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","csi.storage.k8s.io/pvc/name":"minio","csi.storage.k8s.io/pvc/namespace":"jmp","imageFeatures":"layering,exclusive-lock,object-map,fast-diff,deep-flatten","pool":"jutze.base.rbd","snapshotNamePrefix":"csi-snap-ssd-rep-base-pub2-","volumeNamePrefix":"csi-vol-ssd-rep-base-pub2-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I0805 05:28:10.843389       1 rbd_util.go:1262] ID: 11307 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d setting disableInUseChecks: false image features: [exclusive-lock object-map deep-flatten layering fast-diff] mounter: rbd
I0805 05:28:10.852223       1 omap.go:87] ID: 11307 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d got omap values: (pool="jutze.base.rbd", namespace="pub2", name="csi.volumes.default"): map[]
I0805 05:28:10.859958       1 omap.go:155] ID: 11307 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d set omap keys (pool="jutze.base.rbd", namespace="pub2", name="csi.volumes.default"): map[csi.volume.pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d:6522a278-147f-11ed-b10e-165f5a49a7d5])
I0805 05:28:10.862018       1 omap.go:155] ID: 11307 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d set omap keys (pool="jutze.base.rbd", namespace="pub2", name="csi.volume.6522a278-147f-11ed-b10e-165f5a49a7d5"): map[csi.imagename:csi-vol-ssd-rep-base-pub2-6522a278-147f-11ed-b10e-165f5a49a7d5 csi.volname:pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d csi.volume.owner:jmp])
I0805 05:28:10.862079       1 rbd_journal.go:485] ID: 11307 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d generated Volume ID (0001-0009-ceph-pub2-0000000000000002-6522a278-147f-11ed-b10e-165f5a49a7d5) and image name (csi-vol-ssd-rep-base-pub2-6522a278-147f-11ed-b10e-165f5a49a7d5) for request name (pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d)
I0805 05:28:10.862237       1 rbd_util.go:415] ID: 11307 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d rbd: create jutze.base.rbd/pub2/csi-vol-ssd-rep-base-pub2-6522a278-147f-11ed-b10e-165f5a49a7d5 size 8192M (features: [exclusive-lock object-map deep-flatten layering fast-diff]) using mon 10.16.3.13:6789,10.16.3.12:6789,10.16.3.11:6789
I0805 05:28:45.421159       1 utils.go:191] ID: 11308 GRPC call: /csi.v1.Identity/Probe
I0805 05:28:45.421271       1 utils.go:195] ID: 11308 GRPC request: {}
I0805 05:28:45.421300       1 utils.go:202] ID: 11308 GRPC response: {}
I0805 05:29:45.406658       1 utils.go:191] ID: 11309 GRPC call: /csi.v1.Identity/Probe
I0805 05:29:45.406762       1 utils.go:195] ID: 11309 GRPC request: {}
I0805 05:29:45.406795       1 utils.go:202] ID: 11309 GRPC response: {}
I0805 05:30:45.433367       1 utils.go:191] ID: 11310 GRPC call: /csi.v1.Identity/Probe
I0805 05:30:45.433483       1 utils.go:195] ID: 11310 GRPC request: {}
I0805 05:30:45.433519       1 utils.go:202] ID: 11310 GRPC response: {}
I0805 05:31:45.406741       1 utils.go:191] ID: 11311 GRPC call: /csi.v1.Identity/Probe
I0805 05:31:45.407110       1 utils.go:195] ID: 11311 GRPC request: {}
I0805 05:31:45.407171       1 utils.go:202] ID: 11311 GRPC response: {}
I0805 05:32:45.441324       1 utils.go:191] ID: 11312 GRPC call: /csi.v1.Identity/Probe
I0805 05:32:45.441455       1 utils.go:195] ID: 11312 GRPC request: {}
I0805 05:32:45.441508       1 utils.go:202] ID: 11312 GRPC response: {}
I0805 05:33:11.327743       1 utils.go:191] ID: 11313 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d GRPC call: /csi.v1.Controller/CreateVolume
I0805 05:33:11.328113       1 utils.go:195] ID: 11313 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d GRPC request: {"capacity_range":{"required_bytes":8589934592},"name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","parameters":{"clusterID":"ceph-pub2","csi.storage.k8s.io/pv/name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","csi.storage.k8s.io/pvc/name":"minio","csi.storage.k8s.io/pvc/namespace":"jmp","imageFeatures":"layering,exclusive-lock,object-map,fast-diff,deep-flatten","pool":"jutze.base.rbd","snapshotNamePrefix":"csi-snap-ssd-rep-base-pub2-","volumeNamePrefix":"csi-vol-ssd-rep-base-pub2-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I0805 05:33:11.328430       1 rbd_util.go:1262] ID: 11313 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d setting disableInUseChecks: false image features: [layering fast-diff exclusive-lock object-map deep-flatten] mounter: rbd
E0805 05:33:11.328482       1 controllerserver.go:276] ID: 11313 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d an operation with the given Volume ID pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d already exists
E0805 05:33:11.328531       1 utils.go:200] ID: 11313 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d already exists
I0805 05:33:12.342211       1 utils.go:191] ID: 11314 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d GRPC call: /csi.v1.Controller/CreateVolume
I0805 05:33:12.342960       1 utils.go:195] ID: 11314 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d GRPC request: {"capacity_range":{"required_bytes":8589934592},"name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","parameters":{"clusterID":"ceph-pub2","csi.storage.k8s.io/pv/name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","csi.storage.k8s.io/pvc/name":"minio","csi.storage.k8s.io/pvc/namespace":"jmp","imageFeatures":"layering,exclusive-lock,object-map,fast-diff,deep-flatten","pool":"jutze.base.rbd","snapshotNamePrefix":"csi-snap-ssd-rep-base-pub2-","volumeNamePrefix":"csi-vol-ssd-rep-base-pub2-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I0805 05:33:12.343226       1 rbd_util.go:1262] ID: 11314 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d setting disableInUseChecks: false image features: [layering fast-diff exclusive-lock object-map deep-flatten] mounter: rbd
E0805 05:33:12.343262       1 controllerserver.go:276] ID: 11314 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d an operation with the given Volume ID pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d already exists
E0805 05:33:12.343311       1 utils.go:200] ID: 11314 Req-ID: pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d already exists

log from csi-provisoner

I0805 05:27:49.263038       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:27:54.277638       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:27:59.294075       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:28:04.315262       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:28:09.334686       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:28:10.792347       1 controller.go:1337] provision "jmp/minio" class "jutze-block-ssd-rep-base-pub2": started
I0805 05:28:10.795361       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"jmp", Name:"minio", UID:"1af8d583-628b-4367-a2a6-fd8fe0f0f89d", APIVersion:"v1", ResourceVersion:"202885700", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "jmp/minio"
I0805 05:28:10.796182       1 controller.go:528] skip translation of storage class for plugin: rbd.csi.ceph.com
I0805 05:28:10.817050       1 connection.go:183] GRPC call: /csi.v1.Controller/CreateVolume
I0805 05:28:10.817588       1 connection.go:184] GRPC request: {"capacity_range":{"required_bytes":8589934592},"name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","parameters":{"clusterID":"ceph-pub2","csi.storage.k8s.io/pv/name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","csi.storage.k8s.io/pvc/name":"minio","csi.storage.k8s.io/pvc/namespace":"jmp","imageFeatures":"layering,exclusive-lock,object-map,fast-diff,deep-flatten","pool":"jutze.base.rbd","snapshotNamePrefix":"csi-snap-ssd-rep-base-pub2-","volumeNamePrefix":"csi-vol-ssd-rep-base-pub2-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I0805 05:28:14.085972       1 reflector.go:536] sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:845: Watch close - *v1.PersistentVolume total 10 items received
I0805 05:28:14.349289       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:28:19.364607       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:28:24.380493       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:28:29.397610       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
.
.
.
I0805 05:32:50.292706       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:32:55.306197       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:32:58.920109       1 reflector.go:536] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.PersistentVolumeClaim total 9 items received
I0805 05:33:00.323129       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:33:05.340001       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:33:10.358546       1 leaderelection.go:278] successfully renewed lease ceph/rbd-csi-ceph-com
I0805 05:33:10.817725       1 connection.go:186] GRPC response: {}
I0805 05:33:10.817893       1 connection.go:187] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0805 05:33:10.817997       1 controller.go:764] CreateVolume failed, supports topology = false, node selected false => may reschedule = false => state = Background: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0805 05:33:10.818214       1 controller.go:1082] Temporary error received, adding PVC 1af8d583-628b-4367-a2a6-fd8fe0f0f89d to claims in progress
W0805 05:33:10.818318       1 controller.go:934] Retrying syncing claim "1af8d583-628b-4367-a2a6-fd8fe0f0f89d", failure 0
E0805 05:33:10.818448       1 controller.go:957] error syncing claim "1af8d583-628b-4367-a2a6-fd8fe0f0f89d": failed to provision volume with StorageClass "jutze-block-ssd-rep-base-pub2": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0805 05:33:10.818566       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"jmp", Name:"minio", UID:"1af8d583-628b-4367-a2a6-fd8fe0f0f89d", APIVersion:"v1", ResourceVersion:"202885700", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "jutze-block-ssd-rep-base-pub2": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0805 05:33:11.318627       1 controller.go:1337] provision "jmp/minio" class "jutze-block-ssd-rep-base-pub2": started
I0805 05:33:11.319088       1 controller.go:528] skip translation of storage class for plugin: rbd.csi.ceph.com
I0805 05:33:11.319191       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"jmp", Name:"minio", UID:"1af8d583-628b-4367-a2a6-fd8fe0f0f89d", APIVersion:"v1", ResourceVersion:"202885700", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "jmp/minio"
I0805 05:33:11.326700       1 connection.go:183] GRPC call: /csi.v1.Controller/CreateVolume
I0805 05:33:11.326731       1 connection.go:184] GRPC request: {"capacity_range":{"required_bytes":8589934592},"name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","parameters":{"clusterID":"ceph-pub2","csi.storage.k8s.io/pv/name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","csi.storage.k8s.io/pvc/name":"minio","csi.storage.k8s.io/pvc/namespace":"jmp","imageFeatures":"layering,exclusive-lock,object-map,fast-diff,deep-flatten","pool":"jutze.base.rbd","snapshotNamePrefix":"csi-snap-ssd-rep-base-pub2-","volumeNamePrefix":"csi-vol-ssd-rep-base-pub2-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I0805 05:33:11.328916       1 connection.go:186] GRPC response: {}
I0805 05:33:11.329104       1 connection.go:187] GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d already exists
I0805 05:33:11.330289       1 controller.go:764] CreateVolume failed, supports topology = false, node selected false => may reschedule = false => state = Background: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d already exists
I0805 05:33:11.330479       1 controller.go:1082] Temporary error received, adding PVC 1af8d583-628b-4367-a2a6-fd8fe0f0f89d to claims in progress
W0805 05:33:11.330572       1 controller.go:934] Retrying syncing claim "1af8d583-628b-4367-a2a6-fd8fe0f0f89d", failure 1
E0805 05:33:11.330672       1 controller.go:957] error syncing claim "1af8d583-628b-4367-a2a6-fd8fe0f0f89d": failed to provision volume with StorageClass "jutze-block-ssd-rep-base-pub2": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d already exists
I0805 05:33:11.330748       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"jmp", Name:"minio", UID:"1af8d583-628b-4367-a2a6-fd8fe0f0f89d", APIVersion:"v1", ResourceVersion:"202885700", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "jutze-block-ssd-rep-base-pub2": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d already exists
I0805 05:33:12.331767       1 controller.go:1337] provision "jmp/minio" class "jutze-block-ssd-rep-base-pub2": started
I0805 05:33:12.331896       1 controller.go:528] skip translation of storage class for plugin: rbd.csi.ceph.com
I0805 05:33:12.332872       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"jmp", Name:"minio", UID:"1af8d583-628b-4367-a2a6-fd8fe0f0f89d", APIVersion:"v1", ResourceVersion:"202885700", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "jmp/minio"
I0805 05:33:12.340327       1 connection.go:183] GRPC call: /csi.v1.Controller/CreateVolume
I0805 05:33:12.340521       1 connection.go:184] GRPC request: {"capacity_range":{"required_bytes":8589934592},"name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","parameters":{"clusterID":"ceph-pub2","csi.storage.k8s.io/pv/name":"pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d","csi.storage.k8s.io/pvc/name":"minio","csi.storage.k8s.io/pvc/namespace":"jmp","imageFeatures":"layering,exclusive-lock,object-map,fast-diff,deep-flatten","pool":"jutze.base.rbd","snapshotNamePrefix":"csi-snap-ssd-rep-base-pub2-","volumeNamePrefix":"csi-vol-ssd-rep-base-pub2-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I0805 05:33:12.343730       1 connection.go:186] GRPC response: {}
I0805 05:33:12.344138       1 connection.go:187] GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d already exists

jpsn123 avatar Aug 05 '22 06:08 jpsn123

the CreateVolume request for pvc-1af8d583-628b-4367-a2a6-fd8fe0f0f89d never finished. The RBD-image should have been in the creation process. This can get stuck when the Ceph pool that should hold the image is not healthy. Please check the status of the Ceph cluster and the availability of the OSDs.

nixpanic avatar Aug 05 '22 08:08 nixpanic

@nixpanic ceph cluster is always health, i sure that. it seem rbdplugin lost connect to ceph cluster and can't reconnect to it, just need a restart, all will ok.

jpsn123 avatar Aug 05 '22 09:08 jpsn123

in addition, if success create pvc, csi-rbdplugin and csi-rbdplugin-contoller will both show some action logs, but if pvc create failed, only csi-rbdplugin some logs , csi-rbdplugin-contoller container show nothing except successfully renewed lease ceph/rbd.csi.ceph.com-ceph.

jpsn123 avatar Aug 05 '22 09:08 jpsn123

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 04 '22 21:09 github-actions[bot]

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions[bot] avatar Sep 11 '22 21:09 github-actions[bot]