ceph-csi
Race condition when mounting RBD static PVs with the same image name in different pools
Describe the bug
When two RBD images with the same name in different pools are mounted as static volumes following this procedure, only the one mounted first succeeds.
For the later one, the PV/PVC is created and bound to the pod, but the pod remains in the Pending state with the following error displayed in the pod's events:
Multi-Attach error for volume "foo-pv" Volume is already used by 1 pod(s) in different namespaces
Environment details
- Image/version of Ceph CSI driver : v3.10.2
- Kernel version : 6.5.0-26-generic
- Mounter used for mounting PVC : rbd
- Kubernetes cluster version : v1.29.3
- Ceph cluster version : 18.2.1 reef (stable)
Steps to reproduce
Steps to reproduce the behavior:
- Setup details:
  - Set up a Rook-Ceph cluster
  - Create pools named `foo` and `bar`, with the `rbd` application enabled
  - Create an RBD image named `test` in each of the two pools
- Deploy the two PVs below:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: foo-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  csi:
    driver: rook-ceph.rbd.csi.ceph.com
    fsType: ext4
    nodeStageSecretRef:
      name: rook-csi-rbd-node
      namespace: rook-ceph
    volumeAttributes:
      clusterID: "rook-ceph"
      pool: "foo"
      staticVolume: "true"
      imageFeatures: "layering,fast-diff,object-map,deep-flatten,exclusive-lock"
    volumeHandle: test
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: bar-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  csi:
    driver: rook-ceph.rbd.csi.ceph.com
    fsType: ext4
    nodeStageSecretRef:
      name: rook-csi-rbd-node
      namespace: rook-ceph
    volumeAttributes:
      clusterID: "rook-ceph"
      pool: "bar"
      staticVolume: "true"
      imageFeatures: "layering,fast-diff,object-map,deep-flatten,exclusive-lock"
    volumeHandle: test
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
```
- Claim the two PVs above from pods placed in different namespaces
- See the error
Multi-Attach error for volume "foo-pv" Volume is already used by 1 pod(s) in different namespaces
or
Multi-Attach error for volume "bar-pv" Volume is already used by 1 pod(s) in different namespaces
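A note on where the collision likely happens: as far as I can tell, kubelet's in-tree CSI plugin derives a volume's unique name from the driver name and `volumeHandle` only; the `pool` volume attribute is not part of the key, so the two PVs above look like the same volume to Kubernetes. A minimal sketch (the `"^"` separator is my reading of the Kubernetes source, so treat it as an assumption):

```python
# Sketch of how kubelet's CSI plugin appears to build a unique volume name.
# Assumption: Kubernetes joins <driver name> and <volumeHandle> with "^"
# (pkg/volume/csi in kubernetes/kubernetes); the pool attribute is not part
# of the key, so two static PVs sharing a volumeHandle collide.

def unique_volume_name(driver: str, volume_handle: str) -> str:
    return f"{driver}^{volume_handle}"

# Both PVs in this report resolve to the same unique volume name:
foo = unique_volume_name("rook-ceph.rbd.csi.ceph.com", "test")  # from foo-pv
bar = unique_volume_name("rook-ceph.rbd.csi.ceph.com", "test")  # from bar-pv
print(foo == bar)  # True: Kubernetes treats them as one volume
```

This would explain why the Multi-Attach error comes from Kubernetes itself rather than from ceph-csi.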
Actual results
The pod whose PV is mounted later gets stuck in Pending due to the error above.
Expected behavior
Both pods successfully mount their respective RBD images.
Logs
If the issue is in PVC mounting please attach complete logs of below containers.
- csi-rbdplugin : no logs appeared
- driver-registrar
```
I0331 07:32:36.733019  899351 main.go:135] Version: v2.10.0
I0331 07:32:36.733206  899351 main.go:136] Running node-driver-registrar in mode=
I0331 07:32:43.746178  899351 node_register.go:55] Starting Registration Server at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0331 07:32:43.746370  899351 node_register.go:64] Registration Server started at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0331 07:32:43.747289  899351 node_register.go:88] Skipping HTTP server because endpoint is set to: ""
I0331 07:32:44.681946  899351 main.go:90] Received GetInfo call: &InfoRequest{}
I0331 07:32:48.988262  899351 main.go:101] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
```
Additional context
I used Rook Ceph, but I don't think the problem is related to Rook.
@AsPulse I believe the error is coming from Kubernetes; please check the kubelet logs. Is it possible to use different names in `volumeHandle` in the PVs?
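For reference, the suggestion above would look something like the fragment below. Note that for static PVs ceph-csi treats `volumeHandle` as the RBD image name, so distinct handles would also mean renaming the backing images (the names `foo-test` / `bar-test` are hypothetical; this is an untested sketch):

```yaml
# Hypothetical workaround: give each static PV a unique volumeHandle, and
# rename the backing images to match (e.g. rbd rename foo/test foo/foo-test).
# Only the csi: section of each PV is shown.
csi:
  driver: rook-ceph.rbd.csi.ceph.com
  volumeAttributes:
    clusterID: "rook-ceph"
    pool: "foo"
    staticVolume: "true"
  volumeHandle: foo-test   # was: test
---
csi:
  driver: rook-ceph.rbd.csi.ceph.com
  volumeAttributes:
    clusterID: "rook-ceph"
    pool: "bar"
    staticVolume: "true"
  volumeHandle: bar-test   # was: test
```

With unique handles, the two PVs would no longer map to the same unique volume name inside Kubernetes.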