rbd: CephCSI cannot determine correct clone depth in certain case
Describe the bug
If the parent PVC of a snapshot, restored PVC, or cloned PVC is deleted, the clone depth calculation at https://github.com/ceph/ceph-csi/blob/35eb347eaccfc8875b1c4e2073f557c008f5e656/internal/rbd/rbd_util.go#L700 does not work as intended. It requires every parent image in the chain to be present in the cluster, but deleted parent images are moved to the trash, so the returned clone depth is incorrect.
One side effect is the issue described in https://github.com/rook/rook/issues/12312: creating a chain of clones or snapshot+restores and deleting the parent snapshots and PVCs leaves the final child PVC unmountable.
Environment details
- Image/version of Ceph CSI driver : All supported cephcsi versions.
- Helm chart version : -
- Kernel version : -
- Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd) : krbd
- Kubernetes cluster version : all
- Ceph cluster version : all
Steps to reproduce
Steps to reproduce the behavior:
- Create a chain of clones or snapshot+restores
- Delete the parent snapshot/clone immediately after creating the child PVC/snapshot
- Try to mount the child PVC to a pod.
Actual results
rbd map fails with the below error:
E0531 18:09:09.891563 15892 utils.go:210] ID: 198 Req-ID: 0001-0009-rook-ceph-0000000000000001-8b978541-8495-4c20-bcab-0a42fa927b5a GRPC error: rpc error: code = Internal desc = rbd: map failed with error an error (exit status 22) occurred while running rbd args: [--id csi-rbd-node -m 10.110.0.127:6789,10.109.223.145:6789,10.108.43.136:6789 --keyfile=***stripped*** map replicapool/csi-vol-8b978541-8495-4c20-bcab-0a42fa927b5a --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed rbd: map failed: (22) Invalid argument
Expected behavior
No error.
Workaround
Flatten the child PVC image manually.
Possible Solution
From CephCSI's point of view, we have no other way to determine the clone depth. We possibly need an API change, e.g. rbd image info providing the clone depth from Ceph.
cc @ceph/ceph-csi-contributors
Steps to reproduce:
- Create a [restore] PVC
- Create a snapshot
- Delete the parent PVC and restore a new PVC from the snapshot
- Delete the snapshot
- Repeat from step 2 until there are 17 images in the trash (check with rbd trash ls <pool_name> | wc -l)
- Mount the child PVC to a pod
Nodeplugin logs:
I0724 12:32:42.749654 7461 utils.go:195] ID: 90 Req-ID: 0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 GRPC call: /csi.v1.Node/NodeStageVolume
I0724 12:32:42.749810 7461 utils.go:206] ID: 90 Req-ID: 0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/86f184afd78d2e10464567a1eaa6d77fd2b52867f9c4f76350333f39fc3dc557/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":7}},"volume_context":{"clusterID":"rook-ceph","imageFeatures":"layering","imageFormat":"2","imageName":"csi-vol-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6","journalPool":"replicapool","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1690201077472-8271-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6"}
I0724 12:32:42.750671 7461 omap.go:88] ID: 90 Req-ID: 0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 got omap values: (pool="replicapool", namespace="", name="csi.volume.c36ef26f-d656-4f3b-9b12-d54b9d9c98a6"): map[csi.imageid:12f02815b4b2 csi.imagename:csi-vol-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 csi.volname:pvc-475deef4-7f81-4750-b75f-e1e1b07a538f csi.volume.owner:rook-ceph]
I0724 12:32:43.170381 7461 rbd_util.go:352] ID: 90 Req-ID: 0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 checking for ImageFeatures: [layering operations]
I0724 12:32:43.198124 7461 cephcmds.go:105] ID: 90 Req-ID: 0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 command succeeded: rbd [device list --format=json --device-type krbd]
I0724 12:32:43.353638 7461 rbd_attach.go:419] ID: 90 Req-ID: 0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 rbd: map mon 192.168.39.232:6789
I0724 12:32:43.669135 7461 cephcmds.go:98] ID: 90 Req-ID: 0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 an error (exit status 22) occurred while running rbd args: [--id csi-rbd-node -m 192.168.39.232:6789 --keyfile=***stripped*** map replicapool/csi-vol-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 --device-type krbd --options noudev]
W0724 12:32:43.669173 7461 rbd_attach.go:468] ID: 90 Req-ID: 0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 rbd: map error an error (exit status 22) occurred while running rbd args: [--id csi-rbd-node -m 192.168.39.232:6789 --keyfile=***stripped*** map replicapool/csi-vol-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 --device-type krbd --options noudev], rbd output: rbd: sysfs write failed
rbd: map failed: (22) Invalid argument
E0724 12:32:43.669309 7461 utils.go:210] ID: 90 Req-ID: 0001-0009-rook-ceph-0000000000000001-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 GRPC error: rpc error: code = Internal desc = rbd: map failed with error an error (exit status 22) occurred while running rbd args: [--id csi-rbd-node -m 192.168.39.232:6789 --keyfile=***stripped*** map replicapool/csi-vol-c36ef26f-d656-4f3b-9b12-d54b9d9c98a6 --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed
rbd: map failed: (22) Invalid argument
Dmesg logs: dmesg.txt
images in trash:
bash-4.4$ rbd trash ls replicapool | wc -l
18
cc @ceph/ceph-csi-contributors
@idryomov Can you please provide your inputs on this issue?
The kernel client is failing to map an image because it has more than 16 images in the parent chain, most or all of which are in the trash:
[ 3327.965645] rbd: id 12f02b4046bf: unable to get image name
[ 3327.966827] rbd: id 12f041960b82: unable to get image name
[ 3327.968113] rbd: id 12f05858ad42: unable to get image name
[ 3327.969280] rbd: id 12f0c68079be: unable to get image name
[ 3327.970612] rbd: id 12f0cbf47a6: unable to get image name
[ 3327.972276] rbd: id 12f0193d54ed: unable to get image name
[ 3327.973473] rbd: id 115e71c81440: unable to get image name
[ 3327.974756] rbd: id 115e5c165e0c: unable to get image name
[ 3327.976137] rbd: id 115ef49affa9: unable to get image name
[ 3327.977415] rbd: id 115efd7ba12b: unable to get image name
[ 3327.978439] rbd: id 115e63c98b1a: unable to get image name
[ 3327.979750] rbd: id 115e1f145b8: unable to get image name
[ 3327.981078] rbd: id 115e919877e0: unable to get image name
[ 3327.982162] rbd: id 115e2403b185: unable to get image name
[ 3327.983619] rbd: id 115e5a5abd49: unable to get image name
[ 3327.984825] rbd: id 115e3d321ae6: unable to get image name
[ 3327.985458] rbd: parent chain is too long (17)
For further inputs, please translate the steps to reproduce to rbd commands. While doing that, you would likely see how/where the parent images build-up occurs.
Yup, the images being in the trash is expected.
Equivalent rbd commands can be found here: https://github.com/ceph/ceph-csi/blob/devel/docs/design/proposals/rbd-snap-clone.md. The build-up happens because CephCSI cannot determine the clone chain depth correctly and therefore does not flatten the image (or add a task to flatten it).
In short,
- CephCSI moves deleted images to the trash and adds a task to remove them.
- CephCSI determines the clone chain depth by traversing the image chain: https://github.com/ceph/ceph-csi/blob/35eb347eaccfc8875b1c4e2073f557c008f5e656/internal/rbd/rbd_util.go#L700
- This method fails if a parent PVC/snapshot is deleted and its image is in the trash.
- Each Kubernetes PVC and snapshot is meant to be independent of its parent or child PVCs/snapshots, so we cannot impose any restriction here.
- We need another way to determine the chain depth, maybe a field in rbd image info?
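The traversal described in the list above can be modeled in a few lines to show where the depth gets lost. This is a minimal sketch, not ceph-csi's actual implementation: `parentRef`, `image`, `pool` and `buildChain` are hypothetical, images live in an in-memory map, and the model assumes a trashed image can no longer be opened by name. The function follows parents by name only and stops at the first one it cannot open.

```go
package main

import "fmt"

// parentRef is a hypothetical, simplified view of the parent details librbd
// reports for a clone.
type parentRef struct {
	Name  string
	ID    string
	Trash bool
}

// image is a hypothetical in-memory RBD image; Parent is nil for a flat image.
type image struct {
	ID     string
	Parent *parentRef
}

// pool indexes images that can still be opened by name. In this model a
// trashed image is no longer reachable by name.
type pool struct {
	byName map[string]*image
}

// depthByName follows parents by name only. It stops at the first parent that
// has no name or cannot be opened by name, so that parent and everything above
// it go uncounted.
func (p *pool) depthByName(img *image) int {
	depth := 0
	for img.Parent != nil {
		next, ok := p.byName[img.Parent.Name]
		if img.Parent.Name == "" || !ok {
			return depth
		}
		depth++
		img = next
	}
	return depth
}

// buildChain creates img0 <- img1 <- ... <- imgN. When trashParents is true,
// every image except the newest is marked as trashed and left out of the
// by-name index, as happens after the parent PVCs and snapshots are deleted.
func buildChain(n int, trashParents bool) (*pool, *image) {
	p := &pool{byName: map[string]*image{}}
	var prev *image
	for i := 0; i <= n; i++ {
		img := &image{ID: fmt.Sprintf("id%d", i)}
		if prev != nil {
			img.Parent = &parentRef{Name: fmt.Sprintf("img%d", i-1), ID: prev.ID, Trash: trashParents}
		}
		if !trashParents || i == n {
			p.byName[fmt.Sprintf("img%d", i)] = img
		}
		prev = img
	}
	return p, prev
}

func main() {
	p, child := buildChain(17, false)
	fmt.Println("parents live:", p.depthByName(child))
	p, child = buildChain(17, true)
	fmt.Println("parents in trash:", p.depthByName(child))
}
```

With all parents live, the 17-deep chain is counted correctly; once the parents are trashed, the same chain is reported as depth 0, which would never trigger a flatten.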
I see. The way getCloneDepth() just returns the current depth when it encounters an empty image name looks wrong, as does bailing out on an ErrImageNotFound error.
When getting parent details via the rbd_get_parent() API, the returned rbd_linked_image_spec_t struct has trash and image_id fields. If trash is true, the parent image is in the trash. In that case, the image ID from image_id can be used to open the parent image with the rbd_open_by_id() API, thus moving on to the next iteration.
Thanks!
Currently go-ceph does not read or export those fields: https://github.com/ceph/go-ceph/blob/ce4031e218edce2afdc79a714b7123a74b1e3c78/rbd/snapshot_nautilus.go#L106-L109
We'll need to add them there before using them in CephCSI.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
Hey, we are currently being affected by this issue. Is there any progress towards solving it?
Or, are there any workarounds for the problem?
Hi @Champ-Goblem, I just tried to reproduce it today so that I can continue with #4029, which was an early approach to fixing this. However, with recent Ceph-CSI, I am not able to hit this issue anymore. (It may have been fixed as a side effect of another change.)
What version of Ceph-CSI do you have, and can you explain the steps that you did to hit the problem?
@nixpanic I got the same issue (rbd: parent chain is too long (17)), and my version of Ceph-CSI is v3.8.0.