Optimize RBD mirrors to be space efficient
Describe the feature you'd like to have
RBD mirrors are currently problematic in two cases:
- VolumeReplication is enabled on a volume A whose parent B is in the trash
- VolumeReplication is enabled on a volume A whose parent B is not mirrored (no VolumeReplication on it)
  - Usually, in that case, the parent is a VolumeSnapshot, which (unless I'm wrong) is not referenceable in the VolumeReplication CRD
To fix those issues, one can set flattenMode to "force", which flattens the RBD image before starting the mirroring process. This works around both cases, at the cost of extreme space inefficiency.
This is especially true when mirroring all PVCs from one cluster to another: the space efficiency brought by snapshots is cancelled, as every PVC needs to be flattened before being mirrored.
I propose the following:
- If the image is in the trash, flatten it before trying to mirror (or get it out of the trash?)
- If the parent is not mirrored, mirror the parent and then mirror the child (gotta be careful introducing that dependency, as it also requires cleaning up the parent on the destination cluster, and that parent may be referenced by multiple children)
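The two rules above amount to a small decision procedure. The following is a minimal sketch of it, not Ceph-CSI code: the `Image` type, the `Action` names, and `plan_mirroring` are all hypothetical and exist only to illustrate the proposed order of checks (flatten only when the parent is in the trash, otherwise mirror the parent first).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    # Hypothetical action names, used only for this illustration.
    MIRROR_DIRECTLY = "mirror the image as-is"
    FLATTEN_THEN_MIRROR = "flatten the image, then mirror it"
    MIRROR_PARENT_FIRST = "mirror the parent, then mirror the image"


@dataclass
class Image:
    # Hypothetical, simplified view of an RBD image and its parent link.
    name: str
    parent: Optional["Image"] = None
    in_trash: bool = False
    mirrored: bool = False


def plan_mirroring(image: Image) -> Action:
    """Pick the cheapest action that still lets the image be mirrored."""
    parent = image.parent
    if parent is None:
        # No parent: nothing blocks mirroring, no flattening needed.
        return Action.MIRROR_DIRECTLY
    if parent.in_trash:
        # Proposal 1: a trashed parent forces a flatten of the child.
        return Action.FLATTEN_THEN_MIRROR
    if not parent.mirrored:
        # Proposal 2: mirror the parent first to keep the clone relationship.
        return Action.MIRROR_PARENT_FIRST
    # Parent is live and already mirrored: the child can be mirrored as a clone.
    return Action.MIRROR_DIRECTLY
```

With this ordering, flattening remains the fallback for the trashed-parent case only, which is what the acceptance criteria ask for.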
Another possibility for the second proposal is to add a feature to Ceph-CSI to mirror snapshots.
For example, the spec of a VolumeReplication could be:

    spec:
      dataSource:
        apiGroup: ""
        kind: VolumeSnapshot
        name: abcd

instead of the standard kind: PersistentVolumeClaim
What is the value to the end user? (why is it a priority?)
More space efficiency for users that want to mirror their clusters
How will we know we have a good solution? (acceptance criteria)
- Trashed parents are non-blocking
- Missing mirroring on the parent is non-blocking
- Flattening is done only when mirroring can't be achieved otherwise
Additional context
Possibly linked: https://github.com/ceph/ceph-csi/issues/2427 https://github.com/ceph/ceph-csi/issues/2426
@SkalaNetworks There's a data corruption bug in Ceph RBD mirroring of images with a parent-child relationship: https://tracker.ceph.com/issues/61891
@Rakshith-R from the bug report I understand it doesn't affect flattened images? That's a pretty bad bug that would completely annihilate the chances of implementing what I proposed...
> @Rakshith-R from the bug report I understand it doesn't affect flattened images?

Yes, mirroring an image with no parent works absolutely fine. That's why we chose to flatten the image itself before mirroring.
> That's a pretty bad bug that would completely annihilate the chances of implementing what I proposed...

:/ Mirroring VolumeSnapshots was one of my first ideas too, to avoid flattening and maintain space efficiency. The next was to have cephcsi restore parent images from the trash (and restore the RBD snapshot on them, which is not supported currently; the RBD snapshot connecting parent and child is needed for mirroring to work), then enable mirroring on the parent before the child. This of course needs some kind of cleanup agent running on both clusters (pretty complex).
Similar to what you've proposed in the issue description:
> - If the image is in the trash, flatten it before trying to mirror (or get it out of the trash?)
> - If the parent is not mirrored, mirror the parent and then mirror the child (gotta be careful introducing that dependency, as it also requires cleaning up the parent on the destination cluster, and that parent may be referenced by multiple children)
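
The cleanup concern raised in this thread (a mirrored parent shared by several children) can be pictured as reference counting. This is only a minimal sketch under that assumption, not an existing Ceph-CSI mechanism: `ParentMirrorTracker` and its methods are hypothetical, and a real cleanup agent would also have to persist this state and coordinate it across both clusters.

```python
from collections import defaultdict


class ParentMirrorTracker:
    """Hypothetical tracker: which mirrored children depend on each parent."""

    def __init__(self) -> None:
        self._children = defaultdict(set)

    def add_child(self, parent: str, child: str) -> bool:
        """Register a mirrored child.

        Returns True when this is the parent's first child, i.e. when
        mirroring would have to be enabled on the parent beforehand.
        """
        is_first = not self._children[parent]
        self._children[parent].add(child)
        return is_first

    def remove_child(self, parent: str, child: str) -> bool:
        """Unregister a mirrored child.

        Returns True only when no child references the parent any more,
        i.e. when the parent could be cleaned up on the destination.
        """
        children = self._children.get(parent)
        if not children or child not in children:
            return False
        children.remove(child)
        if children:
            return False
        del self._children[parent]
        return True
```

The parent is released only when its last child goes away, so removing one child never affects the others that still depend on it.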