Problems mounting XFS volume clones / restored snapshots
XFS does not allow mounting two volumes that have the same UUID on the same machine. The second mount fails with:
[44557.612032] XFS (vde): Filesystem has duplicate UUID fadf19ab-bbcc-4f40-8d4f-44550e822db1 - can't mount
This is problematic when using a cloned volume or a restored snapshot - the original volume and the new volume cannot be mounted on the same compute node.
At what level should the issue be solved?
- The CSI spec could mention that it's the CSI plugin's problem to make sure cloned / restored volumes are usable on the same node as the original volume (e.g. by using the `-o nouuid` mount option for XFS volumes, or by running `xfs_admin -U generate` to re-generate the UUID on the first mount after a volume restore / clone).
- The CSI spec could mention that it's the CO's problem to pass e.g. the `-o nouuid` mount option to all XFS NodeStage/NodePublish calls.
In both cases, someone must check that XFS is used and know that it needs special handling (see the sketch below).
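To make the first option concrete, here is a minimal sketch of how a plugin's NodeStage path could add `nouuid` for cloned / restored XFS volumes. It is only an illustration in Go: the package, function and parameter names (including the `isCloneOrRestore` flag) are assumptions, not from any real driver, and the mount is shelled out for brevity.

```go
// Sketch of option 1: the CSI plugin itself adds "nouuid" when staging an
// XFS volume that was created from a clone or a restored snapshot.
// Package, function and parameter names are illustrative only.
package nodestage

import (
	"fmt"
	"os/exec"
	"strings"
)

// stageXFSVolume mounts devicePath at stagingPath. When the volume is a
// clone or a restored snapshot, it appends "nouuid" so that the mount
// succeeds even if the source volume is mounted on the same node.
func stageXFSVolume(devicePath, stagingPath, fsType string, isCloneOrRestore bool, opts []string) error {
	if fsType == "xfs" && isCloneOrRestore {
		opts = append(opts, "nouuid")
	}
	args := []string{"-t", fsType}
	if len(opts) > 0 {
		args = append(args, "-o", strings.Join(opts, ","))
	}
	args = append(args, devicePath, stagingPath)
	if out, err := exec.Command("mount", args...).CombinedOutput(); err != nil {
		return fmt.Errorf("mounting %s at %s failed: %v: %s", devicePath, stagingPath, err, out)
	}
	return nil
}
```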
This sounds like a plug-in issue, not a CO issue.
Agreed with @jdef. We discussed this at the community meeting today, see Notes
Conclusion:
- For this bug:
  - Pick a random CSI driver to fix this issue in the driver
  - Write an E2E test to catch this issue
  - Make the E2E test opt-in and send messages to CSI announce to encourage other CSI drivers to fix this issue.
  - At some point in the future make the test required.
- For this class of bug:
  - Must do the same as above for each similar bug we discover.
Ceph CSI was hit by this issue at clone time, see https://github.com/ceph/ceph-csi/issues/966#issuecomment-618846057. Mounting with nouuid or regenerating the UUID (which depends on the xfsprogs version, etc.) could be the possible fixes.
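For the UUID-regeneration route, a minimal sketch (Go, illustrative function name, not taken from Ceph CSI) of shelling out to `xfs_admin -U generate` before the first mount of the clone might look like this; as noted above, whether it works depends on the xfsprogs version.

```go
// Sketch of the alternative fix: regenerate the filesystem UUID once,
// before the first mount of a volume created from a clone or a snapshot.
// The function name is illustrative; support depends on xfsprogs version.
package nodestage

import (
	"fmt"
	"os/exec"
)

// regenerateXFSUUID assigns a new random UUID to the XFS filesystem on
// devicePath so it no longer collides with the volume it was cloned from.
// The device must not be mounted while this runs.
func regenerateXFSUUID(devicePath string) error {
	if out, err := exec.Command("xfs_admin", "-U", "generate", devicePath).CombinedOutput(); err != nil {
		return fmt.Errorf("xfs_admin -U generate %s failed: %v: %s", devicePath, err, out)
	}
	return nil
}
```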
> i. Pick a random CSI driver to fix this issue in the driver
> ii. Write an E2E test to catch this issue
I'm experimenting with a Kubernetes e2e test and AWS EBS CSI driver fix in https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/913
Turned into a real e2e test in Kubernetes: https://github.com/kubernetes/kubernetes/pull/102538
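For anyone who wants to reproduce the scenario outside the Kubernetes e2e framework, here is a rough standalone sketch (not the actual test from the PRs above) that assumes root privileges, xfsprogs and loop-device support; it "clones" an XFS image by copying it, shows the duplicate-UUID mount failure, and then mounts the clone with nouuid.

```go
// Standalone sketch of the scenario the e2e test exercises: two XFS
// filesystems with the same UUID on one node. Requires root, xfsprogs and
// loop-device support. Error handling is simplified for brevity.
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"path/filepath"
)

func run(name string, args ...string) error {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("%s %v: %v: %s", name, args, err, out)
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "xfs-uuid-test")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	orig := filepath.Join(dir, "orig.img")
	clone := filepath.Join(dir, "clone.img")
	mnt1 := filepath.Join(dir, "mnt1")
	mnt2 := filepath.Join(dir, "mnt2")
	for _, d := range []string{mnt1, mnt2} {
		if err := os.Mkdir(d, 0o755); err != nil {
			log.Fatal(err)
		}
	}

	// Create an XFS image and "clone" it by copying it bit for bit,
	// which preserves the filesystem UUID.
	if err := run("truncate", "-s", "512M", orig); err != nil {
		log.Fatal(err)
	}
	if err := run("mkfs.xfs", "-q", orig); err != nil {
		log.Fatal(err)
	}
	if err := run("cp", "--sparse=always", orig, clone); err != nil {
		log.Fatal(err)
	}

	// Mount the original, then try the clone: the second mount should fail
	// with "Filesystem has duplicate UUID".
	if err := run("mount", "-o", "loop", orig, mnt1); err != nil {
		log.Fatal(err)
	}
	defer run("umount", mnt1)
	if err := run("mount", "-o", "loop", clone, mnt2); err == nil {
		run("umount", mnt2)
		log.Fatal("expected duplicate-UUID mount failure, but mount succeeded")
	}

	// With nouuid the clone mounts fine next to the original.
	if err := run("mount", "-o", "loop,nouuid", clone, mnt2); err != nil {
		log.Fatalf("mount with nouuid failed: %v", err)
	}
	defer run("umount", mnt2)
	fmt.Println("nouuid allows the cloned XFS volume to be mounted alongside the original")
}
```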