
Problems mounting XFS volume clones / restored snapshots

Open jsafrane opened this issue 3 years ago • 5 comments

XFS does not allow mounting two volumes that have the same UUID on the same machine. The second mount fails with:

[44557.612032] XFS (vde): Filesystem has duplicate UUID fadf19ab-bbcc-4f40-8d4f-44550e822db1 - can't mount

This is a problem when using a cloned volume or a restored snapshot: the original volume and the new volume cannot be mounted on the same compute node.
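The kernel prints the duplicate-UUID message only to its log. The mount call itself just fails with a generic EINVAL, so a driver that wants to report this case clearly has to recognize the log line. Below is a minimal Go sketch that assumes the line has exactly the format shown above. `parseDuplicateUUID` is a hypothetical helper and is not part of any CSI library.

```go
package main

import (
	"fmt"
	"regexp"
)

// dupUUIDRe matches the XFS kernel message for a duplicate-UUID mount failure,
// assuming the format shown in the issue:
//   [ts] XFS (<dev>): Filesystem has duplicate UUID <uuid> - can't mount
var dupUUIDRe = regexp.MustCompile(`XFS \(([^)]+)\): Filesystem has duplicate UUID ([0-9a-fA-F-]+) - can't mount`)

// parseDuplicateUUID (hypothetical helper) returns the device name and UUID
// when line is an XFS duplicate-UUID mount failure; ok is false otherwise.
func parseDuplicateUUID(line string) (device, uuid string, ok bool) {
	m := dupUUIDRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	line := "[44557.612032] XFS (vde): Filesystem has duplicate UUID fadf19ab-bbcc-4f40-8d4f-44550e822db1 - can't mount"
	if dev, uuid, ok := parseDuplicateUUID(line); ok {
		fmt.Printf("device=%s uuid=%s\n", dev, uuid) // device=vde uuid=fadf19ab-bbcc-4f40-8d4f-44550e822db1
	}
}
```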

At what level should the issue be solved?

  • The CSI spec could state that it is the CSI plugin's responsibility to make sure cloned / restored volumes are usable on the same node as the original volume. For example, the plugin could use the -o nouuid mount option for XFS volumes, or run xfs_admin -U generate to regenerate the UUID on the first mount after a volume restore / clone.

  • The CSI spec could state that it is the CO's responsibility to pass, e.g., the -o nouuid mount option to all XFS NodeStage/NodePublish calls.

In both cases, someone must check that XFS is used and know that it needs special handling.
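Whichever side owns the fix, the check is small. Below is a minimal Go sketch of the plugin-side variant. It assumes the driver builds its own mount option list while staging a volume. `stageMountOptions` is a hypothetical name and is not a CSI or Kubernetes API.

```go
package main

import (
	"fmt"
	"strings"
)

// stageMountOptions (hypothetical) returns the mount options to use when
// staging a volume. For XFS it appends "nouuid" so a clone or restored
// snapshot can be mounted on the same node as its source volume.
func stageMountOptions(fsType string, requested []string) []string {
	opts := append([]string(nil), requested...) // copy; don't mutate the caller's slice
	if !strings.EqualFold(fsType, "xfs") {
		return opts // only XFS needs the special handling
	}
	for _, o := range opts {
		if o == "nouuid" {
			return opts // already requested, don't add it twice
		}
	}
	return append(opts, "nouuid")
}

func main() {
	fmt.Println(stageMountOptions("xfs", []string{"noatime"}))  // [noatime nouuid]
	fmt.Println(stageMountOptions("ext4", []string{"noatime"})) // [noatime]
}
```

Note that nouuid only skips the duplicate-UUID check at mount time. Both filesystems still carry the same UUID on disk.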

jsafrane, May 26 '21 10:05

This sounds like a plug-in issue, not a CO issue.


jdef, May 26 '21 10:05

Agreed with @jdef. We discussed this at the community meeting today, see Notes

Conclusion:

  • For this bug:
    1. Pick a random CSI driver to fix this issue in the driver
    2. Write an E2E test to catch this issue
    3. Make the E2E test opt-in and send messages to CSI announce to encourage other CSI drivers to fix this issue.
    4. At some point in the future make the test required.
  • For this class of bug:
    • Must do the same as above for each similar bug we discover.

saad-ali, May 26 '21 16:05

Ceph CSI hit this issue at clone time (https://github.com/ceph/ceph-csi/issues/966#issuecomment-618846057). Mounting with nouuid or regenerating the UUID (which depends on the xfsprogs version, etc.) could be possible fixes.
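The UUID-regeneration alternative can be sketched in a similar way. This sketch assumes the driver runs the command once, on the unmounted clone, before its first mount, because xfs_admin modifies unmounted filesystems. Whether it succeeds also depends on the installed xfsprogs version, as noted above. `regenerateXFSUUIDCmd` is hypothetical. It only builds the command and does not run it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// regenerateXFSUUIDCmd (hypothetical) builds the command that assigns a new
// random UUID to the XFS filesystem on device. It does not execute it; the
// caller must ensure the filesystem is not mounted before running it.
func regenerateXFSUUIDCmd(device string) *exec.Cmd {
	return exec.Command("xfs_admin", "-U", "generate", device)
}

func main() {
	cmd := regenerateXFSUUIDCmd("/dev/vde")
	fmt.Println(cmd.Args) // [xfs_admin -U generate /dev/vde]
	// A real driver would call cmd.CombinedOutput() and handle the error.
}
```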

humblec, May 27 '21 04:05

  i. Pick a random CSI driver to fix this issue in the driver
  ii. Write an E2E test to catch this issue

I'm experimenting with a Kubernetes e2e test and an AWS EBS CSI driver fix in https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/913

jsafrane, Jun 01 '21 08:06

Turned into a real e2e test in Kubernetes: https://github.com/kubernetes/kubernetes/pull/102538

jsafrane, Jun 03 '21 12:06