topolvm
topolvm copied to clipboard
enforce compatibility check when restoring snapshot and creating clone
What
TopoLVM assumes that the restored snapshot PV and cloned PV have the same storage class as the original PV. However, this assumption is not verified on PV creation. So, if a pod consumes a restored/cloned PV having a different storage class from the original PV, this pod is stuck when their creation. It's better to check the compatibility of storage classes on creating restored/cloned PV.
The storage class of the original volume:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
...
name: topolvm-provisioner
resourceVersion: "824"
uid: 1f7b786d-b734-431c-80c7-01690d1a30e5
parameters:
csi.storage.k8s.io/fstype: xfs
topolvm.cybozu.com/device-class: ssd
provisioner: topolvm.cybozu.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
The storage class of the restored volume:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
...
name: topolvm-provisioner-ext4
resourceVersion: "228201"
uid: 15773da3-2c54-468d-baff-2e61cdee0602
parameters:
csi.storage.k8s.io/fstype: ext4
topolvm.cybozu.com/device-class: ssd
provisioner: topolvm.cybozu.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
A pod consuming the restored snapshot PV and/or cloned PV gets stuck like this.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 17m (x47 over 7h41m) kubelet Unable to attach or mount volumes: unmounted volumes=[my-volume], unattached volumes=[kube-api-access-znr7r my-volume]: timed out waiting for the condition
Warning FailedMount 8m8s (x153 over 7h39m) kubelet Unable to attach or mount volumes: unmounted volumes=[my-volume], unattached volumes=[my-volume kube-api-access-znr7r]: timed out waiting for the condition
Warning FailedMount 2m25s (x234 over 7h43m) kubelet MountVolume.SetUp failed for volume "pvc-690ad48b-bf34-4338-831f-c24fa6b8d32e" : rpc error: code = Internal desc = target device is already formatted with different filesystem: volume=5ee1e263-1654-4381-aa98-fff61052881a, current=xfs, new:ext4
NOTE:
external-provisioner
< v3.2 disallow the different storage classes of the original PV and the cloned PV. However, external-provisioner
>= v3.2 removes this limitation.
How
Checklist
- [ ] Finish implementation of the issue
- [ ] Test all functions
- [ ] Have enough logs to trace activities
- [ ] Notify developers of necessary actions
@leelavg @Yuggupta27 @nbalacha Could you work on this task after updating the e2e test and design documents?
This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in a week if no further activity occurs. Thank you for your contributions.
Im assuming this is still wanted. @satoru-takeuchi I can take over here.
@jakobmoellerdev Yes, please. Thank you!
For this issue I would ideally like to block out the creation of LVs alltogether based on a mismatch of the filesystem.
I believe that we do not have direct access to the storageClass in CreateVolume in the controller. Are there any ideas already that were thought of on how to restrict the creation of volumes with a source that has a mismatch in its filesystem compared to the requested capability?
I discussed with my colleagues and we found it's a bit difficult task. To tell the truth, we didn't consider the following fact that you pointed out.
I believe that we do not have direct access to the storageClass in CreateVolume in the controller.
IMO, just adding a description of this issue in limitations.md is OK for now.
I was thinking about this again, and I am wondering if we cannot solve this via a validatingwebhookconfiguration or mutatingwebhookconfiguration.
We could add a validatingwebhook for pods that could introspect PVCs with sources stemming from PVCs coming from TopoLVM and then reject the volumes if they do not have the same storageClass. Is this desired or should I go ahead with documenting a limitation?
Is this desired or should I go ahead with documenting a limitation?
Since this problem is not critical, adding a limitation is enough for now. Thank you for your suggestion.