vsphere-csi-driver
Relocation of volumes between datastores
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
Is there a way to migrate CSI vSphere volumes between datastores? We are not using vSAN or Tanzu.
One idea would be to use CSI cloning.
Any ideas that could help? Is there a way to do it manually?
You can use the standard disk vMotion procedure (Migrate -> Change storage only, with Configure per disk enabled), as you would normally do with a mounted disk of a virtual machine. The tricky part is identifying the actual disk so you can then migrate it ...
You can use a kubectl plugin named vtopology, which maps a PV name to a disk ID, or you can go the hard way and use PowerCLI to query vSphere ...
The output of vtopology looks like:
```
=== Storage Policy (SPBM) information for PV pvc-e74383ca-fad3-479d-b91e-b283d9e872a0 ===
    Kubernetes VM/Node : k8s-plus-wrk05-bt.lab.up
    Hard Disk Name     : Hard disk 30
    Policy Name        : silver
    Policy Compliance  : compliant
```
So you can then vmotion the particular disk (e.g: Hard disk 30) to another Datastore.
Hope this helps. It would also be good to have some official documentation on actions like these.
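If you'd rather script the lookup than use vtopology or PowerCLI, below is a rough sketch using govmomi that asks CNS directly which datastore a given PV lives on. The vCenter endpoint, credentials and PV name are placeholders, and field names may differ slightly between govmomi versions.

```go
// Rough sketch: map a Kubernetes PV name to its CNS volume and datastore.
// Assumes govmomi; the vCenter URL, credentials and PV name are placeholders.
package main

import (
	"context"
	"fmt"
	"log"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/cns"
	cnstypes "github.com/vmware/govmomi/cns/types"
)

func main() {
	ctx := context.Background()

	// Placeholder vCenter endpoint and credentials.
	u, err := url.Parse("https://administrator%40vsphere.local:password@vcenter.example.com/sdk")
	if err != nil {
		log.Fatal(err)
	}
	vc, err := govmomi.NewClient(ctx, u, true) // true = skip TLS verification (lab use only)
	if err != nil {
		log.Fatal(err)
	}

	cnsClient, err := cns.NewClient(ctx, vc.Client)
	if err != nil {
		log.Fatal(err)
	}

	// Query CNS by the PV name reported by kubectl / vtopology.
	res, err := cnsClient.QueryVolume(ctx, cnstypes.CnsQueryFilter{
		Names: []string{"pvc-e74383ca-fad3-479d-b91e-b283d9e872a0"},
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, v := range res.Volumes {
		fmt.Printf("volume %s (id %s) lives on datastore %s\n",
			v.Name, v.VolumeId.Id, v.DatastoreUrl)
	}
}
```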
I would like to have this feature, if I understand it correctly. We have two StorageClasses, each with a datastore on a different NFS store.
My plan is to migrate volumes between the two StorageClasses.
@farodin91 I'm trying to understand why you want to relocate the volumes from one datastore to another. Is it because you want to decommission the datastore, or is it because you want to balance the capacity between the two datastores?
Is it because you want to decommission the datastore, or is it because you want to balance the capacity between the two datastores?
I want to decommission datastores.
Did my solution work out for you?
@achontzo We tried out vMotion and it worked, but with an artifact on the datastore: before, the FCD was in a directory called fcd, but now it's in a folder named after the originating VM. On the k8s side, we have to manually patch the StorageClass.
We also started trying out the CnsRelocate command, but we got a weird error saying that relocate is disabled.
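For reference, here is a minimal sketch of what such a relocate call looks like through govmomi, assuming the cns package's RelocateVolume method with a block-volume spec carrying the volume ID and target datastore; the vCenter endpoint, datacenter, datastore and volume ID below are placeholders. On builds where relocation is disabled, the call presumably fails with an error like the one we saw.

```go
// Rough sketch: relocate a CNS block volume to another datastore.
// Assumes govmomi's cns.Client.RelocateVolume; all names below are placeholders.
package main

import (
	"context"
	"log"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/cns"
	cnstypes "github.com/vmware/govmomi/cns/types"
	"github.com/vmware/govmomi/find"
)

func main() {
	ctx := context.Background()

	u, err := url.Parse("https://administrator%40vsphere.local:password@vcenter.example.com/sdk")
	if err != nil {
		log.Fatal(err)
	}
	vc, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		log.Fatal(err)
	}

	// Resolve the target datastore (placeholder datacenter/datastore names).
	finder := find.NewFinder(vc.Client)
	dc, err := finder.Datacenter(ctx, "my-datacenter")
	if err != nil {
		log.Fatal(err)
	}
	finder.SetDatacenter(dc)
	ds, err := finder.Datastore(ctx, "target-datastore")
	if err != nil {
		log.Fatal(err)
	}

	cnsClient, err := cns.NewClient(ctx, vc.Client)
	if err != nil {
		log.Fatal(err)
	}

	// The volume ID is the PV's volumeHandle (placeholder here).
	spec := &cnstypes.CnsBlockVolumeRelocateSpec{
		CnsVolumeRelocateSpec: cnstypes.CnsVolumeRelocateSpec{
			VolumeId:  cnstypes.CnsVolumeId{Id: "your-cns-volume-id"},
			Datastore: ds.Reference(),
		},
	}
	task, err := cnsClient.RelocateVolume(ctx, spec)
	if err != nil {
		// This is where a "relocate is disabled" style error would surface.
		log.Fatal(err)
	}
	if err := task.Wait(ctx); err != nil {
		log.Fatal(err)
	}
	log.Println("relocation task completed")
}
```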
We have heard the following storage vMotion requirements for CNS volumes:
- Capacity load balancing between storage (could be mixed datastore types like VMFS, NFS, vSAN, vVol).
- Datastore maintenance mode support, so that all CNS volumes can be storage vMotioned out of a datastore that is being decommissioned or prepared for a firmware upgrade, etc.
- Storage vMotion volumes from one datastore to another that could be in a different datacenter.
@farodin91 Could you validate if this captures your requirements?
@SandeepPissay For our case, mainly (2) and (3) would best match our requirements.
@farodin91 regarding requirement (3), do you have separate vCenters managing the datacenters or a single vCenter? I'm wondering if we are looking at cross vCenter vMotion.
We have just a single vCenter in this case.
@achontzo We tried out vMotion and it worked, but with an artifact on the datastore: before, the FCD was in a directory called fcd, but now it's in a folder named after the originating VM. On the k8s side, we have to manually patch the StorageClass.
We actually have the same issue, which bit us pretty hard...
Storage DRS vMotioned FCDs on our datastore cluster, and afterwards the VMDKs ended up directly in a VM's folder instead of the fcd folder.
We are also using Cluster API, and whenever the VM that the VMDK was attached to gets killed (because of an upgrade, for example, which provisions new VMs and kills the old ones), the affected PVs are broken and cannot be used as CNS disks anymore.
There needs to be a warning sign somewhere: "Don't use Storage vMotion/DRS with CNS volumes, or they will break."
For what it's worth, I am curious, @marratj, what do you mean by the PVs getting broken? The PVs are FCDs under the covers, and vCenter maintains the link to the VMDK even after it is moved by Storage vMotion (sVM).
We had tested sVM with TKGI and CSI back in 2019 and had no issues moving PVs across nodes during upgrades. We found that the old PV VMDKs would be in the old VM folder even after that VM was deleted. Perhaps CAPV is doing something odd on VM delete with its attached volumes? The BOSH CPI just does a mass detach of all volumes (BOSH-managed or foreign, like a K8s PV) prior to VM deletion.
@SandeepPissay speaking from what I've seen in the past and other situations from our BOSH experience with TAS and TKGI:
- different datastores on different compute clusters and different datacenters (ideally some day with different vCenters)
- "shared nothing" cases, i.e. vSAN/Nutanix, where the compute clusters can't see each other's datastores and thus the sVM data transfer happens over the network rather than shared storage
- need to handle VM deletion after sVM, i.e. move the sVM'd VMDK back to a predictable folder on the datastore (e.g. "fcd") so it stays in a known location rather than in stale VM folders
@svrc "broken" means that the CSI driver cannot mount the volume anymore.
```
(*types.LocalizedMethodFault)(0xc0009abba0)({
  DynamicData: (types.DynamicData) {
  },
  Fault: (*types.NotFound)(0xc0009abbc0)({
    VimFault: (types.VimFault) {
      MethodFault: (types.MethodFault) {
        FaultCause: (*types.LocalizedMethodFault)(nil),
        FaultMessage: ([]types.LocalizableMessage) nil
      }
    }
  }),
  LocalizedMessage: (string) (len=50) "The object or item referred to could not be found."
})". opId: "72b115b
```
The thing is that SDRS moves the VMDK files out of the original fcd folder where they were created and into the VM-specific folder on the new datastore they are migrated to: new datastore, new folder, and even a new VMDK name (e.g. it gets renamed from fcd/839395e8712e46f285d309818e0eb22f.vmdk to vmname/vmname_2.vmdk during the migration).
We were already in contact with VMware support about this, and they confirmed that Storage DRS breaks the CNS/FCD relationship in such a way that the CSI driver cannot find the volume anymore; for now, the only workaround is to keep SDRS disabled.
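In case it helps others triage a similar state, here is a rough govmomi sketch (vCenter endpoint and volume ID are placeholders) that simply asks CNS what it currently records for a given volume ID, i.e. the PV's volumeHandle. If CNS returns nothing, or a later attach fails with the NotFound fault shown above, the CNS/FCD relationship has most likely been lost.

```go
// Rough sketch: ask CNS what it currently records for a given volume ID.
// Assumes govmomi; vCenter URL and volume ID are placeholders.
package main

import (
	"context"
	"fmt"
	"log"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/cns"
	cnstypes "github.com/vmware/govmomi/cns/types"
)

func main() {
	ctx := context.Background()

	u, err := url.Parse("https://administrator%40vsphere.local:password@vcenter.example.com/sdk")
	if err != nil {
		log.Fatal(err)
	}
	vc, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		log.Fatal(err)
	}
	cnsClient, err := cns.NewClient(ctx, vc.Client)
	if err != nil {
		log.Fatal(err)
	}

	// Placeholder: use the PV's volumeHandle from `kubectl get pv -o yaml`.
	volumeID := "your-cns-volume-id"

	res, err := cnsClient.QueryVolume(ctx, cnstypes.CnsQueryFilter{
		VolumeIds: []cnstypes.CnsVolumeId{{Id: volumeID}},
	})
	if err != nil {
		log.Fatal(err)
	}
	if len(res.Volumes) == 0 {
		fmt.Println("CNS returned nothing for this volume ID")
		return
	}
	for _, v := range res.Volumes {
		fmt.Printf("CNS still tracks %s on datastore %s\n", v.VolumeId.Id, v.DatastoreUrl)
	}
}
```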
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@marratj did you recover the disks that were moved by DRS? How?
I've recently been dealing with this as well and was able to get around it after some troubleshooting. This may not solve others' issues, but I wanted to share: in my case, I was getting errors when vCenter tried to detach volumes, and it turned out to be due to a snapshot of the backing VM that was associated with the mount. As soon as I deleted the snapshot, all my errors went away, the mount detached/attached as intended, and Kubernetes was happy again.
This missing feature renders any realistic vSphere CSI use case broken. You can't even migrate data to a new datastore when it gets decommissioned, and on virtualized infrastructure that's daily operations. @svrc's question is very valid: it's unclear why the CSI driver isn't able to find migrated FCDs after they have been moved. Wasn't the whole point of FCDs to make VMDKs identifiable by moref/moid/uuid, just like any other ManagedObject in the vSphere API? Why are display-name (!) paths used to identify the relevant objects for the CSI driver (node VMs, FCDs)? I would be really interested in the design decision behind that.
Does https://github.com/vmware-samples/cloud-native-storage-self-service-manager fix this problem? I have the feeling that's the case.
Yes, the CNS Self Service Manager is available to help relocate volumes from one datastore to another. Refer to:
- https://github.com/vmware-samples/cloud-native-storage-self-service-manager/releases/tag/v0.1.0
- https://github.com/vmware-samples/cloud-native-storage-self-service-manager/blob/main/docs/book/features/storage_vmotion.md
Actually, this tool leads to the exact same issue with FCDs landing in the wrong folder on the new Datastore https://github.com/vmware-samples/cloud-native-storage-self-service-manager/issues/19
FYI, at least in vSphere 8.x (and maybe in some 7.x update patch too) you can perform an FCD migration right from the UI, in the CNS volumes view of the vSphere cluster. The mentioned cns-self-service tool is not worth your time...