Fix potential leaking of volume maps
ISSUE TYPE
- Enhancement Request
COMPONENT NAME
Storage
CLOUDSTACK VERSION
Any
SUMMARY
Unmapping volumes from a hypervisor host upon VM stop or migration is done on a best-effort basis. The VM is already stopped or already migrated by the time we try to unmap, so if something goes wrong there is really no recourse or retry; we only log a warning. This leaves the potential of leaking maps to hosts over time.
In code review I've also found edge cases where a VM is moved to the "Stopped" state without necessarily cleaning up network or volume resources; these can also lead to leaked maps over time. Examples are force-removing a hypervisor host with running VMs on it, and possibly any other code path that simply calls `vm.setState(State.Stopped)`.
My request is that we be more thorough during VM start in ensuring that our target host, and only our target host, has access to the volume, or at least call the storage plugin involved and let it decide how to do this. It should be as simple as calling the storage service to "revoke all" just before we grant access, or allowing for an exclusive grant in the storage API.
For example, with the PowerFlex/ScaleIO storage client there is an `unmapVolumeFromAllSdcs` call that could be made just prior to granting access to volumes during VM start.
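To illustrate, here is a minimal sketch of that ordering, assuming a PowerFlex/ScaleIO-style client; the interface below is a simplified stand-in and the real client's method signatures may differ:

```java
// Simplified stand-in for the PowerFlex/ScaleIO gateway client; only the calls
// needed for this sketch are shown, and the signatures are assumptions.
interface ScaleIoClient {
    void unmapVolumeFromAllSdcs(String volumeId);       // drop every existing SDC mapping
    void mapVolumeToSdc(String volumeId, String sdcId); // map to the requested host's SDC
}

class ExclusiveMapSketch {
    // Ensure the target host, and only the target host, ends up mapped to the volume.
    static void grantExclusiveAccess(ScaleIoClient client, String volumeId, String targetSdcId) {
        client.unmapVolumeFromAllSdcs(volumeId); // clears any maps leaked by earlier stops/migrations
        client.mapVolumeToSdc(volumeId, targetSdcId);
    }
}
```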
We may need to add a `revokeAllAccess()` method to the `PrimaryDataStoreDriver`, or add a flag to the existing `revokeAccess` to indicate that the storage driver should revoke all mappings.
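A rough sketch of that first option, assuming the driver interface keeps its current `grantAccess(DataObject, Host)` / `revokeAccess(DataObject, Host)` shape; the new method name is only a proposal, and `DataObject`/`Host` are simplified stand-ins here:

```java
// Simplified stand-ins so the sketch compiles on its own.
interface DataObject { String getUuid(); }
interface Host { long getId(); }

// Proposed extension of the primary storage driver contract (the new method is a
// proposal, not an existing API).
interface PrimaryDataStoreDriverSketch {
    boolean grantAccess(DataObject dataObject, Host host);
    void revokeAccess(DataObject dataObject, Host host);

    // New: revoke every mapping for the given volume, regardless of host.
    // Drivers that cannot enumerate mappings could treat this as a no-op and log a warning.
    void revokeAllAccess(DataObject dataObject);
}
```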
Or alternatively (I think I like this better), the `grantAccess()` call might gain a `boolean exclusive` flag so the storage driver can be instructed to ensure that only one mapping exists: the one requested. This would be cleaner.
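A sketch of that alternative, reusing the simplified `DataObject`/`Host` stand-ins above; a default method keeps existing drivers source-compatible, and the names are again only a proposal:

```java
interface PrimaryDataStoreDriverExclusiveSketch {
    boolean grantAccess(DataObject dataObject, Host host);
    void revokeAccess(DataObject dataObject, Host host);

    // exclusive == true asks the driver to ensure the requested mapping is the only one,
    // e.g. by revoking all other mappings before (or together with) the grant.
    default boolean grantAccess(DataObject dataObject, Host host, boolean exclusive) {
        // Drivers that ignore the flag behave exactly as today.
        return grantAccess(dataObject, host);
    }
}
```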
Crucially, we need to avoid exclusive access during the live migration workflows, since the source and destination hosts both need access to the volume while the migration is in flight. It seems safe to ensure exclusive access during VM start, however.
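As a call-site sketch (the class and method names below are purely illustrative, not existing CloudStack orchestration code), only the VM-start path would ask for an exclusive grant:

```java
class VolumeAccessCallSites {
    private final PrimaryDataStoreDriverExclusiveSketch driver;

    VolumeAccessCallSites(PrimaryDataStoreDriverExclusiveSketch driver) {
        this.driver = driver;
    }

    void onVmStart(DataObject volume, Host targetHost) {
        // Safe to be strict here: the VM is only starting on one host.
        driver.grantAccess(volume, targetHost, true);
    }

    void onLiveMigrationPrepare(DataObject volume, Host destinationHost) {
        // Not exclusive: the source host must keep its mapping until the migration completes.
        driver.grantAccess(volume, destinationHost, false);
    }
}
```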