After CloudStack canceled volume live migration, VMware still keeps on migrating this volume => CloudStack keeps old/wrong location of volume.

Open fabeulus opened this issue 2 years ago • 2 comments

ISSUE TYPE

Bug Report

COMPONENT NAME

API, Volume

CLOUDSTACK VERSION

4.17.1.0

CONFIGURATION

VMWare vSphere

OS / ENVIRONMENT

SUMMARY

I tried to live-migrate a 4 TB volume with the CloudMonkey command shown in the steps below. After the configured threshold of 120 minutes ("global settings" -> "job.cancel.threshold.minutes"), the job was canceled by CloudStack:

"jobresult": { "errorcode": 530, "errortext": "Unable to serialize: Job is cancelled as it has been blocking others for too long"

VMware has the new/correct location of the volume, but CloudStack keeps the old/wrong location.

STEPS TO REPRODUCE

* To simulate this, set the global setting "job.cancel.threshold.minutes" to a very small value, or adjust the volume size.
* Migrate a volume to another storage so that the job is canceled once the specified time period is exceeded (also possible via the GUI):
              CMK> migrate volume livemigrate=true storageid=xxxxg volumeid=xxxxxx
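To pick a threshold or volume size that reliably triggers the cancellation, a rough estimate of the required copy rate can help. The following is only a back-of-the-envelope sketch: the function names are made up for illustration, it assumes the whole volume is copied at a constant rate, and it takes "4 TB" to mean 4096 GiB.

```python
def min_throughput_mib_s(volume_gib: float, threshold_minutes: float) -> float:
    """Sustained copy rate in MiB/s needed to move the volume within the threshold."""
    if threshold_minutes <= 0:
        raise ValueError("threshold_minutes must be positive")
    return volume_gib * 1024 / (threshold_minutes * 60)


def will_hit_threshold(volume_gib: float, observed_mib_s: float,
                       threshold_minutes: float) -> bool:
    """True if a full copy at observed_mib_s would outlast the threshold."""
    return min_throughput_mib_s(volume_gib, threshold_minutes) > observed_mib_s


if __name__ == "__main__":
    # 4 TB volume (taken as 4096 GiB) with the 120-minute threshold from this report
    print(round(min_throughput_mib_s(4096, 120), 1))
    # 50 GiB test volume, 1-minute threshold, assumed storage rate of 200 MiB/s
    print(will_hit_threshold(50, 200, 1))
```

Under these assumptions the 4 TB volume would need roughly 583 MiB/s sustained to finish within 120 minutes, so on slower storage the job is expected to be canceled.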

EXPECTED RESULTS

The migration was correctly "canceled" in CloudStack, but the migration in VMware should also have been aborted.

ACTUAL RESULTS

VMware finished the volume live migration, but CloudStack does not know the new location.
If the corresponding virtual machine is shut down and started again, the VM is "orchestrated" with the wrong volume information and will no longer boot.

Cloudstack-Error:

Unable to orchestrate start VM instance {id: "xxxx", name: "i-xxx-xxxxx-VM", uuid: "xxxx", type="User"} due to [Unable to start instance 'xxxxx' (xxxx), see management server log for details].
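The failure mode can be reproduced in a toy model. This is not CloudStack or vSphere code: `VolumeRecord`, `FakeHypervisor`, `migrate` and `start_vm` are hypothetical names, and the sketch only assumes that the hypervisor completes the move while the canceled job never writes the new location back.

```python
from dataclasses import dataclass


@dataclass
class VolumeRecord:
    """What the management side has stored for a volume (hypothetical model)."""
    volume_id: str
    pool: str


class FakeHypervisor:
    """Stand-in for vCenter: tracks where each volume really lives."""

    def __init__(self, placements: dict[str, str]) -> None:
        self.placements = dict(placements)

    def complete_migration(self, volume_id: str, target_pool: str) -> None:
        self.placements[volume_id] = target_pool


def migrate(record: VolumeRecord, hypervisor: FakeHypervisor,
            target_pool: str, job_cancelled: bool) -> None:
    # The hypervisor task keeps running and finishes regardless of the job state.
    hypervisor.complete_migration(record.volume_id, target_pool)
    # Only a job that was not canceled gets to record the new location.
    if not job_cancelled:
        record.pool = target_pool


def start_vm(record: VolumeRecord, hypervisor: FakeHypervisor) -> str:
    actual = hypervisor.placements[record.volume_id]
    if actual != record.pool:
        raise RuntimeError(
            f"volume {record.volume_id} expected on {record.pool}, found on {actual}"
        )
    return "Running"


if __name__ == "__main__":
    record = VolumeRecord("vol-1", "pool-old")
    hypervisor = FakeHypervisor({"vol-1": "pool-old"})
    migrate(record, hypervisor, "pool-new", job_cancelled=True)
    try:
        start_vm(record, hypervisor)
    except RuntimeError as err:
        print("start failed:", err)
```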

Feb 03 '23 11:02 fabeulus

Thanks for opening your first issue here! Be sure to follow the issue template!

Feb 03 '23 11:02 boring-cyborg[bot]

It looks like the job is cancelled in CloudStack, but not in vCenter, which causes inconsistent information.

It might be good to have a background thread that scans the storage pools (CloudStack might already have one).
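A minimal sketch of what such a scan could do, assuming the recorded and the actual placements are both available as plain mappings from volume ID to pool. The function name and data shapes are hypothetical and do not reflect existing CloudStack code.

```python
def reconcile_volume_locations(
    recorded: dict[str, str],
    actual: dict[str, str],
) -> dict[str, tuple[str, str]]:
    """Update `recorded` in place to match `actual`.

    Returns {volume_id: (old_pool, new_pool)} for every corrected volume.
    Volumes the hypervisor reports but the database does not know are skipped.
    """
    corrections: dict[str, tuple[str, str]] = {}
    for volume_id, real_pool in actual.items():
        known_pool = recorded.get(volume_id)
        if known_pool is None or known_pool == real_pool:
            continue
        corrections[volume_id] = (known_pool, real_pool)
        recorded[volume_id] = real_pool
    return corrections


if __name__ == "__main__":
    db = {"vol-1": "pool-old", "vol-2": "pool-a"}
    vcenter = {"vol-1": "pool-new", "vol-2": "pool-a", "vol-9": "pool-b"}
    print(reconcile_volume_locations(db, vcenter))
    print(db)
```

A real implementation would also have to decide what to do while a migration is still in flight, which this sketch ignores.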

Feb 11 '23 09:02 weizhouapache

The VMware client in CS supports cancelling the volume migration, or any other task, if it is in a cancelable state. It seems the jobs in the hypervisor are not cancelled when the parent job is cancelled due to the job cancel threshold time. Either the related hypervisor jobs have to be cancelled if possible, or the resources (VM, volume) have to be synced with their latest state some time after the job is cancelled, using a background thread. This probably needs a proper functional definition (requires detailed investigation): which hypervisors to support, which jobs are cancellable, which resources to sync, whether cleanup is required, any other actions to be taken, etc.

https://github.com/apache/cloudstack/blob/1383625c93e300c6b8d62b52ddfd090d3291fc74/vmware-base/src/main/java/com/cloud/hypervisor/vmware/util/VmwareClient.java#L785

https://github.com/apache/cloudstack/blob/1383625c93e300c6b8d62b52ddfd090d3291fc74/vmware-base/src/main/java/com/cloud/hypervisor/vmware/util/VmwareClient.java#L807-L814
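The two options above (cancel the hypervisor task when it is cancelable, otherwise sync the resource later) could be combined roughly as follows. This is a sketch in Python for brevity, not the Java code linked above; `HypervisorTask`, `Outcome`, `on_parent_job_cancelled` and the `schedule_sync` callback are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Outcome(Enum):
    CANCELLED = "cancelled"
    NEEDS_SYNC = "needs_sync"


@dataclass
class HypervisorTask:
    """Hypothetical view of a task running on the hypervisor."""
    task_id: str
    cancelable: bool
    state: str = "running"

    def cancel(self) -> None:
        if not self.cancelable:
            raise RuntimeError(f"task {self.task_id} is not cancelable")
        self.state = "cancelled"


def on_parent_job_cancelled(
    tasks: Iterable[HypervisorTask],
    schedule_sync: Callable[[str], None],
) -> dict[str, Outcome]:
    """Cancel what can be cancelled; queue everything else for a later resync."""
    outcomes: dict[str, Outcome] = {}
    for task in tasks:
        if task.state == "running" and task.cancelable:
            task.cancel()
            outcomes[task.task_id] = Outcome.CANCELLED
        else:
            # Not cancelable (or already finished): the resource state must be
            # re-read from the hypervisor after a delay.
            schedule_sync(task.task_id)
            outcomes[task.task_id] = Outcome.NEEDS_SYNC
    return outcomes


if __name__ == "__main__":
    pending: list[str] = []
    tasks = [
        HypervisorTask("task-1", cancelable=True),
        HypervisorTask("task-2", cancelable=False),
    ]
    print(on_parent_job_cancelled(tasks, pending.append))
    print(pending)
```

Which tasks count as cancelable, and how long to wait before syncing, are exactly the open questions listed above.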

Jun 11 '24 08:06 sureshanaparti
