Restrict the migration of volumes attached to VMs in Starting state
Description
This PR restricts users from migrating volumes attached to VMs that are in starting state. Migrating volumes while their VMs are starting shouldn't be allowed because sometimes the VM could end up not actually starting, and if ACS starts migrating its volume, this could lead to other issues.
Types of changes
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] build/CI
- [ ] test (unit or integration test code)
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
- [ ] Major
- [ ] Minor
Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [x] Minor
- [ ] Trivial
Screenshots (if appropriate):
Image
How Has This Been Tested?
I tested this PR by deploying an instance and then trying to migrate its volume while the instance was starting. Before the changes, the volume migration process would start. After the migration, an error is thrown, as expected.
Codecov Report
Attention: Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.
Project coverage is 15.85%. Comparing base (
9df783c) to head (7903aa5). Report is 81 commits behind head on 4.19.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...n/java/com/cloud/storage/VolumeApiServiceImpl.java | 0.00% | 9 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## 4.19 #9725 +/- ##
============================================
+ Coverage 15.11% 15.85% +0.73%
- Complexity 11184 11269 +85
============================================
Files 5402 5042 -360
Lines 473120 444385 -28735
Branches 58327 52630 -5697
============================================
- Hits 71507 70441 -1066
+ Misses 393812 366072 -27740
- Partials 7801 7872 +71
| Flag | Coverage Δ | |
|---|---|---|
| uitests | ? |
|
| unittests | 15.85% <0.00%> (+0.04%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Is there any other state that we should be checking besides starting? migrating, for example?
Is there any other state that we should be checking besides starting? migrating, for example?
Good point, though this is already an improvement;
Starting(true, "VM is being started. At this state, you should find host id filled which means it's being started on that host."),
Running(false, "VM is running. host id has the host that it is running on."),
Stopping(true, "VM is being stopped. host id has the host that it is being stopped on."),
Stopped(false, "VM is stopped. host id should be null."),
Destroyed(false, "VM is marked for destroy."),
Expunging(true, "VM is being expunged."),
Migrating(true, "VM is being migrated. host id holds to from host"),
Error(false, "VM is in error"),
Unknown(false, "VM state is unknown."),
Shutdown(false, "VM state is shutdown from inside"),
Restoring(true, "VM is being restored from backup");
I think (at least) Stopping, Destroyed, Expunging, Restoring are good candidate VirtualMachine.States to include. Or in reverse, only Running, Stopped, Shutdown and maybe Error or Unknown are good vm states to migrate volumes for.
thanks @bernardodemarco
@JoaoJandre , do you agree that this PR is good to go, even when more improvements are possible?
@DaanHoogland considering it's an easy change, I don't see why introduce this and change it later. Since @FelipeM525 is no longer working on this PR: @bernardodemarco could you take care of it?
@DaanHoogland considering it's an easy change, I don't see why introduce this and change it later. Since @FelipeM525 is no longer working on this PR: @bernardodemarco could you take care of it?
Yes, no problem. I'll update the verification to only allow the migration to be performed when the VM is Running, Stopped or Shutdown.
Yes, no problem. I'll update the verification to only allow the migration to be performed when the VM is Running, Stopped or Shutdown.
@DaanHoogland, @rohityadavcloud, @JoaoJandre I've just applied the changes.
root@cloudstack:~# grep -a 'logid:bf08211b' /var/log/cloudstack/management/management-server.log
2025-01-06 12:13:48,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-7:ctx-8253b178 job-141) (logid:bf08211b) Executing AsyncJobVO: {id:141, userId: 2, accountId: 2, instanceType: Volume, instanceId: 21, cmd: org.apache.cloudstack.api.command.admin.volume.MigrateVolumeCmdByAdmin, cmdInfo: {"response":"json","ctxUserId":"2","sessionkey":"07_Zt7hC_aogIMgiM2EDW7Hs0xk","volumeid":"8dfa4a5e-2e0f-4c57-ad1e-2a19b6a3c6b9","httpmethod":"GET","ctxStartEventId":"273","ctxDetails":"{\"interface com.cloud.storage.Volume\":\"8dfa4a5e-2e0f-4c57-ad1e-2a19b6a3c6b9\",\"interface com.cloud.storage.StoragePool\":\"a0566c34-a49d-31bc-865f-3b57806131d6\"}","ctxAccountId":"2","livemigrate":"false","cmdEventType":"VOLUME.MIGRATE","storageid":"a0566c34-a49d-31bc-865f-3b57806131d6"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 227341783673585, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2025-01-06 12:13:48,488 DEBUG [c.c.s.VolumeApiServiceImpl] (API-Job-Executor-7:ctx-8253b178 job-141 ctx-fe58e892) (logid:bf08211b) Unable to migrate volume: [ROOT-19] Id: [21] because the VM: [i-2-19-VM] Id: [9228084c-7e2f-4e08-b40d-99a96c2ec23a] is in state [Stopping], which is not supported for migration.
2025-01-06 12:13:48,493 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-7:ctx-8253b178 job-141) (logid:bf08211b) Unexpected exception while executing org.apache.cloudstack.api.command.admin.volume.MigrateVolumeCmdByAdmin
2025-01-06 12:13:48,496 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-7:ctx-8253b178 job-141) (logid:bf08211b) Complete async job-141, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Volume migration is not allowed when the VM is in the Stopping state. Supported states are: [Stopped, Running, Shutdown]."}
@blueorangutan package
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12002
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-12056) Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8 Total time taken: 45617 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9725-t12056-kvm-ol8.zip Smoke tests completed. 132 look OK, 1 have errors, 0 did not run Only failed and skipped tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_secure_vm_migration | Error |
134.42 | test_vm_life_cycle.py |
| test_01_secure_vm_migration | Error |
134.43 | test_vm_life_cycle.py |