cloudstack icon indicating copy to clipboard operation
cloudstack copied to clipboard

Restrict the migration of volumes attached to VMs in Starting state

Open FelipeM525 opened this issue 1 year ago • 2 comments

Description

This PR restricts users from migrating volumes attached to VMs that are in starting state. Migrating volumes while their VMs are starting shouldn't be allowed because sometimes the VM could end up not actually starting, and if ACS starts migrating its volume, this could lead to other issues.

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI
  • [ ] test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [ ] Major
  • [ ] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [x] Minor
  • [ ] Trivial

Screenshots (if appropriate):

Image

migrate-volume

How Has This Been Tested?

I tested this PR by deploying an instance and then trying to migrate its volume while the instance was starting. Before the changes, the volume migration process would start. After the migration, an error is thrown, as expected.

FelipeM525 avatar Sep 24 '24 17:09 FelipeM525

Codecov Report

Attention: Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Project coverage is 15.85%. Comparing base (9df783c) to head (7903aa5). Report is 81 commits behind head on 4.19.

Files with missing lines Patch % Lines
...n/java/com/cloud/storage/VolumeApiServiceImpl.java 0.00% 9 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19    #9725      +/-   ##
============================================
+ Coverage     15.11%   15.85%   +0.73%     
- Complexity    11184    11269      +85     
============================================
  Files          5402     5042     -360     
  Lines        473120   444385   -28735     
  Branches      58327    52630    -5697     
============================================
- Hits          71507    70441    -1066     
+ Misses       393812   366072   -27740     
- Partials       7801     7872      +71     
Flag Coverage Δ
uitests ?
unittests 15.85% <0.00%> (+0.04%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Sep 24 '24 17:09 codecov[bot]

Is there any other state that we should be checking besides starting? migrating, for example?

JoaoJandre avatar Oct 02 '24 18:10 JoaoJandre

Is there any other state that we should be checking besides starting? migrating, for example?

Good point, though this is already an improvement;

        Starting(true, "VM is being started.  At this state, you should find host id filled which means it's being started on that host."),
        Running(false, "VM is running.  host id has the host that it is running on."),
        Stopping(true, "VM is being stopped.  host id has the host that it is being stopped on."),
        Stopped(false, "VM is stopped.  host id should be null."),
        Destroyed(false, "VM is marked for destroy."),
        Expunging(true, "VM is being   expunged."),
        Migrating(true, "VM is being migrated.  host id holds to from host"),
        Error(false, "VM is in error"),
        Unknown(false, "VM state is unknown."),
        Shutdown(false, "VM state is shutdown from inside"),
        Restoring(true, "VM is being restored from backup");

I think (at least) Stopping, Destroyed, Expunging, Restoring are good candidate VirtualMachine.States to include. Or in reverse, only Running, Stopped, Shutdown and maybe Error or Unknown are good vm states to migrate volumes for.

DaanHoogland avatar Nov 07 '24 14:11 DaanHoogland

thanks @bernardodemarco

@JoaoJandre , do you agree that this PR is good to go, even when more improvements are possible?

DaanHoogland avatar Jan 03 '25 13:01 DaanHoogland

@DaanHoogland considering it's an easy change, I don't see why introduce this and change it later. Since @FelipeM525 is no longer working on this PR: @bernardodemarco could you take care of it?

JoaoJandre avatar Jan 03 '25 14:01 JoaoJandre

@DaanHoogland considering it's an easy change, I don't see why introduce this and change it later. Since @FelipeM525 is no longer working on this PR: @bernardodemarco could you take care of it?

Yes, no problem. I'll update the verification to only allow the migration to be performed when the VM is Running, Stopped or Shutdown.

bernardodemarco avatar Jan 03 '25 17:01 bernardodemarco

Yes, no problem. I'll update the verification to only allow the migration to be performed when the VM is Running, Stopped or Shutdown.

@DaanHoogland, @rohityadavcloud, @JoaoJandre I've just applied the changes.

root@cloudstack:~# grep -a 'logid:bf08211b' /var/log/cloudstack/management/management-server.log 
2025-01-06 12:13:48,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-7:ctx-8253b178 job-141) (logid:bf08211b) Executing AsyncJobVO: {id:141, userId: 2, accountId: 2, instanceType: Volume, instanceId: 21, cmd: org.apache.cloudstack.api.command.admin.volume.MigrateVolumeCmdByAdmin, cmdInfo: {"response":"json","ctxUserId":"2","sessionkey":"07_Zt7hC_aogIMgiM2EDW7Hs0xk","volumeid":"8dfa4a5e-2e0f-4c57-ad1e-2a19b6a3c6b9","httpmethod":"GET","ctxStartEventId":"273","ctxDetails":"{\"interface com.cloud.storage.Volume\":\"8dfa4a5e-2e0f-4c57-ad1e-2a19b6a3c6b9\",\"interface com.cloud.storage.StoragePool\":\"a0566c34-a49d-31bc-865f-3b57806131d6\"}","ctxAccountId":"2","livemigrate":"false","cmdEventType":"VOLUME.MIGRATE","storageid":"a0566c34-a49d-31bc-865f-3b57806131d6"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 227341783673585, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2025-01-06 12:13:48,488 DEBUG [c.c.s.VolumeApiServiceImpl] (API-Job-Executor-7:ctx-8253b178 job-141 ctx-fe58e892) (logid:bf08211b) Unable to migrate volume: [ROOT-19] Id: [21] because the VM: [i-2-19-VM] Id: [9228084c-7e2f-4e08-b40d-99a96c2ec23a] is in state [Stopping], which is not supported for migration.
2025-01-06 12:13:48,493 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-7:ctx-8253b178 job-141) (logid:bf08211b) Unexpected exception while executing org.apache.cloudstack.api.command.admin.volume.MigrateVolumeCmdByAdmin
2025-01-06 12:13:48,496 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-7:ctx-8253b178 job-141) (logid:bf08211b) Complete async job-141, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Volume migration is not allowed when the VM is in the Stopping state. Supported states are: [Stopped, Running, Shutdown]."}

Screenshot from 2025-01-06 09-14-02

bernardodemarco avatar Jan 06 '25 12:01 bernardodemarco

@blueorangutan package

DaanHoogland avatar Jan 07 '25 09:01 DaanHoogland

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jan 07 '25 09:01 blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12002

blueorangutan avatar Jan 07 '25 10:01 blueorangutan

@blueorangutan test

DaanHoogland avatar Jan 07 '25 10:01 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

blueorangutan avatar Jan 07 '25 10:01 blueorangutan

[SF] Trillian test result (tid-12056) Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8 Total time taken: 45617 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9725-t12056-kvm-ol8.zip Smoke tests completed. 132 look OK, 1 have errors, 0 did not run Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_secure_vm_migration Error 134.42 test_vm_life_cycle.py
test_01_secure_vm_migration Error 134.43 test_vm_life_cycle.py

blueorangutan avatar Jan 08 '25 00:01 blueorangutan