[VMware] Disk controller mappings

Open winterhazel opened this issue 10 months ago • 19 comments

Description

This is a refactor of the disk-controller-related logic for VMware that also adds support for SATA and NVMe controllers.

A detailed description of these changes is available at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Disk+Controller+Mappings.

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [X] New feature (non-breaking change which adds functionality)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [X] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI
  • [X] test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [X] Major
  • [ ] Minor

How Has This Been Tested?

The tests below were performed for VMs with the following rootDiskController and dataDiskController configurations:

  • osdefault/osdefault (converted to lsilogic/lsilogic)
  • ide/ide
  • pvscsi/pvscsi
  • sata/sata
  • nvme/nvme
  • sata/lsilogic
  • ide/osdefault
  • osdefault/ide

  1. VM deployment: I deployed one VM with each of the configurations. I verified in vCenter that they had the correct number of disk controllers, and that each volume was associated with the expected controller. The sata/lsilogic VM was the only one that had a data disk; the others only had a root disk.

  2. VM start: I stopped the VMs deployed in (1) and started them again. I verified in vCenter that they had the correct number of disk controllers, and that each volume was associated with the expected controller.

  3. Disk attachment: while the VMs were running, I tried to attach a data disk. All the data disks were attached successfully (except for the VMs using IDE as the data disk controller, which does not allow hot plugging disks; for these, I attached the disks after stopping the VM). I verified that all the disks were using the expected controller. Then, I stopped and started the VMs, and verified that they were still using the expected controllers. Finally, I stopped the VMs and detached the volumes. I verified that they were detached successfully.

  4. VM import: I unmanaged the VMs and imported them back. I verified that their settings were inferred successfully according to the existing disk controllers. Then, I started the VMs, and verified that the controllers and the volumes were configured correctly.
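
The configuration pairs and the hot-plug restriction described in the steps above can be sketched roughly as follows. This is an illustrative Python sketch, not code from this PR: the function names `resolve_controllers` and `can_hot_attach` are hypothetical, and the `lsilogic` fallback for `osdefault` only mirrors what was observed in these tests (the real value depends on the guest OS).

```python
# Hypothetical sketch; names and structure are illustrative, not CloudStack's API.

# Controllers that cannot hot plug disks. Only IDE is stated in the tests above.
NO_HOT_PLUG = {"ide"}


def resolve_controllers(config, os_default="lsilogic"):
    """Split a 'root/data' pair and replace 'osdefault' with a concrete type.

    os_default is an assumption: in the tests above osdefault resolved to
    lsilogic, but the real value depends on the guest OS.
    """
    root, data = config.split("/")

    def resolve(name):
        return os_default if name == "osdefault" else name

    return resolve(root), resolve(data)


def can_hot_attach(config, vm_running, os_default="lsilogic"):
    """Return True if a data disk can be attached in the VM's current state."""
    _, data = resolve_controllers(config, os_default)
    return not (vm_running and data in NO_HOT_PLUG)
```

Under these assumptions, `can_hot_attach("osdefault/ide", vm_running=True)` is false, which corresponds to the case where the VM had to be stopped before the data disk was attached.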

The next tests were performed using the following imported VMs:

  • osdefault/osdefault
  • ide/ide
  • nvme/nvme
  • sata/lsilogic

  1. Volume migration: I migrated the volumes from NFS to local storage, and verified that the migration finished successfully. Then, I started the VMs and verified that both the controllers and the disks were configured correctly.

  2. Volume resize: I expanded all of the disks, and verified in vCenter that their size was changed. Then, I started the VMs and verified that both the controllers and the disks were configured correctly.

  3. VM snapshot: I took some VM snapshots, started the VMs, and verified that they were working as expected. I changed the configuration of the VM using osdefault/osdefault to sata/sata and started the VM to begin the reconfiguration process. I verified that the disk controllers in use were not removed, and that the disks were still associated with the previous controllers; however, the SATA controllers were also created. The VM was working as expected. Finally, I deleted the VM snapshots.

  4. Template creation from volume: I created templates from the root disks. Then, I deployed VMs from the templates. I verified that all the VMs had the same disk controllers as the original VM, and that the only existing disk was correctly associated with the configured root disk controller.

  5. Template creation from volume snapshot: I took snapshots from the root disks, and created templates from the snapshots. Then, I deployed VMs from the templates. I verified that all the VMs had the same disk controllers as the original VM, and that the only existing disk was correctly associated with the configured root disk controller.

  6. VM scale: with the VMs stopped, I scaled the VMs from Small Instance to Medium Instance. I verified that the offering was changed. I started the VMs, and verified that the VMs were correctly reconfigured in vCenter.
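
The behaviour observed in the VM snapshot test (controllers in use are kept, newly required controllers are added) can be illustrated with a small sketch. This is a hypothetical Python model, not the PR's reconfiguration logic: `plan_controllers` and its data shapes are invented for illustration, and what happens to existing controllers with no disks attached is not covered by the test, so the sketch does not model it.

```python
# Hypothetical sketch; not CloudStack's actual reconfiguration logic.

def plan_controllers(existing, desired):
    """Decide which controllers to keep and which to create.

    existing: dict mapping controller type -> number of disks attached to it.
    desired:  controller types required by the VM's new settings.
    Returns (kept, added) as sorted lists.
    """
    # Controllers that still have disks attached are never removed.
    kept = sorted(ctrl for ctrl, disks in existing.items() if disks > 0)
    # Controllers required by the new settings but not yet present are created.
    added = sorted(set(desired) - set(existing))
    return kept, added


# osdefault/osdefault (lsilogic here) changed to sata/sata, root disk still attached.
kept, added = plan_controllers({"lsilogic": 1}, ["sata", "sata"])
print(kept, added)  # ['lsilogic'] ['sata']
```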

Other tests:

  • System VM creation: after applying the patches, I recreated the SSVM and the CPVM. I verified that they were using a single LSI Logic controller. I also verified the controllers of a new VR and of an existing VR.

  • I attached 3 disks to the ide/ide VM. When trying to attach a 4th disk, I got an expected exception, as the IDE bus had reached the maximum number of devices (the 4th one was the CD/DVD drive).

  • I removed all the disks from the sata/lsilogic VM. I tried to attach the root disk again, and verified that it was attached successfully. I started the VM, and verified that it was configured correctly.

  • I attached 8 disks to the pvscsi/pvscsi VM, and verified that the 8th disk was successfully attached to device number 8 (device number 7 is reserved for the controller).
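
The two device-numbering limits exercised in the last tests can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the PR's code: `next_scsi_unit` and `attach_ide` are hypothetical helpers, the reserved SCSI unit number (7) and the IDE limit of four devices are taken from the test descriptions above, and the upper bound on SCSI unit numbers is deliberately not modelled.

```python
# Hypothetical sketch; helper names are illustrative, not CloudStack's API.

SCSI_RESERVED_UNIT = 7  # taken by the controller itself, per the test above
IDE_MAX_DEVICES = 4     # includes the CD/DVD drive, per the test above


def next_scsi_unit(used):
    """Return the lowest free SCSI unit number, skipping the reserved one.

    The per-controller upper bound is not modelled in this sketch.
    """
    unit = 0
    while unit in used or unit == SCSI_RESERVED_UNIT:
        unit += 1
    return unit


def attach_ide(devices, name):
    """Return a new device list with `name` added, or raise if the bus is full."""
    if len(devices) >= IDE_MAX_DEVICES:
        raise RuntimeError("IDE bus is full")
    return devices + [name]


# Allocating eight disks on an empty SCSI controller skips unit 7.
used = set()
units = []
for _ in range(8):
    unit = next_scsi_unit(used)
    used.add(unit)
    units.append(unit)
print(units)  # [0, 1, 2, 3, 4, 5, 6, 8]
```

With these assumptions, starting from a bus that already holds the CD/DVD drive, three calls to `attach_ide` succeed and a fourth raises, which corresponds to the exception seen in the IDE test.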

winterhazel avatar Feb 24 '25 17:02 winterhazel

@blueorangutan package

winterhazel avatar Feb 24 '25 17:02 winterhazel

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 24 '25 17:02 blueorangutan

Codecov Report

:x: Patch coverage is 62.92776% with 195 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 17.63%. Comparing base (a15fbd9) to head (188be60).
:warning: Report is 14 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...he/cloudstack/storage/DiskControllerMappingVO.java | 31.86% | 61 Missing and 1 partial :warning: |
| ...oud/hypervisor/vmware/resource/VmwareResource.java | 74.65% | 31 Missing and 6 partials :warning: |
| ...m/cloud/hypervisor/vmware/mo/VirtualMachineMO.java | 78.46% | 23 Missing and 5 partials :warning: |
| ...tack/storage/dao/DiskControllerMappingDaoImpl.java | 0.00% | 20 Missing :warning: |
| ...com/cloud/hypervisor/vmware/util/VmwareHelper.java | 86.86% | 13 Missing :warning: |
| .../com/cloud/agent/api/SecStorageVMSetupCommand.java | 0.00% | 6 Missing :warning: |
| ...esource/VmwareSecondaryStorageResourceHandler.java | 0.00% | 6 Missing :warning: |
| ...ain/java/com/cloud/api/query/QueryManagerImpl.java | 0.00% | 6 Missing :warning: |
| ...cloud/storage/resource/VmwareStorageProcessor.java | 0.00% | 5 Missing :warning: |
| .../secondarystorage/SecondaryStorageManagerImpl.java | 0.00% | 4 Missing :warning: |

... and 4 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #10454      +/-   ##
============================================
+ Coverage     17.53%   17.63%   +0.09%     
- Complexity    15463    15590     +127     
============================================
  Files          5897     5900       +3     
  Lines        527397   527687     +290     
  Branches      64407    64382      -25     
============================================
+ Hits          92505    93060     +555     
+ Misses       424496   424191     -305     
- Partials      10396    10436      +40     
| Flag | Coverage Δ |
|---|---|
| uitests | 3.59% <ø> (-0.01%) :arrow_down: |
| unittests | 18.70% <62.92%> (+0.10%) :arrow_up: |

codecov[bot] avatar Feb 24 '25 17:02 codecov[bot]

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 12549

blueorangutan avatar Feb 24 '25 17:02 blueorangutan

@DaanHoogland it seems there were some merge issues in main. org.apache.cloudstack.backup.VeeamBackupProvider is missing some methods and imports.

winterhazel avatar Feb 24 '25 18:02 winterhazel

> @DaanHoogland it seems there were some merge issues in main. org.apache.cloudstack.backup.VeeamBackupProvider is missing some methods and imports.

I'll check and update

DaanHoogland avatar Feb 24 '25 19:02 DaanHoogland

@winterhazel, please see #10457. I have had no time (or infra) to test it yet.

DaanHoogland avatar Feb 24 '25 20:02 DaanHoogland

@blueorangutan package

winterhazel avatar Feb 26 '25 17:02 winterhazel

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 26 '25 18:02 blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12586

blueorangutan avatar Feb 26 '25 19:02 blueorangutan

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

github-actions[bot] avatar Apr 16 '25 16:04 github-actions[bot]

@winterhazel could you fix the conflicts?

JoaoJandre avatar Jun 02 '25 20:06 JoaoJandre

@blueorangutan package

winterhazel avatar Jun 04 '25 16:06 winterhazel

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jun 04 '25 16:06 blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13603

blueorangutan avatar Jun 04 '25 17:06 blueorangutan

@blueorangutan test keepEnv

DaanHoogland avatar Jun 09 '25 08:06 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

blueorangutan avatar Jun 09 '25 08:06 blueorangutan

[SF] Trillian test result (tid-13478)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 67978 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10454-t13478-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Jun 10 '25 04:06 blueorangutan

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

github-actions[bot] avatar Jun 10 '25 10:06 github-actions[bot]

@winterhazel could you fix the conflicts?

JoaoJandre avatar Jul 04 '25 17:07 JoaoJandre

@blueorangutan package

winterhazel avatar Jul 08 '25 16:07 winterhazel

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jul 08 '25 16:07 blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14104

blueorangutan avatar Jul 08 '25 18:07 blueorangutan

looks like you packaged before pushing the last changes @winterhazel , so @blueorangutan package

DaanHoogland avatar Jul 09 '25 07:07 DaanHoogland

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jul 09 '25 07:07 blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14114

blueorangutan avatar Jul 09 '25 09:07 blueorangutan

@blueorangutan test ol8 vmware-70u3 keepEnv

DaanHoogland avatar Jul 09 '25 12:07 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-70u3) has been kicked to run smoke tests

blueorangutan avatar Jul 09 '25 12:07 blueorangutan

[SF] Trillian test result (tid-13741)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 120438 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10454-t13741-vmware-70u3.zip
Smoke tests completed. 138 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_prepare_and_cancel_maintenance | Error | 0.26 | test_ms_maintenance_and_safe_shutdown.py |
| test_01_deploy_vm_on_specific_host | Error | 21.92 | test_vm_deployment_planner.py |
| test_02_deploy_vm_on_specific_cluster | Error | 3603.06 | test_vm_deployment_planner.py |
| test_03_deploy_vm_on_specific_pod | Error | 1.40 | test_vm_deployment_planner.py |
| test_04_deploy_vm_on_host_override_pod_and_cluster | Error | 2.46 | test_vm_deployment_planner.py |
| test_05_deploy_vm_on_cluster_override_pod | Error | 1.32 | test_vm_deployment_planner.py |
| test_01_migrate_vm_strict_tags_success | Error | 3604.35 | test_vm_strict_host_tags.py |
| test_02_migrate_vm_strict_tags_failure | Error | 5.93 | test_vm_strict_host_tags.py |
| test_01_restore_vm_strict_tags_success | Error | 17.05 | test_vm_strict_host_tags.py |
| test_02_restore_vm_strict_tags_failure | Error | 3604.39 | test_vm_strict_host_tags.py |
| test_01_scale_vm_strict_tags_success | Error | 23.34 | test_vm_strict_host_tags.py |
| test_02_scale_vm_strict_tags_failure | Error | 3603.60 | test_vm_strict_host_tags.py |
| test_01_deploy_vm_on_specific_host_without_strict_tags | Error | 12.95 | test_vm_strict_host_tags.py |
| test_02_deploy_vm_on_any_host_without_strict_tags | Error | 3606.17 | test_vm_strict_host_tags.py |
| test_03_deploy_vm_on_specific_host_with_strict_tags_success | Error | 3.88 | test_vm_strict_host_tags.py |
| test_04_deploy_vm_on_any_host_with_strict_tags_success | Error | 3606.40 | test_vm_strict_host_tags.py |

blueorangutan avatar Jul 10 '25 23:07 blueorangutan

I don’t think the regression failures are related, but thorough testing is required, I’d say.

DaanHoogland avatar Jul 14 '25 06:07 DaanHoogland