
Fix infrastructure leak on exception while attaching/detaching volumes in VMware

Open • erikbocks opened this pull request 7 months ago • 14 comments

Description

In VMware environments, when a VM resides on a host in the Disconnected state and an attach or detach volume operation is initiated, the exception thrown back to the user contains infrastructure data. This PR addresses the issue by handling AgentUnavailableException separately, so the user receives an error message without that data. The full exception still appears in the application logs, so operators can troubleshoot effectively.
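A minimal sketch of the pattern described above, assuming a simplified call path: the agent-unavailable case is caught on its own, the full exception is logged for operators, and the caller only gets a generic message. The nested AgentUnavailableException, CloudRuntimeException and HostAgent types, the attachVolume method and the sample host address are hypothetical stand-ins for illustration, not the actual code changed in VolumeApiServiceImpl.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only: the nested types are hypothetical stand-ins,
// not the actual classes or code touched by this PR.
class VolumeAttachSketch {
    private static final Logger LOG = Logger.getLogger(VolumeAttachSketch.class.getName());

    // Hypothetical stand-in for CloudStack's checked AgentUnavailableException.
    static class AgentUnavailableException extends Exception {
        AgentUnavailableException(String message) {
            super(message);
        }
    }

    // Hypothetical unchecked exception whose message is returned to the API caller.
    static class CloudRuntimeException extends RuntimeException {
        CloudRuntimeException(String message) {
            super(message);
        }
    }

    // Hypothetical abstraction over the command sent to the hypervisor host.
    interface HostAgent {
        void sendAttachCommand(String volumeName, String vmName) throws AgentUnavailableException;
    }

    static void attachVolume(HostAgent agent, String volumeName, String vmName) {
        try {
            agent.sendAttachCommand(volumeName, vmName);
        } catch (AgentUnavailableException e) {
            // Full details (host names, addresses, ...) go to the server log for operators.
            LOG.log(Level.SEVERE, "Could not attach volume " + volumeName + " to VM " + vmName
                    + ": host agent unavailable", e);
            // The caller gets a generic message; the original exception is deliberately
            // not attached as the cause, so its text stays out of the response.
            throw new CloudRuntimeException(String.format(
                    "Failed to attach volume %s to VM %s: the host is currently unavailable.",
                    volumeName, vmName));
        }
    }

    public static void main(String[] args) {
        // Simulates a host in the Disconnected state; the message is made-up sample data.
        HostAgent disconnected = (volume, vm) -> {
            throw new AgentUnavailableException("Unable to reach agent on host 10.1.1.23");
        };
        try {
            attachVolume(disconnected, "DATA-1", "vm-1");
        } catch (CloudRuntimeException e) {
            // Prints: Failed to attach volume DATA-1 to VM vm-1: the host is currently unavailable.
            System.out.println(e.getMessage());
        }
    }
}
```

Not chaining the original exception is the design choice that keeps the user-facing error free of host details, while the log entry (with its stack trace) preserves everything operators need.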

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [X] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI
  • [ ] test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [ ] Major
  • [X] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [X] Minor
  • [ ] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I performed the following tests in my local lab:

  1. Created a new VM and attached a volume to it.
  2. Shut down the VMware host.
  3. Tried to attach a new volume; the exception containing the infrastructure data was thrown.
  4. Tried to detach the previously attached volume; the same exception was thrown.
  5. Built and installed CloudStack packages with my fix.
  6. Repeated the attach and detach attempts and verified that the new error message contained no infrastructure data.
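Step 6 was checked by hand. A hypothetical helper that could automate that check, assuming the tester lists the identifiers that count as infrastructure data in their lab (host names, datastore names); the IPv4-looking pattern is an extra heuristic, not something the PR specifies. The class name, method name and sample messages are illustrative only.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical test helper for illustration; not part of CloudStack.
class ErrorMessageLeakCheck {
    // Heuristic for dotted-quad addresses such as 10.0.0.21 (octet ranges are not validated).
    private static final Pattern IPV4_LIKE = Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");

    // True if the message mentions any known lab identifier (case-insensitive)
    // or contains something that looks like an IPv4 address.
    static boolean leaksInfrastructure(String message, List<String> knownIdentifiers) {
        String lower = message.toLowerCase(Locale.ROOT);
        for (String id : knownIdentifiers) {
            if (!id.isEmpty() && lower.contains(id.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return IPV4_LIKE.matcher(message).find();
    }

    public static void main(String[] args) {
        // Identifiers of a hypothetical lab environment.
        List<String> lab = List.of("esxi-01.lab.local", "datastore-nfs-1");
        // Illustrative messages, not actual CloudStack output.
        String before = "Unable to attach volume: host esxi-01.lab.local (10.0.0.21) is Disconnected";
        String after = "Failed to attach volume DATA-1 to VM vm-1: the host is currently unavailable.";
        System.out.println(leaksInfrastructure(before, lab)); // true
        System.out.println(leaksInfrastructure(after, lab));  // false
    }
}
```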

erikbocks, May 13 '25

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.57%. Comparing base (b57994e) to head (9443d2e).
⚠️ Report is 134 commits behind head on main.

Files with missing lines | Patch % | Lines
...n/java/com/cloud/storage/VolumeApiServiceImpl.java | 0.00% | 6 Missing ⚠️
@@             Coverage Diff              @@
##               main   #10860      +/-   ##
============================================
- Coverage     16.57%   16.57%   -0.01%     
+ Complexity    14059    14056       -3     
============================================
  Files          5772     5772              
  Lines        512938   512944       +6     
  Branches      62304    62305       +1     
============================================
- Hits          85026    85020       -6     
- Misses       418431   418442      +11     
- Partials       9481     9482       +1     
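Assuming Codecov's usual definition, in which only full hits count as covered and partial lines count against coverage, the totals above are internally consistent:

$$
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
$$

$$
\text{base: } \frac{85026}{85026 + 418431 + 9481} = \frac{85026}{512938} \approx 16.576\%,
\qquad
\text{head: } \frac{85020}{85020 + 418442 + 9482} = \frac{85020}{512944} \approx 16.575\%
$$

The line delta also balances: (-6) + 11 + 1 = +6. Both ratios display as 16.57% if, as in Codecov's default configuration, percentages are rounded down to two decimals; the difference of roughly -0.0014 percentage points then appears as -0.01%.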
Flag | Coverage Δ
uitests | 3.89% <ø> (ø)
unittests | 17.47% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


codecov[bot], May 13 '25

Thank you for the review @sureshanaparti.

erikbocks, Jun 05 '25

@blueorangutan package

DaanHoogland, Jun 09 '25

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan, Jun 09 '25

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13675

blueorangutan, Jun 09 '25

@blueorangutan test

DaanHoogland, Jun 11 '25

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

blueorangutan, Jun 11 '25

[SF] Trillian test result (tid-13500)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 89073 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10860-t13500-kvm-ol8.zip
Smoke tests completed. 130 look OK, 11 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_nic_secondaryip_add_remove | Error | 1518.41 | test_multipleips_per_nic.py
ContextSuite context=TestNestedVirtualization>:setup | Error | 0.00 | test_nested_virtualization.py
ContextSuite context=TestNetworkACL>:setup | Error | 0.00 | test_network_acl.py
ContextSuite context=TestIpv6Network>:setup | Error | 0.00 | test_network_ipv6.py
test_delete_account | Error | 1517.39 | test_network.py
test_delete_network_while_vm_on_it | Error | 1.26 | test_network.py
test_deploy_vm_l2network | Error | 1.20 | test_network.py
test_l2network_restart | Error | 2.35 | test_network.py
ContextSuite context=TestPortForwarding>:setup | Error | 3.59 | test_network.py
ContextSuite context=TestPublicIP>:setup | Error | 12.44 | test_network.py
test_reboot_router | Failure | 0.09 | test_network.py
test_releaseIP | Error | 6.53 | test_network.py
test_releaseIP_using_IP | Error | 6.02 | test_network.py
ContextSuite context=TestRouterRules>:setup | Error | 6.11 | test_network.py
ContextSuite context=TestSharedNetworkWithConfigDrive>:setup | Error | 1521.96 | test_network.py
ContextSuite context=TestPrivateGwACL>:setup | Error | 0.00 | test_privategw_acl.py
ContextSuite context=TestAdapterTypeForNic>:setup | Error | 0.00 | test_nic_adapter_type.py
ContextSuite context=TestNonStrictAffinityGroups>:setup | Error | 0.00 | test_nonstrict_affinity_group.py
ContextSuite context=TestIsolatedNetworksPasswdServer>:setup | Error | 0.00 | test_password_server.py
ContextSuite context=TestPortForwardingRules>:setup | Error | 0.00 | test_portforwardingrules.py
ContextSuite context=TestProjectSuspendActivate>:setup | Error | 1529.70 | test_projects.py
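The summary above reports 11 errors while the table lists 21 rows. The counts appear to be per test file rather than per individual test; a quick check over the Test File column (copied from the table) is consistent with that reading. The class name is illustrative only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FailedFileCount {
    // "Test File" column of the tid-13500 table, one entry per failed or errored row.
    static final List<String> FAILED_ROWS = List.of(
            "test_multipleips_per_nic.py", "test_nested_virtualization.py",
            "test_network_acl.py", "test_network_ipv6.py",
            // 11 rows from test_network.py
            "test_network.py", "test_network.py", "test_network.py", "test_network.py",
            "test_network.py", "test_network.py", "test_network.py", "test_network.py",
            "test_network.py", "test_network.py", "test_network.py",
            "test_privategw_acl.py", "test_nic_adapter_type.py",
            "test_nonstrict_affinity_group.py", "test_password_server.py",
            "test_portforwardingrules.py", "test_projects.py");

    public static void main(String[] args) {
        // A set keeps each file name once (insertion order preserved).
        Set<String> distinctFiles = new LinkedHashSet<>(FAILED_ROWS);
        System.out.println(FAILED_ROWS.size() + " rows, " + distinctFiles.size() + " files");
        // Prints: 21 rows, 11 files
    }
}
```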

blueorangutan, Jun 12 '25

@blueorangutan test ol8 vmware-80u3

DaanHoogland, Jun 13 '25

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-80u3) has been kicked to run smoke tests

blueorangutan, Jun 13 '25

[SF] Trillian Build Failed (tid-13517)

blueorangutan, Jun 13 '25

@blueorangutan test ol8 vmware-70u3

DaanHoogland, Jun 13 '25

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-70u3) has been kicked to run smoke tests

blueorangutan, Jun 13 '25

[SF] Trillian Build Failed (tid-13521)

blueorangutan, Jun 13 '25

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

github-actions[bot], Jul 16 '25

@blueorangutan package

DaanHoogland, Aug 04 '25

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan, Aug 04 '25

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14534

blueorangutan, Aug 04 '25

@blueorangutan test

DaanHoogland, Aug 04 '25

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

blueorangutan, Aug 04 '25

@sureshanaparti, I am not sure the smoke tests will give us any extra data beyond @rosi-shapeblue's testing, so I think we can merge...

DaanHoogland, Aug 04 '25

[SF] Trillian test result (tid-14023)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 51234 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10860-t14023-kvm-ol8.zip
Smoke tests completed. 146 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File

blueorangutan, Aug 05 '25