
Fix resource count discrepancies

Open vishesh92 opened this issue 1 year ago • 74 comments

Description

Needs some rework which will be done after https://github.com/apache/cloudstack/pull/8362/ is merged.

This PR fixes the resource count discrepancies that occur when a resource count is incremented or decremented while a recalculation of the same resource count is in progress.

Reproducing this requires two management servers (MS1 and MS2):

  1. On MS1, set a debugger breakpoint at https://github.com/apache/cloudstack/blob/724394682c73d3aaa7991ab899c97c2c3dcbbb63/server/src/main/java/com/cloud/resourcelimit/ResourceLimitManagerImpl.java#L889
  2. Deploy a VM.
  3. When the debugger stops at the above line, execute `cmk update resourcecount domainid=1` on MS2 to trigger recalculation of the resource count (this also happens periodically; the cmk command triggers the same method on demand). The cmk command will be blocked because of the debugger.
  4. Resume the debugger.
  5. The cmk command will complete and you will see the discrepancy error in the logs.

You will see a log line with the following text:

Discrepency in the resource count has been detected (original count = 1 correct count = 2) for Type = user_vm for Domain ID = 2 is fixed during resource count recalculation
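
To illustrate the kind of race described above, here is a minimal, self-contained Java sketch. It is not CloudStack code and not the approach taken in this PR: the class, fields and methods (`ResourceCountRaceSketch`, `actualVms`, `storedCount`, `deployVm`, `recalculate`) are hypothetical stand-ins for the VM rows and the resource count row, and the interleaving is only one plausible way to arrive at "original count = 1 correct count = 2". Latches play the role of the debugger pause, and a plain `ReentrantLock` is used as a stand-in guard to show that the drift disappears once the two operations cannot interleave.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: NOT CloudStack code. One VM already exists and is counted.
final class ResourceCountRaceSketch {
    private long actualVms = 1;      // stand-in for the VM rows that really exist
    private long storedCount = 1;    // stand-in for the stored resource count
    private int discrepancies = 0;   // how often recalculation had to "fix" the count
    private final ReentrantLock lock = new ReentrantLock();
    private final boolean useLock;

    private ResourceCountRaceSketch(boolean useLock) { this.useLock = useLock; }

    // Deploy = persist the VM, then increment the count. The latches emulate
    // a debugger pausing the thread between those two steps.
    private void deployVm(CountDownLatch paused, CountDownLatch resume) {
        if (useLock) lock.lock();
        try {
            actualVms++;            // step A: the VM now exists
            paused.countDown();     // "breakpoint hit"
            await(resume);          // "debugger resumed"
            storedCount++;          // step B: increment the stored count
        } finally {
            if (useLock) lock.unlock();
        }
    }

    // Recalculation overwrites the stored count with the real usage.
    private void recalculate() {
        if (useLock) lock.lock();
        try {
            if (storedCount != actualVms) {
                discrepancies++;
                storedCount = actualVms;
            }
        } finally {
            if (useLock) lock.unlock();
        }
    }

    private static void await(CountDownLatch latch) {
        try { latch.await(); }
        catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    private static void join(Thread t) {
        try { t.join(); }
        catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    // Returns {storedCount, actualVms, discrepancies} after one deploy and one recalculation.
    static long[] run(boolean useLock) {
        ResourceCountRaceSketch s = new ResourceCountRaceSketch(useLock);
        CountDownLatch paused = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);
        Thread deployer = new Thread(() -> s.deployVm(paused, resume));
        Thread recalc = new Thread(s::recalculate);
        deployer.start();
        await(paused);                   // deploy is now between step A and step B
        recalc.start();
        if (!useLock) join(recalc);      // unguarded: recalculation runs in the gap
        resume.countDown();              // guarded: recalculation waits for the lock
        join(deployer);
        join(recalc);
        return new long[] {s.storedCount, s.actualVms, s.discrepancies};
    }

    public static void main(String[] args) {
        long[] unguarded = run(false);   // stored=3, actual=2, discrepancies=1
        long[] guarded = run(true);      // stored=2, actual=2, discrepancies=0
        System.out.println("unguarded: stored=" + unguarded[0] + " actual=" + unguarded[1]
                + " discrepancies=" + unguarded[2]);
        System.out.println("guarded:   stored=" + guarded[0] + " actual=" + guarded[1]
                + " discrepancies=" + guarded[2]);
    }
}
```

In the unguarded run, recalculation sees a stored count of 1 against a real usage of 2, overwrites it, and the delayed increment then pushes the stored count to 3. In the guarded run, recalculation can only observe the state before or after the whole deploy step, so no discrepancy is recorded.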

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [ ] Major
  • [ ] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [ ] Minor
  • [ ] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

vishesh92 avatar Dec 05 '23 09:12 vishesh92

@blueorangutan package

vishesh92 avatar Dec 05 '23 09:12 vishesh92

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Dec 05 '23 09:12 blueorangutan

Codecov Report

Attention: Patch coverage is 75.86207%, with 49 lines in your changes missing coverage. Please review.

Project coverage is 30.92%. Comparing base (6dc3d06) to head (7dcc999).

Files                                                  Patch %  Lines
.../src/main/java/com/cloud/vm/UserVmManagerImpl.java  62.96%   9 Missing and 1 partial :warning:
.../cloud/resourcelimit/ResourceLimitManagerImpl.java  87.09%   1 Missing and 7 partials :warning:
...n/java/com/cloud/vm/VirtualMachineManagerImpl.java  61.11%   4 Missing and 3 partials :warning:
...va/com/cloud/resourcelimit/CheckedReservation.java  73.07%   5 Missing and 2 partials :warning:
...main/java/com/cloud/storage/dao/VolumeDaoImpl.java  64.70%   4 Missing and 2 partials :warning:
...cloudstack/reservation/dao/ReservationDaoImpl.java  83.78%   6 Missing :warning:
.../src/main/java/com/cloud/vm/dao/UserVmDaoImpl.java  72.72%   1 Missing and 2 partials :warning:
...g/apache/cloudstack/reservation/ReservationVO.java  60.00%   1 Missing and 1 partial :warning:
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #8302       +/-   ##
=============================================
+ Coverage     15.88%   30.92%   +15.04%     
- Complexity    15718    33688    +17970     
=============================================
  Files          5172     5397      +225     
  Lines        364426   379499    +15073     
  Branches      53574    55373     +1799     
=============================================
+ Hits          57874   117354    +59480     
+ Misses       299648   246565    -53083     
- Partials       6904    15580     +8676     
Flag                    Coverage Δ
simulator-marvin-tests  24.41% <71.92%> (?)
uitests                 4.34% <ø> (ø)
unit-tests              16.88% <41.87%> (+<0.01%) :arrow_up:


codecov[bot] avatar Dec 05 '23 09:12 codecov[bot]

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 7924

blueorangutan avatar Dec 05 '23 09:12 blueorangutan

@blueorangutan package

vishesh92 avatar Dec 05 '23 12:12 vishesh92

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Dec 05 '23 12:12 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7928

blueorangutan avatar Dec 05 '23 13:12 blueorangutan

@blueorangutan package

vishesh92 avatar Dec 06 '23 06:12 vishesh92

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Dec 06 '23 06:12 blueorangutan

@vishesh92 does it affect only 4.18, or we're aiming to fix in main/4.19+ ?

rohityadavcloud avatar Dec 06 '23 07:12 rohityadavcloud

> @vishesh92 does it affect only 4.18, or we're aiming to fix in main/4.19+ ?

@rohityadavcloud Both 4.18 & main are affected. This requires a migration which will make the upgrade path a little complex with 4.18.2. So, I have raised this PR against main/4.19.

vishesh92 avatar Dec 06 '23 08:12 vishesh92

@blueorangutan package

vishesh92 avatar Dec 11 '23 04:12 vishesh92

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Dec 11 '23 04:12 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8012

blueorangutan avatar Dec 11 '23 05:12 blueorangutan

@blueorangutan test

vishesh92 avatar Dec 11 '23 05:12 vishesh92

@vishesh92 a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Dec 11 '23 05:12 blueorangutan

[SF] Trillian test result (tid-8547)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 48048 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8302-t8547-kvm-centos7.zip
Smoke tests completed. 116 look OK, 5 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test                                                Result   Time (s)  Test File
test_query_async_job_result                         Error    12.38     test_async_job.py
ContextSuite context=TestVolumeUsage>:setup         Error    147.16    test_usage.py
ContextSuite context=TestDeployVirtioSCSIVM>:setup  Error    0.00      test_deploy_virtio_scsi_vm.py
test_deploy_more_vms_than_limit_allows              Error    2.42      test_deploy_vms_in_parallel.py
test_01_scale_up_verify                             Failure  35.02     test_vm_autoscaling.py
test_02_update_vmprofile_and_vmgroup                Failure  245.48    test_vm_autoscaling.py
test_03_scale_down_verify                           Failure  304.44    test_vm_autoscaling.py
test_04_stop_remove_vm_in_vmgroup                   Failure  0.01      test_vm_autoscaling.py

blueorangutan avatar Dec 11 '23 19:12 blueorangutan

@blueorangutan package

vishesh92 avatar Dec 12 '23 06:12 vishesh92

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Dec 12 '23 06:12 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8020

blueorangutan avatar Dec 12 '23 07:12 blueorangutan

@blueorangutan test

vishesh92 avatar Dec 12 '23 07:12 vishesh92

@vishesh92 a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Dec 12 '23 07:12 blueorangutan

[SF] Trillian test result (tid-8551)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42575 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8302-t8551-kvm-centos7.zip
Smoke tests completed. 121 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Dec 12 '23 19:12 blueorangutan

@blueorangutan package

vishesh92 avatar Dec 14 '23 12:12 vishesh92

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Dec 14 '23 12:12 blueorangutan

@blueorangutan package

vishesh92 avatar Dec 14 '23 19:12 vishesh92

@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Dec 14 '23 19:12 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8060

blueorangutan avatar Dec 14 '23 20:12 blueorangutan

@blueorangutan test

vishesh92 avatar Dec 15 '23 05:12 vishesh92

@vishesh92 a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Dec 15 '23 05:12 blueorangutan