cloudstack
Fix resource count discrepancies
Description
Needs some rework which will be done after https://github.com/apache/cloudstack/pull/8362/ is merged.
This PR fixes resource count discrepancies that occur when a resource count is being incremented or decremented while a recalculation of that resource count runs at the same time.
Reproducing the issue requires two management servers (MS1 and MS2):
- On MS1, set a debugger breakpoint at https://github.com/apache/cloudstack/blob/724394682c73d3aaa7991ab899c97c2c3dcbbb63/server/src/main/java/com/cloud/resourcelimit/ResourceLimitManagerImpl.java#L889
- Deploy a VM.
- When the debugger stops at that line, run `cmk update resourcecount domainid=1` on MS2 to trigger a recalculation of the resource count. (Recalculation also runs periodically; the cmk command triggers the same method on demand.) The cmk command will block because of the debugger.
- Resume the debugger.
- The cmk command will complete and the discrepancy error will appear in the logs.
The logs will contain a line with the following text:
Discrepency in the resource count has been detected (original count = 1 correct count = 2) for Type = user_vm for Domain ID = 2 is fixed during resource count recalculation
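To illustrate the kind of race described above, here is a minimal, self-contained sketch. It is not the code from this PR or from `ResourceLimitManagerImpl`: the class name, fields, and methods are all hypothetical, the stored count and the real usage are modelled as two in-memory counters, and the lock-based variant is only one possible way to serialize the two operations. The first method replays one bad interleaving deterministically; the second runs a deploy and a recalculation on separate threads under a shared lock.

```java
import java.util.concurrent.locks.ReentrantLock;

public class ResourceCountRaceSketch {
    // Hypothetical stand-ins: the persisted resource count and the real number of VMs.
    static long storedCount;
    static long actualVms;
    static final ReentrantLock LOCK = new ReentrantLock();

    /** Replays one unsafe interleaving step by step and returns actual - stored. */
    static long unsafeInterleaving() {
        storedCount = 1;
        actualVms = 1;
        long snapshot = actualVms; // recalculation reads the current usage (1)
        actualVms++;               // a deploy creates a second VM
        storedCount++;             // the deploy increments the stored count to 2
        storedCount = snapshot;    // recalculation writes its stale value (1)
        return actualVms - storedCount;
    }

    /** Deploy under the lock: usage and stored count change together. */
    static void deployVm() {
        LOCK.lock();
        try {
            actualVms++;
            storedCount++;
        } finally {
            LOCK.unlock();
        }
    }

    /** Recalculation under the same lock: read and write cannot be interleaved with a deploy. */
    static void recalculate() {
        LOCK.lock();
        try {
            storedCount = actualVms;
        } finally {
            LOCK.unlock();
        }
    }

    /** Runs a deploy and a recalculation concurrently and returns actual - stored. */
    static long serialized() {
        storedCount = 1;
        actualVms = 1;
        Thread deploy = new Thread(ResourceCountRaceSketch::deployVm);
        Thread recalc = new Thread(ResourceCountRaceSketch::recalculate);
        deploy.start();
        recalc.start();
        try {
            deploy.join();
            recalc.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return actualVms - storedCount;
    }

    public static void main(String[] args) {
        System.out.println("unsafe discrepancy = " + unsafeInterleaving());
        System.out.println("serialized discrepancy = " + serialized());
    }
}
```

In the unsafe replay the stored count ends at 1 while two VMs exist, so the difference is 1. With both operations holding the same lock, either ordering leaves the stored count equal to the real usage, so the difference is 0.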
Types of changes
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] build/CI
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
- [ ] Major
- [ ] Minor
Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [ ] Minor
- [ ] Trivial
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Codecov Report
Attention: Patch coverage is 75.86207%, with 49 lines in your changes missing coverage. Please review.
Project coverage is 30.92%. Comparing base (6dc3d06) to head (7dcc999).
Additional details and impacted files
@@ Coverage Diff @@
## main #8302 +/- ##
=============================================
+ Coverage 15.88% 30.92% +15.04%
- Complexity 15718 33688 +17970
=============================================
Files 5172 5397 +225
Lines 364426 379499 +15073
Branches 53574 55373 +1799
=============================================
+ Hits 57874 117354 +59480
+ Misses 299648 246565 -53083
- Partials 6904 15580 +8676
| Flag | Coverage Δ | |
|---|---|---|
| simulator-marvin-tests | 24.41% <71.92%> (?) | |
| uitests | 4.34% <ø> (ø) | |
| unit-tests | 16.88% <41.87%> (+<0.01%) | :arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 7924
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7928
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
@vishesh92 does it affect only 4.18, or we're aiming to fix in main/4.19+ ?
@rohityadavcloud Both 4.18 & main are affected. This requires a migration which will make the upgrade path a little complex with 4.18.2. So, I have raised this PR against main/4.19.
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8012
@blueorangutan test
@vishesh92 a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
[SF] Trillian test result (tid-8547)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 48048 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8302-t8547-kvm-centos7.zip
Smoke tests completed. 116 look OK, 5 have errors, 0 did not run
Only failed and skipped tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_query_async_job_result | Error | 12.38 | test_async_job.py |
| ContextSuite context=TestVolumeUsage>:setup | Error | 147.16 | test_usage.py |
| ContextSuite context=TestDeployVirtioSCSIVM>:setup | Error | 0.00 | test_deploy_virtio_scsi_vm.py |
| test_deploy_more_vms_than_limit_allows | Error | 2.42 | test_deploy_vms_in_parallel.py |
| test_01_scale_up_verify | Failure | 35.02 | test_vm_autoscaling.py |
| test_02_update_vmprofile_and_vmgroup | Failure | 245.48 | test_vm_autoscaling.py |
| test_03_scale_down_verify | Failure | 304.44 | test_vm_autoscaling.py |
| test_04_stop_remove_vm_in_vmgroup | Failure | 0.01 | test_vm_autoscaling.py |
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8020
@blueorangutan test
@vishesh92 a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
[SF] Trillian test result (tid-8551)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42575 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8302-t8551-kvm-centos7.zip
Smoke tests completed. 121 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:
| Test | Result | Time (s) | Test File |
|---|---|---|---|
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
@blueorangutan package
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8060
@blueorangutan test
@vishesh92 a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests