cloudstack Do not retrieve VM's stats on normal VM listing

Description

For both the listVirtualMachines and listVirtualMachineMetrics APIs, by default, ACS queries all VM details (the details parameter is default all) and consequently queries the cloud.vm_stats table for each VM, creating a summary of statistics. Therefore, every process carried out in the UI that uses the listVirtualMachines API also loads the VM statistics, which can cause slowdowns depending on the number of records.

A new configuration return.vm.stats.on.vm.list (with default true), was added, when it is false, the listVirtualMachines API will not list the VM's metrics by default.

Types of changes

[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] New feature (non-breaking change which adds functionality)
[ ] Bug fix (non-breaking change which fixes an issue)
[X] Enhancement (improves an existing feature and functionality)
[ ] Cleanup (Code refactoring and cleanup, that may add test cases)
[ ] build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

[X] Major
[ ] Minor

Bug Severity

[ ] BLOCKER
[ ] Critical
[ ] Major
[ ] Minor
[ ] Trivial

How Has This Been Tested?

Called listVirtualMachines API, without any parameters:

before this PR, the all the VM's details were returned;
with this PR and return.vm.stats.on.vm.list as true (default), all the details were returned as well;
with this PR and return.vm.stats.on.vm.list as false, all the details except the metrics were returned.

Mar 13 '24 18:03 JoaoJandre

Codecov Report

Attention: Patch coverage is 47.82609% with 12 lines in your changes are missing coverage. Please review.

Project coverage is 30.93%. Comparing base (a7ec873) to head (c711a39). Report is 7 commits behind head on 4.19.

Files	Patch %	Lines
...che/cloudstack/api/command/user/vm/ListVMsCmd.java	41.66%	6 Missing and 1 partial :warning:
...a/org/apache/cloudstack/api/ListVMsMetricsCmd.java	33.33%	1 Missing and 1 partial :warning:
...apache/cloudstack/api/response/UserVmResponse.java	80.00%	1 Missing :warning:
ui/src/config/section/compute.js	0.00%	1 Missing :warning:
ui/src/views/AutogenView.vue	0.00%	0 Missing and 1 partial :warning:

Additional details and impacted files

@@            Coverage Diff            @@
##               4.19    #8782   +/-   ##
=========================================
  Coverage     30.93%   30.93%           
- Complexity    34263    34276   +13     
=========================================
  Files          5353     5354    +1     
  Lines        376055   376100   +45     
  Branches      54691    54697    +6     
=========================================
+ Hits         116317   116346   +29     
- Misses       244443   244453   +10     
- Partials      15295    15301    +6

Flag	Coverage Δ
simulator-marvin-tests	`24.76% <52.38%> (+<0.01%)`	:arrow_up:
uitests	`4.39% <0.00%> (-0.01%)`	:arrow_down:
unit-tests	`16.58% <0.00%> (-0.01%)`	:arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Mar 13 '24 18:03 codecov[bot]

@JoaoJandre I suggest to check/fix the API calls from UI to list vms or vm metrics.

It is not good to change the default behavior if it is not a bug, IMHO.

Mar 13 '24 18:03 weizhouapache

For both the listVirtualMachines and listVirtualMachineMetrics APIs, by default, ACS queries all VM details (the details parameter is default all) and consequently queries the cloud.vm_stats table for each VM,

This is because the listVirtualMachinesMetrics API (and all Metrics API) are superset of the non-metrics APIs. All the Metrics APIs add something related to metrics in their API response in addition to what a non-metrics API would return. I think we shouldn't change the default behaviour of the non-metrics API - and if you really have a use-case for this see if there's any other way to do what you're trying to accomplish or worst-case add a global setting that allows the behaviour you want but keep the setting's default in such a way that it continue old behaviour for other users.

Mar 14 '24 12:03 rohityadavcloud

@JoaoJandre I suggest to check/fix the API calls from UI to list vms or vm metrics.

It is not good to change the default behavior if it is not a bug, IMHO.

@weizhouapache, @sureshanaparti and @rohityadavcloud, we could change only the UI behavior; however, why do we have this default behavior? We do not use these metrics that are returned on this API, when the metrics are used, users, operators and the UI use the listVirtualMachineMetrics API. That is, we already have an API that has this exact purpose.

Not long ago we changed the default behavior of the deleteTemplate API on #7731; As with this situation, the one on #7731 wasn't a bug, it was an inconvenient behavior. A user/operator that only wants to list the VMs intuitively will use the listVirtualMachines API; however, depending on the environment's size, he will deal with slowness because of the default behavior. As with #7731, we could only a change the UI, but changing the default behavior makes more sense.

Mar 14 '24 13:03 JoaoJandre

Codecov Report

Attention: Patch coverage is 3.22581% with 30 lines in your changes are missing coverage. Please review.

Project coverage is 14.96%. Comparing base (a7ec873) to head (e0f7462). Report is 109 commits behind head on 4.19.

Files	Patch %	Lines
...che/cloudstack/api/command/user/vm/ListVMsCmd.java	0.00%	17 Missing :warning:
...apache/cloudstack/api/response/UserVmResponse.java	0.00%	8 Missing :warning:
...a/org/apache/cloudstack/api/ListVMsMetricsCmd.java	0.00%	4 Missing :warning:
...n/java/com/cloud/api/query/ViewResponseHelper.java	0.00%	1 Missing :warning:

Additional details and impacted files

@@              Coverage Diff              @@
##               4.19    #8782       +/-   ##
=============================================
- Coverage     30.93%   14.96%   -15.97%     
+ Complexity    34263    10989    -23274     
=============================================
  Files          5353     5373       +20     
  Lines        376055   469034    +92979     
  Branches      54691    57597     +2906     
=============================================
- Hits         116317    70177    -46140     
- Misses       244443   391087   +146644     
+ Partials      15295     7770     -7525

Flag	Coverage Δ
simulator-marvin-tests	`?`
uitests	`4.31% <ø> (-0.08%)`	:arrow_down:
unit-tests	`?`
unittests	`15.66% <3.22%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

May 03 '24 15:05 codecov-commenter

@blueorangutan package

May 06 '24 17:05 JoaoJandre

@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

May 06 '24 17:05 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9549

May 06 '24 18:05 blueorangutan

@rohityadavcloud , @weizhouapache I added a configuration to control the behavior of the listVirtualMachines API and updated the description. Now the default behavior is maintained. I hope we can advance the discussion on #8970 so we can create a proper mechanism to change the default behavior in the future.

May 06 '24 18:05 JoaoJandre

@blueorangutan test rocky8 kvm-rocky8

May 06 '24 18:05 weizhouapache

@weizhouapache a [SL] Trillian-Jenkins test job (rocky8 mgmt + kvm-rocky8) has been kicked to run smoke tests

May 06 '24 18:05 blueorangutan

@rohityadavcloud , @weizhouapache I added a configuration to control the behavior of the listVirtualMachines API and updated the description. Now the default behavior is maintained. I hope we can advance the discussion on #8970 so we can create a proper mechanism to change the default behavior in the future.

thanks @JoaoJandre for the update

May 06 '24 18:05 weizhouapache

[SF] Trillian test result (tid-10170) Environment: kvm-rocky8 (x2), Advanced Networking with Mgmt server r8 Total time taken: 49325 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8782-t10170-kvm-rocky8.zip Smoke tests completed. 129 look OK, 2 have errors, 0 did not run Only failed and skipped tests results shown below:

Test	Result	Time (s)	Test File
test_01_events_resource	`Error`	418.29	test_events_resource.py
test_01_restore_vm	`Error`	0.24	test_restore_vm.py
test_02_restore_vm_allocated_root	`Error`	0.18	test_restore_vm.py
ContextSuite context=TestRestoreVM>:teardown	`Error`	1.26	test_restore_vm.py

May 07 '24 09:05 blueorangutan

I like the idea of splitting which API is called for metrics vs non-metrics list view (I might or others should steal the pattern for all metrics API usage across the UI). I'm not fully satisfied with the PR @JoaoJandre yet, and I would rather encourage you can pick some ideas from #8985 - that said, I'm also inclined to make progress in a cordial and mature manner. It would be easier to get this merge and optimise the general solution further as required (by me or others).

I'm working on a much wider PR that's in research and progress, that's around wider scalability issues of CloudStack (surprisingly I'm near the root cause, and it maybe possible to even get stats without much penalty). All that said - I wouldn't remember everything I write on Github on each and every PR, take my review with a pinch of salt and I may change my views on things as I'm dealing with a wider scalability problem. I'll leave some comments, but think let's go ahead.

May 08 '24 15:05 rohityadavcloud

@sureshanaparti ok by you like this?

May 18 '24 19:05 DaanHoogland

@blueorangutan package

May 18 '24 19:05 DaanHoogland

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

May 18 '24 19:05 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 9631

May 18 '24 20:05 blueorangutan

ping @sureshanaparti

May 22 '24 08:05 DaanHoogland

@blueorangutan package

May 22 '24 08:05 DaanHoogland

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

May 22 '24 08:05 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9663

May 22 '24 09:05 blueorangutan

@blueorangutan test

May 22 '24 12:05 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

May 22 '24 12:05 blueorangutan

[SF] Trillian test result (tid-10254) Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7 Total time taken: 43718 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8782-t10254-kvm-centos7.zip Smoke tests completed. 130 look OK, 1 have errors, 0 did not run Only failed and skipped tests results shown below:

Test	Result	Time (s)	Test File
test_01_events_resource	`Error`	416.93	test_events_resource.py

May 23 '24 06:05 blueorangutan

cloudstack cloudstack copied to clipboard

Do not retrieve VM's stats on normal VM listing

Description

Types of changes

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

Bug Severity

How Has This Been Tested?

Codecov Report

Codecov Report

cloudstack
cloudstack copied to clipboard