
Update `vpc.max.networks` setting and add settings to limit the number of NICs for each hypervisor

hsato03 opened this issue 1 year ago • 52 comments

Description

Each hypervisor can support a different number of network adapters. Comparing KVM and VMware: VMware defines a limited number of NICs for each ESXi machine (https://configmax.esp.vmware.com/guest?vmwareproduct=vSphere&release=vSphere%207.0&categories=1-0), while the number of tiers that can be allocated with KVM depends on the number of PCI slots available. For example, KVM provides 32 PCI slots, which are shared among several devices (e.g. CD-ROM, keyboard). Each ACS VR already consumes 9 of the 32 available slots; thus, on KVM, 23 slots remain for new tiers.
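As a back-of-the-envelope check, the KVM slot budget described above works out as below (the slot counts are the ones stated in this description, hard-coded for illustration, not values queried from libvirt):

```java
public class KvmSlotBudget {
    // Illustrative constants taken from the description above.
    static final int TOTAL_PCI_SLOTS = 32; // PCI slots provided by KVM
    static final int SLOTS_USED_BY_VR = 9; // CD-ROM, keyboard, existing VR devices, etc.

    static int slotsAvailableForTiers() {
        return TOTAL_PCI_SLOTS - SLOTS_USED_BY_VR;
    }

    public static void main(String[] args) {
        System.out.println(slotsAvailableForTiers()); // 23
    }
}
```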

This PR updates the vpc.max.networks setting to ConfigKey and changes its scope to the cluster level, as the maximum number of networks per VPC depends on the hypervisor where the VR is deployed.

The setting value is defined based on the cluster in which the VPC's VR is running, using the lowest value found if the VPC has more than one VR and they are in different clusters. If the VPC does not have a VR, the value defined in the global setting is used.
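A minimal sketch of that resolution order (the class, method, and default value here are illustrative, not the exact code from the patch): take the minimum of the cluster-level values across the clusters hosting the VPC's VRs, and fall back to the global value when the VPC has no VR.

```java
import java.util.List;

public class VpcMaxNetworksResolver {

    // Hypothetical stand-in for the global vpc.max.networks value.
    static final int GLOBAL_VPC_MAX_NETWORKS = 3;

    /**
     * Resolves the effective vpc.max.networks for a VPC from the
     * cluster-level values of the clusters where its VRs run.
     */
    static int resolve(List<Integer> clusterLevelValues) {
        if (clusterLevelValues == null || clusterLevelValues.isEmpty()) {
            return GLOBAL_VPC_MAX_NETWORKS; // VPC without VRs: use the global setting
        }
        // VRs in different clusters: the most restrictive (lowest) value wins.
        return clusterLevelValues.stream().min(Integer::compareTo).get();
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of()));     // 3 (global)
        System.out.println(resolve(List.of(7)));    // 7 (single cluster)
        System.out.println(resolve(List.of(7, 5))); // 5 (lowest across clusters)
    }
}
```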

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [X] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [ ] Major
  • [X] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [ ] Minor
  • [ ] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I changed the vpc.max.networks value in the following resources:

  • Global setting: 3;
  • cluster-test: 7;
  • cluster-test2: 5.

I created 3 VPCs:

  • VPC1: without VRs;
  • VPC2: VR within cluster-test;
  • VPC3: VRs within cluster-test and cluster-test2.

Then, I verified that the vpc.max.networks setting value was:

  • 3 in VPC1;
  • 7 in VPC2;
  • 5 in VPC3.

hsato03 avatar Feb 14 '24 18:02 hsato03

Codecov Report

Attention: Patch coverage is 88.73239% with 8 lines in your changes missing coverage. Please review.

Project coverage is 30.83%. Comparing base (12f65fb) to head (63ab94a). Report is 5 commits behind head on main.

Files Patch % Lines
...ain/java/com/cloud/network/vpc/VpcManagerImpl.java 72.72% 4 Missing and 2 partials :warning:
...tack/engine/orchestration/NetworkOrchestrator.java 94.44% 1 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8654      +/-   ##
============================================
- Coverage     31.04%   30.83%   -0.21%     
+ Complexity    33902    33647     -255     
============================================
  Files          5404     5405       +1     
  Lines        380305   380371      +66     
  Branches      55506    55519      +13     
============================================
- Hits         118056   117298     -758     
- Misses       246496   247468     +972     
+ Partials      15753    15605     -148     
Flag Coverage Δ
simulator-marvin-tests 24.42% <60.56%> (-0.35%) :arrow_down:
uitests 4.34% <ø> (ø)
unit-tests 16.89% <76.05%> (+0.01%) :arrow_up:

Flags with carried forward coverage won't be shown.


codecov[bot] avatar Feb 14 '24 19:02 codecov[bot]

@hsato03

I agree this setting should be dynamic, and it would be better as a zone-level setting. However, a VPC does not belong to a cluster, and VMs in a VPC might run on different hypervisors, in different clusters. IMHO, it does not make sense as a cluster-level setting.

weizhouapache avatar Feb 14 '24 19:02 weizhouapache

@weizhouapache

I agree this setting should be dynamic, and it would be better as a zone-level setting.

Since the maximum number of networks per VPC depends on the hypervisor where the VR is deployed (as pointed out in the description), I don't think it makes sense to move it to the zone level, as the cluster is the highest-level structure that determines a hypervisor type.

However, a VPC does not belong to a cluster, and VMs in a VPC might run on different hypervisors, in different clusters.

The setting value is defined by the cluster in which the VPC's VR is running, using the lowest value found if the VPC has more than one VR and the VRs are in different clusters (https://github.com/apache/cloudstack/pull/8654/files#diff-07bd71d9f58832d4429d7743f4887188a96aacc913dc48e0101470147ce42032R1893-R1922). Regarding "VMs in a VPC might run on different hypervisors": VMs running on different hypervisors do not impact VPC creation; therefore, it does not make sense to consider them in the algorithm.

hsato03 avatar Feb 14 '24 21:02 hsato03

@weizhouapache

I agree this setting should be dynamic, and it would be better as a zone-level setting.

Since the maximum number of networks per VPC depends on the hypervisor where the VR is deployed (as pointed out in the description), I don't think it makes sense to move it to the zone level, as the cluster is the highest-level structure that determines a hypervisor type.

However, a VPC does not belong to a cluster, and VMs in a VPC might run on different hypervisors, in different clusters.

The setting value is defined by the cluster in which the VPC's VR is running, using the lowest value found if the VPC has more than one VR and the VRs are in different clusters (https://github.com/apache/cloudstack/pull/8654/files#diff-07bd71d9f58832d4429d7743f4887188a96aacc913dc48e0101470147ce42032R1893-R1922). Regarding "VMs in a VPC might run on different hypervisors": VMs running on different hypervisors do not impact VPC creation; therefore, it does not make sense to consider them in the algorithm.

I understand your code, but the requirement is not clear to me. Can you explain more about the infrastructure? E.g. the hypervisor types, why the max VPC networks differ, and what the issues are? Could it be a domain or account setting?

weizhouapache avatar Feb 14 '24 21:02 weizhouapache

I understand your code, but the requirement is not clear to me. Can you explain more about the infrastructure? E.g. the hypervisor types, why the max VPC networks differ, and what the issues are? Could it be a domain or account setting?

@weizhouapache Each hypervisor can support a different number of network adapters. Comparing KVM and VMware: VMware defines a limited number of NICs for each ESXi machine (https://configmax.esp.vmware.com/guest?vmwareproduct=vSphere&release=vSphere%207.0&categories=1-0), while the number of tiers that can be allocated with KVM depends on the number of PCI slots available. For example, KVM provides 32 PCI slots, which are shared among several devices (e.g. CD-ROM, keyboard). Each ACS VR already consumes 9 of the 32 available slots; thus, on KVM, 23 slots remain for new tiers.

Therefore, in an environment with KVM and VMware clusters under the same zone, applying the VMware limit to KVM is undesirable, as a VPC on KVM supports far more tiers than one on VMware.

I will update the PR's description to make it clearer.

hsato03 avatar Feb 15 '24 21:02 hsato03

I understand your code, but the requirement is not clear to me. Can you explain more about the infrastructure? E.g. the hypervisor types, why the max VPC networks differ, and what the issues are? Could it be a domain or account setting?

@weizhouapache Each hypervisor can support a different number of network adapters. Comparing KVM and VMware: VMware defines a limited number of NICs for each ESXi machine (https://configmax.esp.vmware.com/guest?vmwareproduct=vSphere&release=vSphere%207.0&categories=1-0), while the number of tiers that can be allocated with KVM depends on the number of PCI slots available. For example, KVM provides 32 PCI slots, which are shared among several devices (e.g. CD-ROM, keyboard). Each ACS VR already consumes 9 of the 32 available slots; thus, on KVM, 23 slots remain for new tiers.

Therefore, in an environment with KVM and VMware clusters under the same zone, applying the VMware limit to KVM is undesirable, as a VPC on KVM supports far more tiers than one on VMware.

I will update the PR's description to make it clearer.

@hsato03 makes perfect sense.

However ... the NIC count of a VPC VR includes:

  • 1 control NIC
  • multiple guest NICs
  • multiple public NICs

I would suggest creating a setting for each hypervisor, e.g. virtual.machine.max.nics.kvm and virtual.machine.max.nics.vmware, which considers all types of NICs, and checking the value when:

  • adding a guest NIC to a VR (of VPCs, isolated networks, and even shared networks)
  • adding public NICs to a VR (of VPCs, isolated networks, and even shared networks)
  • adding NICs to user VMs
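The suggested check could be sketched roughly as below. The setting names follow the examples in this comment; the limit values and the helper class itself are hypothetical, not CloudStack API:

```java
import java.util.Map;

public class MaxNicsChecker {

    // Hypothetical per-hypervisor limits, mirroring the suggested
    // virtual.machine.max.nics.kvm / virtual.machine.max.nics.vmware settings.
    static final Map<String, Integer> MAX_NICS = Map.of(
            "KVM", 23,
            "VMware", 10);

    /**
     * Returns true if adding one more NIC (control, guest, or public)
     * to a VM or VR stays within the hypervisor's limit.
     */
    static boolean canAddNic(String hypervisor, int currentNicCount) {
        Integer max = MAX_NICS.get(hypervisor);
        return max == null || currentNicCount + 1 <= max;
    }

    public static void main(String[] args) {
        System.out.println(canAddNic("VMware", 9));  // true  (9 + 1 <= 10)
        System.out.println(canAddNic("VMware", 10)); // false (would exceed 10)
    }
}
```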

weizhouapache avatar Feb 16 '24 10:02 weizhouapache

@hsato03 What do you think? The limitations apply to all VMs, including user VMs and VRs.

@hsato03 makes perfect sense.

However ... the NIC count of a VPC VR includes:

  • 1 control NIC
  • multiple guest NICs
  • multiple public NICs

I would suggest creating a setting for each hypervisor, e.g. virtual.machine.max.nics.kvm and virtual.machine.max.nics.vmware, which considers all types of NICs, and checking the value when:

  • adding a guest NIC to a VR (of VPCs, isolated networks, and even shared networks)
  • adding public NICs to a VR (of VPCs, isolated networks, and even shared networks)
  • adding NICs to user VMs

weizhouapache avatar Feb 17 '24 08:02 weizhouapache

@weizhouapache Thanks for your suggestion.

I agree that this situation should include VMs and VRs, but the vpc.max.networks setting should still exist if the user wants to limit the number of tiers in a VPC regardless of the hypervisor. What do you think about delegating the hypervisor limit check to the settings you suggested and changing the scope of vpc.max.networks to the account level in this case?

hsato03 avatar Feb 21 '24 20:02 hsato03

@weizhouapache Thanks for your suggestion.

I agree that this situation should include VMs and VRs, but the vpc.max.networks setting should still exist if the user wants to limit the number of tiers in a VPC regardless of the hypervisor.

agree.

What do you think about delegating the hypervisor limit check to the settings you suggested and changing the scope of vpc.max.networks to the account level in this case?

It looks good. Looking forward to your changes.

weizhouapache avatar Feb 21 '24 21:02 weizhouapache

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

github-actions[bot] avatar Mar 08 '24 10:03 github-actions[bot]

@blueorangutan package

DaanHoogland avatar Mar 29 '24 15:03 DaanHoogland

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Mar 29 '24 15:03 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9096

blueorangutan avatar Mar 29 '24 17:03 blueorangutan

@blueorangutan test alma9 kvm-alma9

DaanHoogland avatar Apr 02 '24 09:04 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests

blueorangutan avatar Apr 02 '24 09:04 blueorangutan

[SF] Trillian test result (tid-9654) Environment: kvm-alma9 (x2), Advanced Networking with Mgmt server a9 Total time taken: 56484 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8654-t9654-kvm-alma9.zip Smoke tests completed. 128 look OK, 1 have errors, 0 did not run Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_disableHumanReadableLogs Error 0.19 test_human_readable_logs.py
test_02_enableHumanReadableLogs Error 0.14 test_human_readable_logs.py

blueorangutan avatar Apr 03 '24 01:04 blueorangutan

@blueorangutan test

DaanHoogland avatar Apr 03 '24 08:04 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Apr 03 '24 08:04 blueorangutan

[SF] Trillian test result (tid-9680) Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7 Total time taken: 46504 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8654-t9680-kvm-centos7.zip Smoke tests completed. 129 look OK, 0 have errors, 0 did not run Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Apr 03 '24 21:04 blueorangutan

@weizhouapache is this good for your sake now?

DaanHoogland avatar Apr 04 '24 07:04 DaanHoogland

Codecov Report

Attention: Patch coverage is 76.59574% with 33 lines in your changes missing coverage. Please review.

Project coverage is 15.55%. Comparing base (cea4801) to head (bf70339).

Files Patch % Lines
...m/cloud/api/query/dao/DomainRouterJoinDaoImpl.java 0.00% 12 Missing :warning:
...ain/java/com/cloud/network/vpc/VpcManagerImpl.java 85.71% 7 Missing and 2 partials :warning:
...tack/engine/orchestration/NetworkOrchestrator.java 88.23% 5 Missing and 1 partial :warning:
...ain/java/com/cloud/vm/dao/DomainRouterDaoImpl.java 45.45% 6 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8654      +/-   ##
============================================
+ Coverage     15.53%   15.55%   +0.02%     
- Complexity    11967    11996      +29     
============================================
  Files          5492     5493       +1     
  Lines        480934   481050     +116     
  Branches      60876    59475    -1401     
============================================
+ Hits          74711    74827     +116     
  Misses       397962   397962              
  Partials       8261     8261              
Flag Coverage Δ
uitests 4.17% <ø> (ø)
unittests 16.32% <76.59%> (+0.02%) :arrow_up:

Flags with carried forward coverage won't be shown.


codecov-commenter avatar Apr 09 '24 21:04 codecov-commenter

@hsato03 it is a bit hidden in the output but please check https://github.com/apache/cloudstack/actions/runs/8622615987/job/23634113585?pr=8654#step:7:8601 .

DaanHoogland avatar Apr 10 '24 09:04 DaanHoogland

overall code lgtm

@hsato03 could you add an integration test for it? The component test test_vpc_vms_deployment.py contains tests for 'vpc.max.networks', but it does not appear to run in the simulator CI and smoke tests.

weizhouapache avatar Apr 12 '24 12:04 weizhouapache

@hsato03, should those extra integration tests be configured in GitHub Actions? (or be added/moved to smoke tests)

DaanHoogland avatar Apr 23 '24 10:04 DaanHoogland

@DaanHoogland I don't think so, as the test file is already declared in GitHub Actions.

hsato03 avatar Apr 23 '24 20:04 hsato03

@DaanHoogland I don't think so, as the test file is already declared in GitHub Actions.

My bad, I thought they were in a new file (hasty looking on my part ;)). Could you maybe put them in a separate file and then add them? I think that makes the organisation of the tests a bit easier (and then my prior comment would be valid).

DaanHoogland avatar Apr 24 '24 07:04 DaanHoogland

@DaanHoogland I don't think so, as the test file is already declared in GitHub Actions.

My bad, I thought they were in a new file (hasty looking on my part ;)). Could you maybe put them in a separate file and then add them? I think that makes the organisation of the tests a bit easier (and then my prior comment would be valid).

Sure. I moved the tests to smoke tests in a new file.

hsato03 avatar May 02 '24 20:05 hsato03

@blueorangutan package

DaanHoogland avatar May 03 '24 06:05 DaanHoogland

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar May 03 '24 07:05 blueorangutan

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 9524

blueorangutan avatar May 03 '24 08:05 blueorangutan