cluster-api-provider-azure icon indicating copy to clipboard operation
cluster-api-provider-azure copied to clipboard

Support for configurable Network Interfaces

Open brianlieberman opened this issue 3 years ago • 11 comments

/kind feature

What this PR does / why we need it: This PR adds the ability to configure one or more NetworkInterfaces for AzureMachines and AzureMachinePools, as well as pre-warming multiple IPConfigs on each interface.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #2327

TODOs:

  • [ x] squashed commits
  • [ x] includes documentation
  • [ x] adds unit tests

Release note:

Adds additional fields for AzureMachine, AzureMachineTemplate, and AzureMachinePool to configure multiple NetworkInterfaces. An example configuration:
networkInterfaces:
        - subnetName: control-plane-subnet
           acceleratedNetworking: false
         - subnetName: node-subnet
           acceleratedNetworking: true
           privateIPConfigs: 2
           publicIPConfigs: 1

will create two interfaces, with the node-subnet interface having two additional private IP addresses, and one additional public IP address

brianlieberman avatar Jun 22 '22 20:06 brianlieberman

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: brianlieberman / name: Brian Lieberman (a9b3b46bf80b6d9816be6839659078f4a1197fc0, c19ddea1f45555d65d07b38b620be7c8c4660d60, 6e20ef6cdef6acbb667ed04b77b6991349154320, 2a6e944011c35f06d2537b748bca8244f6d6400d, 9afbebeabd2a26fd701dd6b50e0a3bd2dfff6fc0, 3e5a2f6357091fc626f20901eb09aa472fccdbc8, 7bd6b242796455ae1ec0e97db6572ccae73998ce, 9ab395506274f32e0cb353ff3ddbcafbe0db1ff9)
  • :white_check_mark: login: dthorsen / name: Dane Thorsen (3800b8b3bf19277eb96f9c22f3a8b768f9e76329, 6095d7b77772254793c312f2f0e44c8605c60c68, 5a45826b664ec22fbcb5868e126645336e34a3a2)

Hi @brianlieberman. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 22 '22 20:06 k8s-ci-robot

Haven't looked at the code yet, but thanks for opening this!

One question about the UX:

 ipConfigs:
  - {}
  - {}

If a user needs to pre-allocate say 250 IPs to work with Azure CNI in a 250 max pods per node configuration, how would this scale? It might make sense to allow the user to specify an IP count instead?

CecileRobertMichon avatar Jun 22 '22 21:06 CecileRobertMichon

Yeah ,as currently implemented it doesn't scale to that level well from a UX perspective, there's a tradeoff since currently you can specify a configuration for each IPConfig (publicIP, privateIP, allocatePublicIP). In the example we leave them empty to just provision a default of a dynamic privateIP, but that's the tradeoff we might make in order to make something like an int we can iterate instead. Open to implementation suggestions for improvement.

brianlieberman avatar Jun 22 '22 22:06 brianlieberman

I like the idea of simply exposing a number of ipConfigs too. Definitely makes for a nicer UX. Maybe just have a couple integers? privateIPConfigs: 30 and publicIPConfigs: 1 or something like that?

dthorsen avatar Jun 22 '22 22:06 dthorsen

Reworked the IPConfig functionality to use integer quantities and updated the example in the PR description

brianlieberman avatar Jun 24 '22 15:06 brianlieberman

/ok-to-test

invidian avatar Jul 05 '22 12:07 invidian

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: brianlieberman / name: Brian Lieberman (a9b3b46bf80b6d9816be6839659078f4a1197fc0, c19ddea1f45555d65d07b38b620be7c8c4660d60, 6e20ef6cdef6acbb667ed04b77b6991349154320, 2a6e944011c35f06d2537b748bca8244f6d6400d, 9afbebeabd2a26fd701dd6b50e0a3bd2dfff6fc0, 3e5a2f6357091fc626f20901eb09aa472fccdbc8, 7bd6b242796455ae1ec0e97db6572ccae73998ce, 9ab395506274f32e0cb353ff3ddbcafbe0db1ff9, 3685380dbcf70a2f3adf70bbf075c61ce3708c5d, 3a90933cd5fd68a2a14ce9008129ab6a76addf3f, 4ff1d2a856597ae83846ce6bbdc6788cade4aaa6, 77e3bfce47eaee17391833bf3045610560f307f4, 6567927005ea7bf0001978b4c5976cf069eff542, 6a18007d9ece16c129c2c37a5208dfe2b7ceba12, a38a8d342fa0ef7df356bc8490c500060544a3f5, a6ba4e976725db7a57f39ff88b19de67edbb9188, 24b09a80cb0410730972158b6a124aed6b11e113, 7cf1d091a6025806e1e7042a5ab00df7281e8102, ff3eae1f1388ef4fe249e402b38c57cb1b794578, 2723c892a3b334036de741684eb7b36fe319d2a8, c9022218c1570637df2bdb72f5de9ae03b65febf, d90488fbd33c3a5db52bbda4dd4ad853a06fa99b, 230d88fc660074b52fee92934a482cf46d584f33)
  • :white_check_mark: login: dthorsen / name: Dane Thorsen (3800b8b3bf19277eb96f9c22f3a8b768f9e76329, 6095d7b77772254793c312f2f0e44c8605c60c68, 5a45826b664ec22fbcb5868e126645336e34a3a2, be0b6cb818bc0bb5db8dee675fc40deef0c595f2, 05fc0d67cef5ccfbec916201778f3c0f8120fd90)
  • :white_check_mark: login: mytunguyen / name: Mytu (a362d56dab5c986dc4b74968eaba5dc872653a18, 89f0f7126561d44d159bec591571c386841fcf2e)
  • :white_check_mark: login: BrennenMM7 / name: Brennen Murray (4600f181cafd3f4fb8ba934c9142738151152933)

@brianlieberman: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 16 '22 00:08 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval by writing /assign @fabriziopandini in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Aug 16 '22 18:08 k8s-ci-robot

@brianlieberman: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-apidiff 230d88fc660074b52fee92934a482cf46d584f33 link false /test pull-cluster-api-provider-azure-apidiff
pull-cluster-api-provider-azure-e2e 230d88fc660074b52fee92934a482cf46d584f33 link true /test pull-cluster-api-provider-azure-e2e
pull-cluster-api-provider-azure-ci-entrypoint 230d88fc660074b52fee92934a482cf46d584f33 link true /test pull-cluster-api-provider-azure-ci-entrypoint
pull-cluster-api-provider-azure-e2e-exp 230d88fc660074b52fee92934a482cf46d584f33 link false /test pull-cluster-api-provider-azure-e2e-exp
pull-cluster-api-provider-azure-verify 230d88fc660074b52fee92934a482cf46d584f33 link true /test pull-cluster-api-provider-azure-verify
pull-cluster-api-provider-azure-test 230d88fc660074b52fee92934a482cf46d584f33 link true /test pull-cluster-api-provider-azure-test
pull-cluster-api-provider-azure-build 230d88fc660074b52fee92934a482cf46d584f33 link true /test pull-cluster-api-provider-azure-build
pull-cluster-api-provider-azure-coverage 230d88fc660074b52fee92934a482cf46d584f33 link false /test pull-cluster-api-provider-azure-coverage

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Aug 16 '22 18:08 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval by writing /assign @fabriziopandini in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Nov 02 '22 02:11 k8s-ci-robot

@dthorsen looks like the PR still has a merge conflict and the CLA authorization is missing

CecileRobertMichon avatar Nov 02 '22 19:11 CecileRobertMichon

/lifecycle active

CecileRobertMichon avatar Nov 08 '22 22:11 CecileRobertMichon

/test pull-cluster-api-provider-azure-coverage

dthorsen avatar Nov 10 '22 13:11 dthorsen

@dthorsen: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-azure-build
  • /test pull-cluster-api-provider-azure-ci-entrypoint
  • /test pull-cluster-api-provider-azure-e2e
  • /test pull-cluster-api-provider-azure-test
  • /test pull-cluster-api-provider-azure-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-azure-apidiff
  • /test pull-cluster-api-provider-azure-apiversion-upgrade
  • /test pull-cluster-api-provider-azure-capi-e2e
  • /test pull-cluster-api-provider-azure-conformance
  • /test pull-cluster-api-provider-azure-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-e2e-csi-migration
  • /test pull-cluster-api-provider-azure-e2e-exp
  • /test pull-cluster-api-provider-azure-e2e-optional
  • /test pull-cluster-api-provider-azure-e2e-workload-upgrade
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-azure-apidiff
  • pull-cluster-api-provider-azure-build
  • pull-cluster-api-provider-azure-ci-entrypoint
  • pull-cluster-api-provider-azure-e2e
  • pull-cluster-api-provider-azure-e2e-exp
  • pull-cluster-api-provider-azure-test
  • pull-cluster-api-provider-azure-verify

In response to this:

/test pull-cluster-api-provider-azure-coverage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 10 '22 13:11 k8s-ci-robot

/retest

dthorsen avatar Nov 10 '22 16:11 dthorsen

I ran the e2e tests locally and they passed. Considering the timeouts I see in the prow job logs, I suspect something is flaky? Does anyone have any other insight? Should I just keep retrying that test?

dthorsen avatar Nov 10 '22 17:11 dthorsen

I ran the e2e tests locally and they passed. Considering the timeouts I see in the prow job logs, I suspect something is flaky? Does anyone have any other insight? Should I just keep retrying that test?

ipv6 cluster worker node kubeadm join failed with:

[2022-11-10 16:30:43] error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get component configs: could not download the kubelet configuration from ConfigMap "kubelet-config" or "kubelet-config-1.23": Get "https://capz-e2e-r87hcg-ipv6-3560cb8d.westus2.cloudapp.azure.com:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.23?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

source: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/2411/pull-cluster-api-provider-azure-e2e/1590739619538800640/artifacts/clusters/capz-e2e-r87hcg-ipv6/machines/capz-e2e-r87hcg-ipv6-md-0-856bdddf4c-zzk8n/cloud-init-output.log

That seems like an unrelated network flake to me but let's retry...

/retest

CecileRobertMichon avatar Nov 10 '22 22:11 CecileRobertMichon

/milestone v1.7

CecileRobertMichon avatar Dec 01 '22 16:12 CecileRobertMichon

/retest

dthorsen avatar Dec 05 '22 16:12 dthorsen

/assign

nawazkh avatar Dec 05 '22 19:12 nawazkh

/retest

dthorsen avatar Dec 06 '22 04:12 dthorsen

/retest

dthorsen avatar Dec 06 '22 16:12 dthorsen

@brianlieberman: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-coverage 230d88fc660074b52fee92934a482cf46d584f33 link false /test pull-cluster-api-provider-azure-coverage
pull-cluster-api-provider-azure-e2e d461435065779832685f816ffdd8507ed92eb672 link true /test pull-cluster-api-provider-azure-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Dec 06 '22 17:12 k8s-ci-robot

@dthorsen I do see some errors in the latest e2e run controller logs that seem related to this PR

E1206 17:04:54.815032       1 controller.go:326]  "msg"="Reconciler error" "error"="admission webhook \"validation.azuremachine.infrastructure.cluster.x-k8s.io\" denied the request: AzureMachine.infrastructure.cluster.x-k8s.io \"capz-e2e-dm3mea-ha-md-0-kqnqw\" is invalid: spec.networkInterfaces: Invalid value: \"null\": field is immutable" "AzureMachine"={"name":"capz-e2e-dm3mea-ha-md-0-kqnqw","namespace":"capz-e2e-dm3mea"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-e2e-dm3mea-ha-md-0-kqnqw" "namespace"="capz-e2e-dm3mea" "reconcileID"="752409a6-3bb6-4712-979d-6307d9e24130"

CecileRobertMichon avatar Dec 08 '22 00:12 CecileRobertMichon

/retest

dthorsen avatar Dec 19 '22 00:12 dthorsen

/retest

dthorsen avatar Dec 20 '22 15:12 dthorsen

@CecileRobertMichon The tests are finally passing :). I did my best to incorporate your feedback. The business logic in the NICSpec builder is reduced.

dthorsen avatar Dec 21 '22 16:12 dthorsen

/retest

dthorsen avatar Dec 21 '22 19:12 dthorsen

/retest

dthorsen avatar Dec 23 '22 20:12 dthorsen

@nawazkh thank you for the review 🙏

@dthorsen happy new year! please let me know once all comments have been addressed and this is ready for a final pass

CecileRobertMichon avatar Jan 03 '23 19:01 CecileRobertMichon

/retest

dthorsen avatar Jan 05 '23 23:01 dthorsen

@nawazkh are all your comments resolved?

If so @dthorsen can you please squash one more time?

CecileRobertMichon avatar Jan 06 '23 02:01 CecileRobertMichon

@nawazkh I believe I addressed all of your comments. Let me know if you have any other comments before I squash and push commits. Thanks!

dthorsen avatar Jan 06 '23 15:01 dthorsen