cluster-api-provider-azure
Enable adding Service Endpoints to subnets
What type of PR is this?
/kind feature
What this PR does / why we need it:
This implements the ability to add Service Endpoints to the subnets created/managed by CAPZ.
Which issue(s) this PR fixes:
Fixes https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/2621
TODOs:
- [x] squashed commits
- [x] includes documentation
- [x] adds unit tests
Release note:
Enables adding Virtual Network Service Endpoints to subnets created/managed by CAPZ
Hi @mtougeron. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Current outstanding TODOs:
- [x] Write new docs on how to use this feature, along with its limitations
- [x] Refactor the subnet reconcile function so it can update existing subnets with service endpoint changes
- [x] fix the api version conversion logic
When running the tests with the API version conversion changes, I'm getting the following error and I'm not sure how to address it. If someone could point me in the right direction, I'd really appreciate it; I'm not sure whether I'm overlooking something obvious.
--- FAIL: TestFuzzyConversion (16.31s)
--- FAIL: TestFuzzyConversion/for_AzureCluster (3.78s)
--- FAIL: TestFuzzyConversion/for_AzureCluster/hub-spoke-hub (0.00s)
conversion.go:253:
&v1beta1.AzureCluster{
TypeMeta: {},
ObjectMeta: v1.ObjectMeta{
... // 9 identical fields
DeletionGracePeriodSeconds: nil,
Labels: {},
- Annotations: nil,
+ Annotations: map[string]string{},
OwnerReferences: nil,
Finalizers: {"B)i"},
... // 2 identical fields
},
Spec: v1beta1.AzureClusterSpec{
AzureClusterClassSpec: {SubscriptionID: "嚧虩êſȏ&爲ş?ȩ暧", Location: `_瞊戸唨w镅ʜɸ撑擁\?ɂƯ`, IdentityRef: &{Kind: "撈弶瞯袮_v棈%!"(MISSING), Namespace: "ĕ鹗ʟ傠kʝT栎", Name: "ȏ鲾殳;Ɉ馲知郤俢", UID: "囹e±ɻ潒岀炜ƈ騸Ȱ1ă9ʔȣ:Ɖ", ...}, AzureEnvironment: "Ƌ烯!涧Ȋ;ũ焅bȪ曳i楁V", ...},
NetworkSpec: {Vnet: {ResourceGroup: "Ȕs閼Ɋs儼", ID: "权Dʖȭƺɥ", Name: "曠ȡIb$¡瘎&诈", Peerings: {{ResourceGroup: `俿7Ɉ隳曆"6?qʭĤ鱜霸`, VnetPeeringClassSpec: {RemoteVnetName: "9/"}}}, ...}, Subnets: {}, APIServerLB: {ID: "ŵ鋾", Name: "ƫ'惡抢(oQ;u鄣N充羽·ɝ稄ıį", FrontendIPs: {{Name: "ĘjWȊmǎ.嗎", FrontendIPClass: {PrivateIPAddress: "x{Ǚ,I<s唜Dz"}}}, LoadBalancerClassSpec: {SKU: `Ŕ湞\/w貅g允ąKȶ芀瘳`, Type: "8关+¤zȄw齴珦髋", IdleTimeoutInMinutes: &-672696477}}, NodeOutboundLB: &{ID: "鷬Ń落]Cȥƙ鷢", Name: "5%!a(MISSING)ȭ凌饨衆ɷȔ恢泂哐Ƅ裇\u00a0骀WG", FrontendIPs: {}, LoadBalancerClassSpec: {SKU: "夔.qMɃț诂禫ɣ埓ǥʂ慃rȓ±ĪNj", Type: "3t5ʥƞ穮,ɾƔdÄ"}}, ...},
ResourceGroup: "Bȓ馎@Bq辮",
BastionSpec: v1beta1.BastionSpec{
AzureBastion: &v1beta1.AzureBastion{
Name: "_ť袴T槅ɤ#葐",
Subnet: v1beta1.SubnetSpec{
... // 3 identical fields
RouteTable: {ID: "蜭XlG骾鍓[ɹĥŚȴ-ű", Name: "蹄{愡ǹ="},
NatGateway: {NatGatewayIP: {Name: "{ȲŻǷWT魜İ", DNSName: "G隸级S M肏sȔƌ«笣o"}, NatGatewayClassSpec: {Name: "'ʜȾČDž"}},
- ServiceEndpoints: v1beta1.ServiceEndpoints{{Service: "_姣"}},
+ ServiceEndpoints: nil,
SubnetClassSpec: {Role: "燌~×Ĉ菏ɚ1PÙ<qr", CIDRBlocks: {"ľu陽Rc癜Ȁ ʐ´G"}},
},
PublicIP: {Name: "騂綍Ô菦顂村", DNSName: "癊}冱B曘õʡ垑(儆进G", IPTags: {{Type: "@yjvǣǯ縵e", Tag: "fǡM苔"}}},
},
},
ControlPlaneEndpoint: {Host: "Uw嬷k", Port: 1478785315},
},
Status: {Conditions: {}, LongRunningOperationStates: {}},
}
Expected
<bool>: false
to be true
/ok-to-test
Per the slack conversation, the sticking point I'm currently running into is this make generate error. Otherwise things seem to be working.
/Users/tougeron/src/go/src/github.com/mtougeron/cluster-api-provider-azure/hack/tools/bin/conversion-gen-v0.23.1 \
--input-dirs=./exp/api/v1alpha3 \
--output-file-base=zz_generated.conversion \
--go-header-file=./hack/boilerplate/boilerplate.generatego.txt --output-base=/Users/tougeron/src/go/src/github.com/mtougeron/cluster-api-provider-azure
E0906 08:11:50.114503 66080 conversion.go:755] Warning: could not find nor generate a final Conversion function for sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1.ManagedControlPlaneSubnet -> github.com/mtougeron/cluster-api-provider-azure/exp/api/v1alpha3.ManagedControlPlaneSubnet
E0906 08:11:50.114678 66080 conversion.go:756] the following fields need manual conversion:
E0906 08:11:50.114686 66080 conversion.go:758] - ServiceEndpoints
Also need to figure out a refactoring for:
azure/services/subnets/spec.go:60:1: cyclomatic complexity 36 of func `(*SubnetSpec).Parameters` is high (> 30) (gocyclo)
func (s *SubnetSpec) Parameters(existing interface{}) (parameters interface{}, err error) {
The more detailed apidiff is
sigs.k8s.io/cluster-api-provider-azure/api/v1beta1
Compatible changes:
- ServiceEndpointSpec: added
- ServiceEndpoints: added
- SubnetSpec.ServiceEndpoints: added
sigs.k8s.io/cluster-api-provider-azure/azure/services/subnets
Compatible changes:
- SubnetSpec.ServiceEndpoints: added
sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1
Incompatible changes:
- ManagedControlPlaneSubnet: old is comparable, new is not
- ManagedControlPlaneVirtualNetwork: old is comparable, new is not
Compatible changes:
- ManagedControlPlaneSubnet.ServiceEndpoints: added
but I'm not sure why it works fine for the AzureCluster but not ManagedControlPlane.
/retest
@nojnhuh / @jackfrancis I'm not sure what the apidiff job is telling me is wrong. I know it's telling me something is wrong, but I don't know how to track down what it is, since the ManagedControlPlane changes are the same as the AzureCluster changes (as far as I can tell).
This is about 50/50 spitball/educated guess, but I think apidiff is complaining that these changes remove the ability to meaningfully compare those types with == since ManagedControlPlaneSubnet only contained two strings before but now includes a slice. I imagine it's not complaining about AzureCluster because those types were already incomparable before these changes.
@mtougeron we don't have a lot of E2E coverage around BYO VNET for AzureManagedCluster scenarios, are you confident in your own manual tests that this is feature complete and ready to use? code lgtm
I'll run through another round of manual testing today to be doubly sure. Is there any sort of logging I can provide that would make you feel confident in the testing as well?
@jackfrancis I tested this again for BYO VNET and it doesn't make any changes (as expected/desired).
However, as I look closer, there isn't any validation done on the AzureManagedControlPlane.spec.virtualNetwork (and sub fields like subnet) in the exp/api/v1beta1/azuremanagedcontrolplane_webhook.go code. Would you like me to add that new logic now or should I wait for it to be refactored to be more like the api/v1beta1/azurecluster_validation.go code?
As this relates to this PR, it just ignores the new serviceEndpoints field with BYO VNET and doesn't report anything to the user. While this is not the expected experience, it is similar to what happens when a user sets other fields that are not used with BYO VNET.
/retest
/retest
Apparently I created a regression with my last round of fixes since the pull-cluster-api-provider-azure-e2e test keeps failing. I'll figure out what's different in my manual testing vs the automated test suites and get that fixed.
/test pull-cluster-api-provider-azure-e2e
Ah, finally tracked down the failing log message.
E0930 16:32:16.616238 1 controller.go:326] "msg"="Reconciler error" "error"="admission webhook \"validation.azurecluster.infrastructure.cluster.x-k8s.io\" denied the request: AzureCluster.infrastructure.cluster.x-k8s.io \"capz-e2e-i7wx2l-public-custom-vnet\" is invalid: [spec.networkSpec.subnets[0].CIDRBlocks: Invalid value: []string{\"10.0.0.0/24\"}: field is immutable, spec.networkSpec.subnets[1].CIDRBlocks: Invalid value: []string{\"10.0.1.0/24\"}: field is immutable]" "azureCluster"={"name":"capz-e2e-i7wx2l-public-custom-vnet","namespace":"capz-e2e-i7wx2l"} "controller"="azurecluster" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureCluster" "name"="capz-e2e-i7wx2l-public-custom-vnet" "namespace"="capz-e2e-i7wx2l" "reconcileID"="2afbcc3e-fa11-444f-9bf3-019cc147134f"
Looks like the spec gets updated with the CIDR blocks from the custom VNet, but my checks for changes to those fields are preventing that. The validation previously only ran those checks for a managed VNet, which is why it was passing before. I'll update and push the fix.
/test pull-cluster-api-provider-azure-e2e
Not sure what's up with prow but I'm going to try again.
/test pull-cluster-api-provider-azure-e2e
Added the SSA flags for the new field
Prow flake
/test pull-cluster-api-provider-azure-test
@mtougeron thanks for your patience with reviews here. This is a big and complex change, but FWIW I think we're really close. Just have a few more questions from my side.
No problem at all. I've learned a heck of a lot in this one. It's been a ton of fun. :)
@mtougeron: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cluster-api-provider-azure-apidiff | 369c7e47bf7b80b0c2e7c7e77b2bdde32f7b11b1 | link | false | /test pull-cluster-api-provider-azure-apidiff |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jackfrancis
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [jackfrancis]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment