cluster-api-provider-vsphere ✨ Add validating and mutating webhook for supervisor mode

trafficstars

What this PR does / why we need it:

Add validating and mutating webhook for supervisor mode:

- Refactor supervisor config folder and generate webhook config and manifest.
- Enable update validating for vm-operator VSphereMachine to block in-place update of following fields.
	- ImageName
	- ClassName
	- StorageClass
	- MinHardwareVersion

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #2595

Jan 23 '24 09:01 zhanggbj

Codecov Report

Attention: Patch coverage is 72.72727% with 15 lines in your changes are missing coverage. Please review.

Project coverage is 64.07%. Comparing base (b89c791) to head (4ec4228). Report is 2 commits behind head on main.

:exclamation: Current head 4ec4228 differs from pull request most recent head e910494. Consider uploading reports for the commit e910494 to get more accurate results

Files	Patch %	Lines
internal/webhooks/vmoperator/vspheremachine.go	72.09%	10 Missing and 2 partials :warning:
internal/test/helpers/envtest.go	25.00%	2 Missing and 1 partial :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2651      +/-   ##
==========================================
- Coverage   65.48%   64.07%   -1.42%     
==========================================
  Files         161      161              
  Lines        7525     9391    +1866     
==========================================
+ Hits         4928     6017    +1089     
- Misses       2134     2914     +780     
+ Partials      463      460       -3

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jan 23 '24 09:01 codecov[bot]

It's ready for a review. And I'll try to add some tests if possible to make codecov happy.

Jan 24 '24 05:01 zhanggbj

It's ready for a review. And I'll try to add some tests if possible to make codecov happy.

Thanks, please ignore codecov, its not failing because of your PR, its the usual codecov flakyness when uploading the report 😭

Jan 24 '24 12:01 chrischdi

All comments addressed, it's ready for another review :-)

Jan 25 '24 08:01 zhanggbj

/test pull-cluster-api-provider-vsphere-test-main

Jan 25 '24 08:01 zhanggbj

Once we have the manifests working. Let's please do one manual test with tilt/kind or something (with a real Kubernetes cluster not just unit tests) to verify the webhooks are really active. It's very easy to make a mistake somewhere (e.g. incorrect / missing webhook manifests, incorrect resource match in the webhook config, ...)

Jan 25 '24 17:01 sbueringer

Once we have the manifests working. Let's please do one manual test with tilt/kind or something (with a real Kubernetes cluster not just unit tests) to verify the webhooks are really active. It's very easy to make a mistake somewhere (e.g. incorrect / missing webhook manifests, incorrect resource match in the webhook config, ...)

Make sense to me, will run tilt to verify it and update result later.

Jan 29 '24 07:01 zhanggbj

/test pull-cluster-api-provider-vsphere-test-integration-main

Feb 04 '24 08:02 zhanggbj

cc @zhanggbj (just in case you forgot about this one :))

Apr 09 '24 12:04 sbueringer

I'm testing locally with tilt for supervisor mode, but I think it's ready for an initial review while waiting for the test result.

Apr 24 '24 04:04 zhanggbj

Deployed a supervisor mode env with tilt and the webhook configs and manifests works fine. Please let me know if any others need to validate :-)

Checked webhook configs of capv-supervisor-mutating-webhook-configuration and capv-supervisor-validating-webhook-configuration.
Updating imageName of vspheremachine will be blocked as expected.

# vspheremachines.vmware.infrastructure.cluster.x-k8s.io "cluster2-67cn6" was not valid:
# * spec.imageName: Invalid value: v1beta1.VSphereMachineSpec{ProviderID:(*string)(0x40005a4e40), FailureDomain:(*string)(nil), ImageName:"ubuntu-2204-kube-v1.29.0", ClassName:"vcsim-default-vm-class", StorageClass:"vcsim-default-storage-class", Volumes:[]v1beta1.VSphereMachineVolume(nil), PowerOffMode:"hard", MinHardwareVersion:""}: updating imageName is not allowed

CC @sbueringer @chrischdi

May 08 '24 09:05 zhanggbj

/test pull-cluster-api-provider-vsphere-e2e-supervisor-main /test pull-cluster-api-provider-vsphere-e2e-govmomi-main

May 15 '24 11:05 sbueringer

We also need to change the path here:

https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/5e47949a28d0da488b0c2ba37e8c0a175cc2ceb6/internal/test/helpers/webhook.go#L86

We may also need the similar thing for the supervisor then?!

Perhaps we need to add them here controllers/vmware/test?

May 15 '24 15:05 zhanggbj

@zhanggbj: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command pull-cluster-api-provider-vsphere-test-integration-main 4ec4228 link true /test pull-cluster-api-provider-vsphere-test-integration-main Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Interesting that Prow shows us a job run from February :) https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-vsphere/2651/pull-cluster-api-provider-vsphere-test-integration-main/1759866415801700352

(we deleted the job since then, so let's ignore)

May 16 '24 09:05 sbueringer

/lgtm

May 16 '24 09:05 sbueringer

LGTM label has been added.

Git tree hash: 0cbe2e1a85f3451505df183f9a362f7067af0325

May 16 '24 09:05 k8s-ci-robot

/test pull-cluster-api-provider-vsphere-e2e-govmomi-main /test pull-cluster-api-provider-vsphere-e2e-supervisor-main

May 16 '24 09:05 sbueringer

/test pull-cluster-api-provider-vsphere-e2e-govmomi-main

May 16 '24 10:05 sbueringer

This is nice :-)

/approve

May 16 '24 10:05 chrischdi

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrischdi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [chrischdi]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

May 16 '24 10:05 k8s-ci-robot

/hold

for some manual testing

May 16 '24 11:05 sbueringer

The govmomi test now failed 2 times in a row. I'm starting to think we broke something

/test pull-cluster-api-provider-vsphere-e2e-govmomi-main

May 16 '24 11:05 sbueringer

@zhanggbj: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-cluster-api-provider-vsphere-test-integration-main	4ec4228ae5dd95aef2ec935889983c74f887c470	link	true	`/test pull-cluster-api-provider-vsphere-test-integration-main`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

May 16 '24 11:05 k8s-ci-robot

Okay 3 times in a row seems too often

May 16 '24 15:05 sbueringer

Checked the test artifacts, the failed cluster is quick-start-3x2kmb and it's blocked at WaitForOneKubeadmControlPlaneMachineToExist, which seems blocked due to no ready node. From the kubelet.log, the latest error is CNI not ready as below, I cannot think of any relationship with this PR change🧐

May 16 11:17:54.628315 quick-start-3x2kmb-7q8wn kubelet[1552]: E0516 11:17:54.628259    1552 kubelet.go:2900] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

May 17 '24 07:05 zhanggbj

/test pull-cluster-api-provider-vsphere-e2e-govmomi-main

May 17 '24 07:05 sbueringer

Same, but the job is 100% green on main. Let's see if it fails again. We might have to figure out why exactly the job is failing

May 17 '24 07:05 sbueringer

The one at pull-cluster-api-provider-vsphere-e2e-govmomi-main was in fact a flake. We also have that on release branches.

xref triage

May 17 '24 07:05 chrischdi

Interesting... the flakes all failed at /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/controlplane_helpers.go:153

The one at pull-cluster-api-provider-vsphere-e2e-govmomi-main was in fact a flake. We also have that on release branches.

xref triage

May 17 '24 09:05 zhanggbj

Kk. Maybe just bad luck :)

May 17 '24 11:05 sbueringer

cluster-api-provider-vsphere cluster-api-provider-vsphere copied to clipboard

✨ Add validating and mutating webhook for supervisor mode

Codecov Report

cluster-api-provider-vsphere
cluster-api-provider-vsphere copied to clipboard