cluster-api-provider-vsphere
✨ Add validating and mutating webhook for supervisor mode
What this PR does / why we need it:
Add validating and mutating webhook for supervisor mode:
- Refactor the supervisor config folder and generate the webhook config and manifests.
- Enable update validation for the vm-operator VSphereMachine to block in-place updates of the following fields (see the sketch after this list):
- ImageName
- ClassName
- StorageClass
- MinHardwareVersion
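For illustration, a minimal sketch of how such an update validation can look with a controller-runtime CustomValidator. This is not the exact PR code: the webhook struct name, package, and the `vmwarev1` import path are assumptions; the field names and error wording follow the PR description and the validation output shown later in this thread.

```go
package vmoperator

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/util/validation/field"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"

	// Assumed import path for the vmware v1beta1 API types.
	vmwarev1 "sigs.k8s.io/cluster-api-provider-vsphere/apis/vmware/v1beta1"
)

// VSphereMachineWebhook is a hypothetical CustomValidator for vm-operator VSphereMachines.
type VSphereMachineWebhook struct{}

// ValidateUpdate rejects in-place changes to fields that must stay immutable.
func (w *VSphereMachineWebhook) ValidateUpdate(_ context.Context, oldRaw, newRaw runtime.Object) (admission.Warnings, error) {
	oldVM, ok := oldRaw.(*vmwarev1.VSphereMachine)
	if !ok {
		return nil, apierrors.NewBadRequest(fmt.Sprintf("expected a VSphereMachine but got a %T", oldRaw))
	}
	newVM, ok := newRaw.(*vmwarev1.VSphereMachine)
	if !ok {
		return nil, apierrors.NewBadRequest(fmt.Sprintf("expected a VSphereMachine but got a %T", newRaw))
	}

	var allErrs field.ErrorList
	if newVM.Spec.ImageName != oldVM.Spec.ImageName {
		allErrs = append(allErrs, field.Invalid(field.NewPath("spec", "imageName"), newVM.Spec, "updating imageName is not allowed"))
	}
	if newVM.Spec.ClassName != oldVM.Spec.ClassName {
		allErrs = append(allErrs, field.Invalid(field.NewPath("spec", "className"), newVM.Spec, "updating className is not allowed"))
	}
	if newVM.Spec.StorageClass != oldVM.Spec.StorageClass {
		allErrs = append(allErrs, field.Invalid(field.NewPath("spec", "storageClass"), newVM.Spec, "updating storageClass is not allowed"))
	}
	if newVM.Spec.MinHardwareVersion != oldVM.Spec.MinHardwareVersion {
		allErrs = append(allErrs, field.Invalid(field.NewPath("spec", "minHardwareVersion"), newVM.Spec, "updating minHardwareVersion is not allowed"))
	}

	if len(allErrs) > 0 {
		return nil, apierrors.NewInvalid(vmwarev1.GroupVersion.WithKind("VSphereMachine").GroupKind(), newVM.Name, allErrs)
	}
	return nil, nil
}
```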
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2595
Codecov Report
Attention: Patch coverage is 72.72727% with 15 lines in your changes missing coverage. Please review.
Project coverage is 64.07%. Comparing base (b89c791) to head (4ec4228). Report is 2 commits behind head on main.
:exclamation: Current head 4ec4228 differs from pull request most recent head e910494. Consider uploading reports for the commit e910494 to get more accurate results
| Files | Patch % | Lines |
|---|---|---|
| internal/webhooks/vmoperator/vspheremachine.go | 72.09% | 10 Missing and 2 partials :warning: |
| internal/test/helpers/envtest.go | 25.00% | 2 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #2651 +/- ##
==========================================
- Coverage 65.48% 64.07% -1.42%
==========================================
Files 161 161
Lines 7525 9391 +1866
==========================================
+ Hits 4928 6017 +1089
- Misses 2134 2914 +780
+ Partials 463 460 -3
It's ready for a review. And I'll try to add some tests if possible to make codecov happy.
Thanks, please ignore codecov, it's not failing because of your PR, it's the usual codecov flakiness when uploading the report 😭
All comments addressed, it's ready for another review :-)
/test pull-cluster-api-provider-vsphere-test-main
Once we have the manifests working, let's please do one manual test with tilt/kind or something (with a real Kubernetes cluster, not just unit tests) to verify the webhooks are really active. It's very easy to make a mistake somewhere (e.g. incorrect / missing webhook manifests, incorrect resource match in the webhook config, ...)
Makes sense to me, I'll run tilt to verify it and update with the result later.
/test pull-cluster-api-provider-vsphere-test-integration-main
cc @zhanggbj (just in case you forgot about this one :))
I'm testing locally with tilt for supervisor mode, but I think it's ready for an initial review while waiting for the test result.
Deployed a supervisor mode env with tilt and the webhook configs and manifests work fine. Please let me know if anything else needs to be validated :-)
- Checked the webhook configs of capv-supervisor-mutating-webhook-configuration and capv-supervisor-validating-webhook-configuration.
- Updating imageName of a vspheremachine will be blocked as expected:
# vspheremachines.vmware.infrastructure.cluster.x-k8s.io "cluster2-67cn6" was not valid:
# * spec.imageName: Invalid value: v1beta1.VSphereMachineSpec{ProviderID:(*string)(0x40005a4e40), FailureDomain:(*string)(nil), ImageName:"ubuntu-2204-kube-v1.29.0", ClassName:"vcsim-default-vm-class", StorageClass:"vcsim-default-storage-class", Volumes:[]v1beta1.VSphereMachineVolume(nil), PowerOffMode:"hard", MinHardwareVersion:""}: updating imageName is not allowed
CC @sbueringer @chrischdi
/test pull-cluster-api-provider-vsphere-e2e-supervisor-main /test pull-cluster-api-provider-vsphere-e2e-govmomi-main
We also need to change the path here:
https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/5e47949a28d0da488b0c2ba37e8c0a175cc2ceb6/internal/test/helpers/webhook.go#L86
We may also need something similar for the supervisor then?!
Perhaps we need to add them here: controllers/vmware/test?
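A minimal sketch of what wiring the supervisor webhook manifests into the envtest helpers could look like, assuming controller-runtime's WebhookInstallOptions is what the helper uses; the function name and the directory paths here are hypothetical, and the real locations in webhook.go / the test helpers may differ.

```go
package helpers

import (
	"path/filepath"

	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

// newWebhookInstallOptions is a hypothetical helper: it points envtest at the
// directories containing the generated webhook manifests so the webhook
// configurations are installed into the test API server.
func newWebhookInstallOptions(repoRoot string) envtest.WebhookInstallOptions {
	return envtest.WebhookInstallOptions{
		Paths: []string{
			// Existing (govmomi) webhook manifests (hypothetical path).
			filepath.Join(repoRoot, "config", "webhook"),
			// Newly generated supervisor webhook manifests (hypothetical path).
			filepath.Join(repoRoot, "config", "supervisor", "webhook"),
		},
	}
}
```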
@zhanggbj: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cluster-api-provider-vsphere-test-integration-main | 4ec4228 | link | true | /test pull-cluster-api-provider-vsphere-test-integration-main |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Interesting that Prow shows us a job run from February :) https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-vsphere/2651/pull-cluster-api-provider-vsphere-test-integration-main/1759866415801700352
(we deleted the job since then, so let's ignore)
/lgtm
LGTM label has been added.
/test pull-cluster-api-provider-vsphere-e2e-govmomi-main /test pull-cluster-api-provider-vsphere-e2e-supervisor-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
This is nice :-)
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chrischdi
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [chrischdi]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/hold
for some manual testing
The govmomi test now failed 2 times in a row. I'm starting to think we broke something
/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
@zhanggbj: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cluster-api-provider-vsphere-test-integration-main | 4ec4228ae5dd95aef2ec935889983c74f887c470 | link | true | /test pull-cluster-api-provider-vsphere-test-integration-main |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Okay 3 times in a row seems too often
Checked the test artifacts: the failed cluster is quick-start-3x2kmb and it's stuck at WaitForOneKubeadmControlPlaneMachineToExist, apparently because no node becomes ready.
From the kubelet.log, the latest error is that CNI is not ready, as shown below. I cannot think of any relationship with this PR change 🧐
May 16 11:17:54.628315 quick-start-3x2kmb-7q8wn kubelet[1552]: E0516 11:17:54.628259 1552 kubelet.go:2900] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
Same, but the job is 100% green on main. Let's see if it fails again. We might have to figure out why exactly the job is failing
The one at pull-cluster-api-provider-vsphere-e2e-govmomi-main was in fact a flake. We also have that on release branches.
Interesting... the flakes all failed at /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/controlplane_helpers.go:153
Kk. Maybe just bad luck :)