cluster-api-provider-vsphere icon indicating copy to clipboard operation
cluster-api-provider-vsphere copied to clipboard

Improve e2e testing

Open sbueringer opened this issue 9 months ago • 10 comments

Improve e2e tests

We compared CAPV govmomi & supervisor e2e tests with core CAPI e2e tests. As a result we want to make the following improvements.

Adding e2e tests:

  • [x] P0: Add v1.30 => v1.31 upgrade e2e test + check ci-latest (supervisor+govmomi) (@sbueringer)
    • PR: https://github.com/kubernetes/test-infra/pull/32610
    • PR: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/2999
    • Plan is to add more and more upgrades jobs for new Kubernetes versions until we have the same test coverage as core CAPI.
  • [x] P0: Add test "When testing ClusterClass changes [ClusterClass]" (supervisor) (@zhanggbj)
    • PR: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/3011
  • [x] P1: Add test "Ensure OwnerReferences and Finalizers are resilient" (supervisor) (@fabriziopandini)
    • [x] CAPI PR https://github.com/kubernetes-sigs/cluster-api/pull/10693
    • [x] CAPV PR https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/3059
  • [x] P1: Also use ValidateResourceVersionStable func (https://github.com/kubernetes-sigs/cluster-api/pull/10530) to ownerRef & finalizer tests (supervisor+govmomi) (@fabriziopandini) https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/3033
  • [x] P1: Let's check if we have unit test coverage for the following tests for supervisor @chrischdi
    • capv-e2e.[It] Hardware version upgrade creates a cluster with VM hardware versions upgraded
      • already covered via https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/b935f0aca9e1fa6c8e3d8749f0bfd9a307f9cad7/pkg/services/vmoperator/vmopmachine_test.go#L152
    • capv-e2e.[It] Label nodes with ESXi host info creates a workload cluster whose nodes have the ESXi host info
      • PR: #3085
  • [ ] P1: Add test "When testing ClusterClass rollouts [ClusterClass]" (supervisor) (@zhanggbj)
    • PR: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/3023
  • [ ] P1: Add clusterctl upgrade tests n-2, n-1 (supervisor) (@zhanggbj)
    • CAPV 1.9=>current, CAPI 1.6=>1.7, CAPV 1.10=>current, CAPI 1.7=>1.7 (same we already have on main for govmomi)
    • PR: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/3024
  • [ ] P1: When upgrading a workload cluster using ClusterClass with RuntimeSDK [ClusterClass] (supervisor) @chrischdi
    • This requires implementing a minimal Runtime Extension (let's discuss before we start implementing it, and let's do the other more straightforward sub-tasks first)
    • CAPI PR: https://github.com/kubernetes-sigs/cluster-api/pull/10788
    • CAPV PR: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/3069
  • [ ] P2: Add test "When testing the machinery for scale testing" (vcsim + supervisor)
  • [ ] P2: Add clusterctl upgrade tests n-3 (supervisor)
    • Same as above just go one version more back
  • [ ] P2: Add test "When using the autoscaler with Cluster API using ClusterClass"
    • This could be very very tricky, IIRC the autoscaler needs connectivity to the mgmt cluster and this might be very hard to implement in Prow
  • [ ] P?: Audit if we can run more tests with vcsim

Other improvements:

  • [x] Use a kind cluster for clusterctl upgrade tests CAPI#7613 (@fabriziopandini)
    • [x] CAPI PR https://github.com/kubernetes-sigs/cluster-api/pull/10639
    • [x] CAPV PR https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/3050
  • [ ] Introduce Ginkgo labels (CAPI+CAPV): CAPI#9620
    • Let's discuss details on how to do this first

Reconsider if we need it later:

  • Add test: When using the autoscaler with Cluster API using ClusterClass (incl. scale from zero coverage?)

govmomi has the following additional jobs (intentionally):

  • accepted: it's okay if it's only tested with govmomi:
    • capv-e2e.[It] Cluster creation with [Ignition] bootstrap [PR-Blocking] Should create a workload cluster
  • accepted: not surfaced in CAPV supervisor CRDs:
    • capv-e2e.[It] Cluster creation with anti affined nodes should create a cluster with anti-affined nodes
    • capv-e2e.[It] Cluster creation with storage policy should create a cluster successfully
    • capv-e2e.[It] DHCPOverrides configuration test when Creating a cluster with DHCPOverrides configured Only configures the network with the provided nameservers
  • accepted: IPAM provider not supported in supervisor mode
    • capv-e2e.[It] ClusterClass Creation using Cluster API quick-start test and IPAM Provider [ClusterClass] Should create a workload cluster

sbueringer avatar May 13 '24 16:05 sbueringer

/area supervisor Given that most of the improvements are for supervisor mode

fabriziopandini avatar May 13 '24 16:05 fabriziopandini

I would like to start with

  • [x] P0: Add test "When testing ClusterClass changes [ClusterClass]" (supervisor) (#3011 )

zhanggbj avatar May 14 '24 02:05 zhanggbj

Also would like to pick up a similar one :-)

  • [ ] P1: Add test "When testing ClusterClass rollouts [ClusterClass]" (supervisor)

zhanggbj avatar May 20 '24 10:05 zhanggbj

It seems this one is more about testing efforts. Just raised a PR to test.

  • [ ] P1: Add clusterctl upgrade tests n-2, n-1 (supervisor)

zhanggbj avatar May 24 '24 07:05 zhanggbj

Yeah good case is that most if it just works once added :)

A few require a bit more work

sbueringer avatar May 24 '24 08:05 sbueringer

Taking this one

  • [ ] P1: Let's check if we have unit test coverage for the following tests for supervisor

chrischdi avatar Jul 01 '24 12:07 chrischdi

I created a PR for

  • [ ] P1: Let's check if we have unit test coverage for the following tests for supervisor @chrischdi
    • capv-e2e.[It] Label nodes with ESXi host info creates a workload cluster whose nodes have the ESXi host info
  • https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/3085

Regarding the other subpoint:

  • capv-e2e.[It] Hardware version upgrade creates a cluster with VM hardware versions upgraded

There is already unit-test coverage here:

  • https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/b935f0aca9e1fa6c8e3d8749f0bfd9a307f9cad7/pkg/services/vmoperator/vmopmachine_test.go

We define the VSphereMachine here, the called function has a hardcoded value for vmx-17:

https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/b935f0aca9e1fa6c8e3d8749f0bfd9a307f9cad7/pkg/services/vmoperator/vmopmachine_test.go#L118

Later we check the spec of the VirtualMachine to match the minHardwareVersion (again 17):

https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/b935f0aca9e1fa6c8e3d8749f0bfd9a307f9cad7/pkg/services/vmoperator/vmopmachine_test.go#L152

Plus an immutability check at the end:

https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/b935f0aca9e1fa6c8e3d8749f0bfd9a307f9cad7/pkg/services/vmoperator/vmopmachine_test.go#L304-L313

chrischdi avatar Jul 01 '24 17:07 chrischdi

@chrischdi @fabriziopandini if I see correctly we don't have the cluster autoscaler test on the list, let's add it as P2?

sbueringer avatar Jul 03 '24 05:07 sbueringer

@sbueringer ~~I could take this one once #3024 is merged. Thanks!~~

~~- [ ] Add clusterctl upgrade tests n-3 (supervisor)~~

zhanggbj avatar Jul 03 '24 06:07 zhanggbj

  • Add clusterctl upgrade tests n-3

@zhanggbj was just chatting with Christian and Fabrizio about it, we just realized that we didn't have supervisor e2e test in CAPV v1.8. So it would be a lot of effort to add v1.8 => main tests.

We would propose instead to wait until after the v1.11 release in ~ 6 weeks, and then we can just have v1.9 => main, v1.10 => main and v1.11 => main (so we get n-3 coverage just a bit later but with a lot less effort).

Probably we can just include it in the "Prepare main branch for development of the new release" task which is part of https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/3084. Usually we drop the oldest test there, but in that case we will simply keep it around.

sbueringer avatar Jul 03 '24 14:07 sbueringer

@sbueringer Makes sense to me. This will provide n-3 tests with much less effort. Thanks!

zhanggbj avatar Jul 05 '24 07:07 zhanggbj

@zhanggbj Sorry I was just working on prepare main branch and realized that "Add clusterctl upgrade tests n-3" will be covered by that, so I took over that task from you.

sbueringer avatar Aug 15 '24 06:08 sbueringer

@sbueringer Just go ahead, no worries :)

zhanggbj avatar Aug 15 '24 07:08 zhanggbj

@chrischdi Assigned Add test "When using the autoscaler with Cluster API using ClusterClass" to you + updated the PR description to add "part of ..."

sbueringer avatar Aug 26 '24 09:08 sbueringer

Moved remaining sub-tasks to separate issues:

  • https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/3181
  • https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/3182

/close

Great work everyone!

sbueringer avatar Sep 06 '24 11:09 sbueringer

@sbueringer: Closing this issue.

In response to this:

Moved remaining sub-tasks to separate issues:

  • https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/3181
  • https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/3182

/close

Great work everyone!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Sep 06 '24 11:09 k8s-ci-robot