cluster-api-provider-azure icon indicating copy to clipboard operation
cluster-api-provider-azure copied to clipboard

add e2e for csi migration upgrade scenarios

Open sonasingh46 opened this issue 3 years ago • 9 comments

Signed-off-by: Ashutosh Kumar [email protected]

What type of PR is this? e2e tests

/kind feature

What this PR does / why we need it: With Kubernetes 1.23, CSI migration for azure disk is enabled by default and is in beta in 1.23 ( GA in 1.24 ). This means that one can either opt to go with the default migration in 1.23 or not go with it by explicitly disabling it. Thus we can have following upgrade paths for 1.22 to 1.23. This PR validates the upgrade paths by adding e2e.

So an upgrade from, let us say from k8s v1.22 to v1.23 can mean any of the following combination:

S.No AzureDiskCSI CloudProviderAzure AzureDiskCSIMigration
1 In-tree In-tree Disabled
2 External In-tree Enabled
3 External External Enabled

For the last combination in the table mentioned ( i.e 3 ) external-cloud-volume-plugin: azure flag needs to be set on controller manager for old Kubelets to keep talking to in-tree CCM and upgrade fails without this is set.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #2242

Special notes for your reviewer: A next PR will be raised to document the flags to be used for different scenarios in case of upgrade from v1.22 to v.123 Adding reference of a similar PR in capa https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3267/files TODOs:

  • [x ] squashed commits
  • [ ] includes documentation
  • [ ] adds unit tests
None

sonasingh46 avatar Jul 06 '22 09:07 sonasingh46

/retest

sonasingh46 avatar Jul 06 '22 11:07 sonasingh46

/test pull-cluster-api-provider-azure-e2e-optional

sonasingh46 avatar Jul 06 '22 12:07 sonasingh46

/test pull-cluster-api-provider-azure-e2e-optional

sonasingh46 avatar Jul 07 '22 14:07 sonasingh46

/retest

sonasingh46 avatar Jul 07 '22 16:07 sonasingh46

/test pull-cluster-api-provider-azure-e2e-optional

sonasingh46 avatar Jul 25 '22 10:07 sonasingh46

/test pull-cluster-api-provider-azure-e2e-optional

sonasingh46 avatar Aug 02 '22 08:08 sonasingh46

The test infra PR: https://github.com/kubernetes/test-infra/pull/26993

sonasingh46 avatar Aug 02 '22 08:08 sonasingh46

/test pull-cluster-api-provider-azure-e2e-csi-migration

jackfrancis avatar Aug 02 '22 09:08 jackfrancis

@CecileRobertMichon -- I have added [k8s-upgrade] label and skipping the test if k8s version upgrade is not from 1.22 --> 1.23.

Question: Will this test even run in testgrid as I can see that upgrade from and upgrade to is pivoted to a particular stable release of k8s. https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/test/e2e/config/azure-dev.yaml#L150

sonasingh46 avatar Aug 20 '22 22:08 sonasingh46

@sonasingh46 it will run in the periodic jobs (https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-azure#capi-periodic-upgrade-main-1-22-1-23)

CecileRobertMichon avatar Aug 22 '22 17:08 CecileRobertMichon

/test ls

CecileRobertMichon avatar Aug 23 '22 23:08 CecileRobertMichon

@CecileRobertMichon: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-azure-build
  • /test pull-cluster-api-provider-azure-ci-entrypoint
  • /test pull-cluster-api-provider-azure-e2e
  • /test pull-cluster-api-provider-azure-test
  • /test pull-cluster-api-provider-azure-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-azure-apidiff
  • /test pull-cluster-api-provider-azure-apiversion-upgrade
  • /test pull-cluster-api-provider-azure-capi-e2e
  • /test pull-cluster-api-provider-azure-conformance
  • /test pull-cluster-api-provider-azure-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-coverage
  • /test pull-cluster-api-provider-azure-e2e-csi-migration
  • /test pull-cluster-api-provider-azure-e2e-exp
  • /test pull-cluster-api-provider-azure-e2e-optional
  • /test pull-cluster-api-provider-azure-e2e-workload-upgrade
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-azure-apidiff
  • pull-cluster-api-provider-azure-build
  • pull-cluster-api-provider-azure-ci-entrypoint
  • pull-cluster-api-provider-azure-coverage
  • pull-cluster-api-provider-azure-e2e
  • pull-cluster-api-provider-azure-test
  • pull-cluster-api-provider-azure-verify

In response to this:

/test ls

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 23 '22 23:08 k8s-ci-robot

/test pull-cluster-api-provider-azure-e2e-workload-upgrade

CecileRobertMichon avatar Aug 23 '22 23:08 CecileRobertMichon

/test pull-cluster-api-provider-azure-e2e-csi-migration

CecileRobertMichon avatar Aug 23 '22 23:08 CecileRobertMichon

AKS has enabled AzureDiskCSIMigration and installed Azure Disk CSI driver by default from 1.21, shall we follow same way on capz?

andyzhangx avatar Aug 24 '22 06:08 andyzhangx

@andyzhangx -- Can you elaborate more? What do you mean by follow same way in capz? Are you saying that all the workload cluster creation e2e test should install external azure disk csi driver? Or Are you saying that we should test upgrade from 1.20 --> 1.21 also as azure disk csi driver is enabled by default in AKS

sonasingh46 avatar Aug 24 '22 09:08 sonasingh46

AzureDiskCSIMigration

@sonasingh46 that depends on whether AzureDiskCSIMigration would be enabled on capz, if it's enabled on 1.23 (using default value), then Azure Disk CSI driver should be installed on capz 1.23 by default, otherwise there would be failure if user upgrade from 1.22 to 1.23 with in-tree disk PV.

andyzhangx avatar Aug 24 '22 12:08 andyzhangx

Right. As far as capz is concerned, the external drivers are not installed by default and no flags are set by default on controller manager that expects external csi driver for non managed clusters.

But with 1.23 the migration flag is enabled by default on core k8s.

So I don't think we need the test for version prior to 1.22 --> 1.23

If we need to do something similar for AKS as it might have implicitly enabled the migration flag -- sure we can have a test for managed cluster ( i.e AKS ). But I do not what is that inflection version to which we should test. If I read it right -- you are saying for 1.21 k8s version in AKS -- the migration is made default-- so in this case 1.20 --> 1.21 upgrade for AKS make sense.

sonasingh46 avatar Aug 24 '22 12:08 sonasingh46

@sonasingh46 for capz, only 1.22 --> 1.23 upgrade scenario is needed. And actually you could run the in-tree azure disk e2e test in every capz version, those in-tree azure disk e2e tests should pass in any version no matter it has enabled AzureDiskCSIMigration flag is enabled or not, no matter it's 1.21, 1.22 or 1.23.

andyzhangx avatar Aug 24 '22 13:08 andyzhangx

@andyzhangx -- Yes that makes sense. I can do that in a next PR as looks like this is bit out of scope of the current PR. wdyt?

sonasingh46 avatar Aug 24 '22 13:08 sonasingh46

This PR intends to test for a special inflection version.

sonasingh46 avatar Aug 24 '22 13:08 sonasingh46

/test pull-cluster-api-provider-azure-e2e-workload-upgrade

sonasingh46 avatar Aug 25 '22 06:08 sonasingh46

/test pull-cluster-api-provider-azure-e2e-csi-migration

sonasingh46 avatar Aug 25 '22 06:08 sonasingh46

@andyzhangx -- Yes that makes sense. I can do that in a next PR as looks like this is bit out of scope of the current PR. wdyt?

@sonasingh46 lgtm, thanks!

andyzhangx avatar Aug 25 '22 07:08 andyzhangx

https://github.com/kubernetes/test-infra/pull/27261

sonasingh46 avatar Aug 25 '22 20:08 sonasingh46

/test pull-cluster-api-provider-azure-e2e-csi-migration

sonasingh46 avatar Aug 25 '22 20:08 sonasingh46

/retest pull-cluster-api-provider-azure-e2e-csi-migration

sonasingh46 avatar Aug 25 '22 22:08 sonasingh46

@sonasingh46: The /retest command does not accept any targets. The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-azure-build
  • /test pull-cluster-api-provider-azure-ci-entrypoint
  • /test pull-cluster-api-provider-azure-e2e
  • /test pull-cluster-api-provider-azure-test
  • /test pull-cluster-api-provider-azure-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-azure-apidiff
  • /test pull-cluster-api-provider-azure-apiversion-upgrade
  • /test pull-cluster-api-provider-azure-capi-e2e
  • /test pull-cluster-api-provider-azure-conformance
  • /test pull-cluster-api-provider-azure-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-coverage
  • /test pull-cluster-api-provider-azure-e2e-csi-migration
  • /test pull-cluster-api-provider-azure-e2e-exp
  • /test pull-cluster-api-provider-azure-e2e-optional
  • /test pull-cluster-api-provider-azure-e2e-workload-upgrade
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-azure-apidiff
  • pull-cluster-api-provider-azure-build
  • pull-cluster-api-provider-azure-ci-entrypoint
  • pull-cluster-api-provider-azure-coverage
  • pull-cluster-api-provider-azure-e2e
  • pull-cluster-api-provider-azure-test
  • pull-cluster-api-provider-azure-verify

In response to this:

/retest pull-cluster-api-provider-azure-e2e-csi-migration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 25 '22 22:08 k8s-ci-robot

/test pull-cluster-api-provider-azure-e2e-csi-migration

sonasingh46 avatar Aug 25 '22 22:08 sonasingh46

@sonasingh46 looks like the csi migration test is timing out, PTAL

CecileRobertMichon avatar Aug 26 '22 17:08 CecileRobertMichon