cluster-api-provider-azure icon indicating copy to clipboard operation
cluster-api-provider-azure copied to clipboard

[WIP] Remove kcp adoption test

Open CecileRobertMichon opened this issue 3 years ago • 3 comments

Signed-off-by: willie-yao [email protected]

What type of PR is this?

What this PR does / why we need it: Attempt to re-apply #2612 and understand why it broke upgrade tests.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • [ ] squashed commits
  • [ ] includes documentation
  • [ ] adds unit tests

Release note:

NONE

CecileRobertMichon avatar Oct 15 '22 00:10 CecileRobertMichon

/test pull-cluster-api-provider-azure-e2e-workload-upgrade

CecileRobertMichon avatar Oct 15 '22 00:10 CecileRobertMichon

@willie-yao jfyi I tried opening a PR to re-apply your changes from #2596 but it makes the upgrade test fail still, I have no idea why 🤷‍♀️

CecileRobertMichon avatar Oct 20 '22 23:10 CecileRobertMichon

I may have found something...

/test pull-cluster-api-provider-azure-e2e-workload-upgrade

CecileRobertMichon avatar Oct 20 '22 23:10 CecileRobertMichon

@willie-yao jfyi I tried opening a PR to re-apply your changes from #2596 but it makes the upgrade test fail still, I have no idea why 🤷‍♀️

That's very interesting.. It doesn't seem like the two tests are related in any way but I'll take a closer look!

willie-yao avatar Oct 21 '22 00:10 willie-yao

I think Willie and I figured out what's going on here. The KCP adoption spec from CAPI modifies some global state setting the default timeout for Gomega's Eventually outside of an It block which is evaluated when the test is defined, not in its regular execution flow: https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.4/test/e2e/kcp_adoption.go#L81-L82

We found out that the CAPI upgrade spec uses an interval definition (wait-machine-pool-upgrade) that doesn't exist in our e2e config, so Gomega's default 1s timeout/100ms interval take over: https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.4/test/e2e/cluster_upgrade.go#L208

So when the KCP adoption test is defined, the default timeout gets set before the upgrade test runs. When the KCP adoption test is removed, the default is never changed and the test fails because the timeout isn't long enough.

The simplest fix for now might be to add a default/wait-machine-pool-upgrade interval to the e2e config. Willie tested that and it seemed to work.

nojnhuh avatar Nov 01 '22 18:11 nojnhuh

I think Willie and I figured out what's going on here. The KCP adoption spec from CAPI modifies some global state setting the default timeout for Gomega's Eventually outside of an It block which is evaluated when the test is defined, not in its regular execution flow: https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.4/test/e2e/kcp_adoption.go#L81-L82

We found out that the CAPI upgrade spec uses an interval definition (wait-machine-pool-upgrade) that doesn't exist in our e2e config, so Gomega's default 1s timeout/100ms interval take over: https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.4/test/e2e/cluster_upgrade.go#L208

So when the KCP adoption test is defined, the default timeout gets set before the upgrade test runs. When the KCP adoption test is removed, the default is never changed and the test fails because the timeout isn't long enough.

The simplest fix for now might be to add a default/wait-machine-pool-upgrade interval to the e2e config. Willie tested that and it seemed to work.

Nice find!

devigned avatar Nov 01 '22 18:11 devigned

@CecileRobertMichon The line to fix this is default/wait-machine-pool-upgrade: ["60m", "10s"]. The CAPI test reads the intervals from our azure-dev.yaml file. Would you be able to add this under the "intervals" section? I can test this on my end to make sure it still works with all the other files removed.

willie-yao avatar Nov 01 '22 22:11 willie-yao

/test pull-cluster-api-provider-azure-e2e-workload-upgrade

CecileRobertMichon avatar Nov 02 '22 18:11 CecileRobertMichon

@willie-yao added, PTAL

CecileRobertMichon avatar Nov 02 '22 18:11 CecileRobertMichon

/lgtm

willie-yao avatar Nov 02 '22 19:11 willie-yao

@willie-yao: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 02 '22 19:11 k8s-ci-robot

/retest

willie-yao avatar Nov 02 '22 19:11 willie-yao

/lgtm

nojnhuh avatar Nov 02 '22 19:11 nojnhuh

/retest

CecileRobertMichon avatar Nov 03 '22 16:11 CecileRobertMichon

/approve /label tide/merge-method-squash

CecileRobertMichon avatar Nov 03 '22 17:11 CecileRobertMichon

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [CecileRobertMichon]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Nov 03 '22 17:11 k8s-ci-robot