cluster-api-provider-azure
cluster-api-provider-azure copied to clipboard
[WIP] Remove kcp adoption test
Signed-off-by: willie-yao [email protected]
What type of PR is this?
What this PR does / why we need it: Attempt to re-apply #2612 and understand why it broke upgrade tests.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
- [ ] squashed commits
- [ ] includes documentation
- [ ] adds unit tests
Release note:
NONE
/test pull-cluster-api-provider-azure-e2e-workload-upgrade
@willie-yao jfyi I tried opening a PR to re-apply your changes from #2596 but it makes the upgrade test fail still, I have no idea why 🤷♀️
I may have found something...
/test pull-cluster-api-provider-azure-e2e-workload-upgrade
@willie-yao jfyi I tried opening a PR to re-apply your changes from #2596 but it makes the upgrade test fail still, I have no idea why 🤷♀️
That's very interesting.. It doesn't seem like the two tests are related in any way but I'll take a closer look!
I think Willie and I figured out what's going on here. The KCP adoption spec from CAPI modifies some global state setting the default timeout for Gomega's Eventually outside of an It block which is evaluated when the test is defined, not in its regular execution flow:
https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.4/test/e2e/kcp_adoption.go#L81-L82
We found out that the CAPI upgrade spec uses an interval definition (wait-machine-pool-upgrade) that doesn't exist in our e2e config, so Gomega's default 1s timeout/100ms interval take over:
https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.4/test/e2e/cluster_upgrade.go#L208
So when the KCP adoption test is defined, the default timeout gets set before the upgrade test runs. When the KCP adoption test is removed, the default is never changed and the test fails because the timeout isn't long enough.
The simplest fix for now might be to add a default/wait-machine-pool-upgrade interval to the e2e config. Willie tested that and it seemed to work.
I think Willie and I figured out what's going on here. The KCP adoption spec from CAPI modifies some global state setting the default timeout for Gomega's
Eventuallyoutside of anItblock which is evaluated when the test is defined, not in its regular execution flow: https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.4/test/e2e/kcp_adoption.go#L81-L82We found out that the CAPI upgrade spec uses an interval definition (
wait-machine-pool-upgrade) that doesn't exist in our e2e config, so Gomega's default 1s timeout/100ms interval take over: https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.4/test/e2e/cluster_upgrade.go#L208So when the KCP adoption test is defined, the default timeout gets set before the upgrade test runs. When the KCP adoption test is removed, the default is never changed and the test fails because the timeout isn't long enough.
The simplest fix for now might be to add a
default/wait-machine-pool-upgradeinterval to the e2e config. Willie tested that and it seemed to work.
Nice find!
@CecileRobertMichon The line to fix this is default/wait-machine-pool-upgrade: ["60m", "10s"]. The CAPI test reads the intervals from our azure-dev.yaml file. Would you be able to add this under the "intervals" section? I can test this on my end to make sure it still works with all the other files removed.
/test pull-cluster-api-provider-azure-e2e-workload-upgrade
@willie-yao added, PTAL
/lgtm
@willie-yao: changing LGTM is restricted to collaborators
In response to this:
/lgtm
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
/lgtm
/retest
/approve /label tide/merge-method-squash
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: CecileRobertMichon
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [CecileRobertMichon]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment