Continue e2e test after failure when checkpoint is enabled
Issue #, if available:
Description of changes:
In checkpoint tests, the first run of upgrade cluster is expected to fail. If the framework returns that error, the e2e test ends there. So when the checkpoint feature is enabled, the framework no longer returns the error. The error still appears in the logs as usual, but the e2e test continues, which lets us test the checkpoint feature accurately.
Example output after the first upgrade failure with these changes:
2022-08-05T17:26:04.543Z V4 Task finished {"task_name": "collect-cluster-diagnostics", "duration": "2m41.519058791s"}
2022-08-05T17:26:04.543Z V4 ----------------------------------
2022-08-05T17:26:04.543Z V4 Saving checkpoint {"file": "eksa-test-5d81cdf-checkpoint.yaml"}
2022-08-05T17:26:04.544Z V4 Tasks completed {"duration": "4m48.184183815s"}
2022-08-05T17:26:04.544Z V3 Cleaning up long running container {"name": "eksa_1659720075911723793"}
Error: failed to upgrade cluster: waiting for external etcd for workload cluster to be ready: executing wait: error: timed out waiting for the condition on clusters/eksa-test-5d81cdf
cluster.go:646: Running shell command [ eksctl anywhere upgrade cluster -f eksa-test-5d81cdf/cluster.yaml -v 4 ]
2022-08-05T17:26:04.816Z V4 Logger init completed {"vlevel": 4}
2022-08-05T17:26:04.989Z V4 Reading bundles manifest
2022-08-05T17:26:05.064Z V2 Pulling docker image
2022-08-05T17:26:05.294Z V3 Initializing long running container
2022-08-05T17:26:05.483Z V4 Checkpoint feature enabled
2022-08-05T17:26:05.483Z V4 Reading checkpoint {"file": "eksa-test-5d81cdf/generated/eksa-test-5d81cdf-checkpoint.yaml"}
2022-08-05T17:26:05.483Z V4 Restoring task {"task_name": "setup-and-validate"}
2022-08-05T17:26:05.483Z V0 docker Provider setup is valid
2022-08-05T17:26:06.948Z V4 Restoring task {"task_name": "update-secrets"}
2022-08-05T17:26:06.948Z V4 Restoring task {"task_name": "ensure-etcd-capi-components-exist"}
...
Testing (if applicable):
Tested with the Docker checkpoint e2e test.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Codecov Report
Merging #2893 (ef2cd12) into main (dba9ff4) will increase coverage by 0.00%. The diff coverage is n/a.
Coverage Diff
| | main | #2893 | +/- |
|---|---|---|---|
| Coverage | 62.24% | 62.25% | |
| Files | 334 | 334 | |
| Lines | 26849 | 26865 | +16 |
| Hits | 16713 | 16724 | +11 |
| Misses | 8854 | 8857 | +3 |
| Partials | 1282 | 1284 | +2 |
| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/networking/cilium/reconciler/reconciler.go | 78.94% <0.00%> (-5.27%) | :arrow_down: |
| pkg/workflows/delete.go | 59.73% <0.00%> (+0.27%) | :arrow_up: |
| pkg/providers/snow/reconciler/reconciler.go | 84.21% <0.00%> (+84.21%) | :arrow_up: |
Are your unit tests succeeding locally?
I can't run the vSphere e2e tests locally, but I ran the same test locally with the Docker provider and it passes.
[APPROVALNOTIFIER] This PR is APPROVED
Approval requirements bypassed by manually added approval.
This pull-request has been approved by:
/cherrypick release-0.11
@taneyland: new pull request created: #3095