fix: properly handle skip on assessments
What type of PR is this?
/kind bug
What this PR does / why we need it:
`t.Skip` with `--fail-fast` was resulting in the parent test being failed; this patch fixes that. However, I'm not entirely sure about the semantics of `--fail-fast`, so I'd appreciate it if someone could clarify that.
For sure, independently of `--fail-fast`, a call to `t.Skip*` in an Assessment should only skip the current Assessment. But in case of calls to `t.Fail` or `t.FailNow`, what should the behaviour be in the two cases?
Which issue(s) this PR fixes:
Fixes https://github.com/kubernetes-sigs/e2e-framework/issues/472
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Fix a bug causing tests that call `t.Skip` to fail the parent test when `--fail-fast` is set.
Additional documentation e.g., Usage docs, etc.:
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: phisco Once this PR has been reviewed and has the lgtm label, please assign cpanato for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Hi @phisco. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
- a call to `Fail` should not attempt to skip any remaining assessment or feature
- a call to `FailNow` should skip all the remaining assessments but not the following features
Makes sense, @Fricounet, but what about `--fail-fast`? Should we skip the remaining assessments regardless of whether `Fail` or `FailNow` was called when `--fail-fast` is set? I'd say so from: https://github.com/kubernetes-sigs/e2e-framework/blob/b69e15831920a55e0c58ece5bf765594be7ecfdc/examples/fail_fast/README.md#L3-L11
I'd say so too @phisco. Though, I don't agree with everything written in the README. In particular:
This can aid in getting a better turn around time especially in cases where your assessments are inter-related and a failed assessment in step 1 can mean that running the rest of the assessment is guaranteed to fail. This can be done with the help of a `--fail-fast` flag provided at the framework level.
To me, this is the job of `t.FailNow()`.
Basically:
- as the test writer, I know when it makes sense for my test to keep going (use `t.Fail()`) or stop early because there is no point going further (use `t.FailNow()`)
- as the test executor, I know if I want my whole test suite to exit early at the first sign of failure (use `--fail-fast`). Hence `--fail-fast` should skip the remaining assessments regardless of `Fail`/`FailNow`
Hope that makes sense :smile:
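To make the distinction concrete, here is a toy model of an assessment runner. This is not the framework's actual implementation: the real framework wraps `*testing.T`, where `t.Skip`/`t.FailNow` unwind via `runtime.Goexit`; here a hypothetical `T` type and sentinel panics play that role. The point is that a skip unwinds the assessment early but must not be recorded as a failure on the parent (the bug this PR fixes).

```go
package main

import "fmt"

// Sentinel panics that mimic how Skip/FailNow unwind an assessment.
type skipSignal struct{ msg string }
type failSignal struct{ msg string }

// T is a hypothetical stand-in for the assessment's testing handle.
type T struct{ skipped, failed bool }

func (t *T) Skip(msg string)    { t.skipped = true; panic(skipSignal{msg}) }
func (t *T) FailNow(msg string) { t.failed = true; panic(failSignal{msg}) }

// runAssessment runs fn and reports how it ended. A skip exits the
// assessment early, but is never recorded as a failure.
func runAssessment(fn func(*T)) (skipped, failed bool) {
	t := &T{}
	func() {
		defer func() {
			switch recover().(type) {
			case nil:
				// fn returned normally
			case skipSignal, failSignal:
				// expected early exit; status already recorded on t
			default:
				t.failed = true // an unexpected panic counts as a failure
			}
		}()
		fn(t)
	}()
	return t.skipped, t.failed
}

func main() {
	s, f := runAssessment(func(t *T) { t.Skip("requires a real cluster") })
	fmt.Printf("after Skip: skipped=%v failed=%v\n", s, f) // skipped=true failed=false

	s, f = runAssessment(func(t *T) { t.FailNow("assertion failed") })
	fmt.Printf("after FailNow: skipped=%v failed=%v\n", s, f) // skipped=false failed=true
}
```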
@phisco @Fricounet Thanks for this issue. The intent of `--fail-fast` is properly characterized here. It was a way to abruptly stop all assessments running. This was an early feature of the framework and if you guys think it could work better, let's keep this discussion going.
cc @harshanarayana
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/ok-to-test
/ok-to-test
@phisco @Fricounet @harshanarayana
This PR slipped, but I think it's an important one to get clarity on and merge eventually.
Features such as `--fail-fast` were added as a way to unblock the framework's users, without knowing the full implications.
Now that the framework has been enjoying wider adoption, I think we can revisit that feature to ensure it works the best way possible.
So, here is what I am proposing (which aligns with what @Fricounet mentioned):
- A test writer should use exit methods (`t.Fail`/`t.FailNow`/`t.Fatal`) for granular control of test steps
- A call to an exit method should apply to the associated assessment/feature only, allowing remaining tests to continue
- `--fail-fast` short-circuits the current and all remaining tests (features and assessments) if an exit method is called
- `--fail-fast` also short-circuits the lifecycle of the entire framework runtime if a structural error occurs (i.e. during initialization or a setup step)
Hope this clears things up. If not, let me know what I am missing or what you want to add.
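The proposed semantics can be sketched as a small simulation. This is a toy model, not the framework's code: the `exitMode`, `assessment`, `feature`, and `run` names are made up for illustration. It shows `Fail` continuing, `FailNow` skipping only the rest of its feature, `Skip` never failing anything, and `--fail-fast` short-circuiting the whole suite on either failure mode.

```go
package main

import "fmt"

// Exit modes a test writer can choose inside an assessment (toy model).
type exitMode int

const (
	pass    exitMode = iota
	fail             // t.Fail: record failure, keep going
	failNow          // t.FailNow: abort remaining assessments in this feature
	skip             // t.Skip: mark skipped, never a failure
)

type assessment struct {
	name string
	mode exitMode
}

type feature struct {
	name        string
	assessments []assessment
}

// run executes the suite under the proposed semantics and returns which
// assessments actually ran. With failFast, the first Fail/FailNow
// short-circuits everything that remains; Skip never does.
func run(suite []feature, failFast bool) (ran []string, failed bool) {
features:
	for _, f := range suite {
		for _, a := range f.assessments {
			ran = append(ran, f.name+"/"+a.name)
			switch a.mode {
			case fail:
				failed = true
				if failFast {
					return ran, failed // stop the whole suite
				}
			case failNow:
				failed = true
				if failFast {
					return ran, failed
				}
				continue features // skip siblings, keep later features
			case skip:
				// skipped: not a failure, siblings still run
			}
		}
	}
	return ran, failed
}

func main() {
	suite := []feature{
		{"f1", []assessment{{"a1", pass}, {"a2", failNow}, {"a3", pass}}},
		{"f2", []assessment{{"b1", pass}}},
	}
	ran, _ := run(suite, false)
	fmt.Println("without --fail-fast:", ran) // [f1/a1 f1/a2 f2/b1]
	ran, _ = run(suite, true)
	fmt.Println("with --fail-fast:", ran) // [f1/a1 f1/a2]
}
```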
@phisco are you still interested in continuing with this change?
/remove-lifecycle stale
PR needs rebase.