pipeline icon indicating copy to clipboard operation
pipeline copied to clipboard

Add option continueAndFail on step.onError

Open FaniD opened this issue 9 months ago • 19 comments

Feature request

Currently onError for steps can be set to continue or stopAndFail. In some cases, it could be beneficial to allow the taskRun to continue with the rest of the steps, but eventually fail with the exitCode of a specific step. Therefore, I'm introducing the value continueAndFail.

This is already discussed in the comments of issue #4485

Use case

I have a task that runs a test (step 1) and publishes a test report (step 2). I want to publish a test report even if step 1 fails and let the taskRun fail accordingly (step 1 exitCode). In that case, none of the current values of onError are enough, as continue would make the taskRun succeed and stopAndFail would prevent step 2 from running.

Release Notes

A new value has been added for step.onError:
- `continueAndFail`: declares a failure with a step error after having executed all following steps

FaniD avatar Mar 11 '25 17:03 FaniD

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: FaniD / name: Fani Dimou (16914d9e4d61fff2a6088f5e0dc68988a00fc6f7, 349a662d722cecc3288e840bcc4c88ee777bf6d7, 90b3e66caa7641c4fa231f8b297d43c79f5a643e, 894d0fc049d50350a14c9ce4e3dbe16d60821c57, 45d48df5245277565ab5e0068f1d278b74cbe3d4, c4992950a7e9c5ebf389b1392a25340004ce8a9c, c711fc854cc3cbad5ba749d97f2feba5cdf913fa)

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 88.0% 88.2% 0.1

tekton-robot avatar Mar 11 '25 18:03 tekton-robot

/test check-pr-has-kind-label

FaniD avatar Mar 11 '25 18:03 FaniD

@FaniD: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test check-pr-has-kind-label

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot avatar Mar 11 '25 18:03 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 88.0% 88.2% 0.1

tekton-robot avatar Mar 11 '25 19:03 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 88.0% 88.2% 0.1

tekton-robot avatar Mar 11 '25 19:03 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 88.0% 88.2% 0.1

tekton-robot avatar Mar 11 '25 19:03 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 88.0% 88.2% 0.1

tekton-robot avatar Mar 11 '25 19:03 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 88.0% 88.2% 0.1

tekton-robot avatar Mar 11 '25 19:03 tekton-robot

/kind feature

FaniD avatar Mar 13 '25 08:03 FaniD

SGTM, @FaniD can we also have an e2e tests validating that the Task is indeed failed.

vdemeester avatar Mar 13 '25 11:03 vdemeester

SGTM, @FaniD can we also have an e2e tests validating that the Task is indeed failed.

sure thing. I will work on this soonish

FaniD avatar Mar 13 '25 14:03 FaniD

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 87.5% 87.7% 0.1

tekton-robot avatar Apr 14 '25 20:04 tekton-robot

@vdemeester I finally got a chance to get back to this. Added testcases on the e2e tests that seemed more fit: taskrun termination reason and ignore step error test

FaniD avatar Apr 15 '25 09:04 FaniD

@FaniD 👋🏼 sorry it needs a rebase again 😓

vdemeester avatar Jun 11 '25 13:06 vdemeester

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

tekton-robot avatar Jun 11 '25 13:06 tekton-robot

A very practical feature, looking forward to merging it soon. Thank you to the @FaniD for sharing.

l-qing avatar Jun 15 '25 00:06 l-qing

@FaniD this will need a rebase 🙏

afrittoli avatar Jun 16 '25 10:06 afrittoli

Thanks for taking a look! :) I will rebase and add the extra testcases soon

FaniD avatar Jun 23 '25 08:06 FaniD

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 87.5% 87.7% 0.1

tekton-robot avatar Jun 29 '25 18:06 tekton-robot

Thanks for this PR, it seems like a good feature to have. Could you add a test case for some more complex scenarios? I think the execution flow should be clear, I'm not sure about how the error reporting will work with multiple failures and different onError settings.

Example one:

* step1 has onError = continueAndFail, and it fails

* step2 fails
  Expected -> Task stops executing. How is the failure reported? step1 or step2 or a combination?

Example two:

* step1 has onError = continueAndFail, and it fails

* step2 has onError = continueAndFail, and it fails
  Expected -> Task continues executing. How is the failure reported? step1 or step2 or a combination?

Example three:

* step1 has onError = continueAndFail, and it fails

* step2 has onError = continue, and it fails
  Expected -> Task continues executing. How is the failure reported? step1 or step2 or a combination?

@afrittoli I gave this some thinking and to me the way to go would be reporting the failure of the first occurrence of continueAndFail. I added more testcases on test/ignore_step_error_test.go that reflect this. From my pov, it seems more simple and clear rather than combining failure messages from following steps.

FaniD avatar Jun 29 '25 18:06 FaniD

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 87.5% 87.7% 0.1

tekton-robot avatar Jun 29 '25 18:06 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 87.5% 87.7% 0.1

tekton-robot avatar Aug 11 '25 10:08 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 87.5% 87.7% 0.1

tekton-robot avatar Aug 11 '25 14:08 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 87.5% 87.7% 0.1

tekton-robot avatar Aug 11 '25 14:08 tekton-robot

/ok-to-test

waveywaves avatar Aug 14 '25 09:08 waveywaves

/retest

waveywaves avatar Aug 14 '25 09:08 waveywaves

/retest

waveywaves avatar Aug 19 '25 19:08 waveywaves

@FaniD I have restarted the e2e tests, in case of failures could you review them and rebase as well?

waveywaves avatar Aug 19 '25 19:08 waveywaves

@FaniD I have restarted the e2e tests, in case of failures could you review them and rebase as well?

Thank you @waveywaves . The tests show that there might be an issue with my solution. I'll take another look and refactor accordingly

FaniD avatar Sep 01 '25 21:09 FaniD