Possible race condition writing `exitCode` file

Open aruiz14 opened this issue 3 years ago • 4 comments

Expected Behavior

Consecutive steps can consistently access a previous step's exit code by using $(steps.step-mystep.exitCode.path).

Actual Behavior

I've observed some cases in which a second step of the same TaskRun can't find this file, which is read at the very beginning of that step's script.

Steps to Reproduce the Problem

Unfortunately, this issue is not always reproducible. The TaskRun spec for which I observed this failure looks like this:

  • A main step that executes a command for which I set onError: continue.
  • A check-result step that reads $(steps.step-main.exitCode.path) and will act depending on the exit code.
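
For illustration, the logic of the check-result step might look roughly like the sketch below. This is not the reporter's actual script: the function name check_result and the messages are hypothetical, and the path is passed as an argument so the snippet can be run outside Tekton. In a real step script the argument would be $(steps.step-main.exitCode.path), which Tekton substitutes before the script runs.

```shell
#!/bin/sh
# check_result EXIT_CODE_FILE  (hypothetical helper, for illustration only)
# Reads the exit code written for a previous step and reports on it.
check_result() {
  exit_code_file="$1"

  # This is the point where the reported failure shows up: the file is
  # expected to exist as soon as the step starts, but sometimes does not.
  if [ ! -f "$exit_code_file" ]; then
    echo "exit code file not found: $exit_code_file" >&2
    return 2
  fi

  code="$(cat "$exit_code_file")"
  if [ "$code" = "0" ]; then
    echo "main step succeeded"
  else
    echo "main step failed with exit code $code"
  fi
}

# Inside a Tekton step script this would be called as, for example:
#   check_result "$(steps.step-main.exitCode.path)"
```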

Additional Info

  • Kubernetes version: Output of kubectl version:
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.11", GitCommit:"27522a29febbcc4badac257763044d0d90c11abd", GitTreeState:"clean", BuildDate:"2021-09-15T19:16:25Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:

    Output of kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

v0.32.1

I'm aware that this version is more than 8 months old, but I thought I should still report it since I couldn't find any similar issue. Unfortunately, I'm not able to upgrade at the moment, but I'll try a newer version whenever possible and add any new information I'm able to obtain.

aruiz14 • Oct 04 '22 10:10

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot • Jan 02 '23 11:01

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot • Feb 01 '23 11:02

@aruiz14 Thanks for this report! Is this issue still occurring or were you able to find a workaround? It would be very useful to know if this is still reproducible in more recent versions.

afrittoli • Feb 01 '23 11:02

Hi @afrittoli, the workaround I applied was to make the script in the check-result step wait until the $(steps.step-main.exitCode.path) file exists. I also upgraded to the LTS version 0.41.0. I'm afraid that removing the workaround could affect my workload, and trying to reproduce the problem artificially could also be hard, since it does not always happen even under the same circumstances. I'll let you know if I find out anything more. Thanks!
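
A minimal sketch of such a wait, assuming a POSIX shell in the step image. The function name wait_for_file, the one-second polling interval, and the 30-second default timeout are illustrative choices, not details from the report above.

```shell
#!/bin/sh
# wait_for_file PATH [TIMEOUT_SECONDS]  (hypothetical helper)
# Polls once per second until PATH exists, or gives up after the timeout.
wait_for_file() {
  path="$1"
  timeout="${2:-30}"   # default timeout is an assumption, tune as needed
  waited=0

  while [ ! -f "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# At the top of the check-result script this could be used as, for example:
#   wait_for_file "$(steps.step-main.exitCode.path)" 30 || exit 1
```

The timeout keeps the step from hanging indefinitely if the file never appears, at the cost of a failure that has to be handled explicitly.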

aruiz14 • Feb 03 '23 09:02

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot • Mar 05 '23 09:03

@tekton-robot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tekton-robot • Mar 05 '23 09:03