TaskRun reports as successful when the pod was evicted

Open drewbailey opened this issue 2 years ago • 1 comments

Expected Behavior

When a pod/container is evicted, the TaskRun should fail and include a reason/message associated with the eviction.

Actual Behavior

The TaskRun reports success, or shows an exit code/message from a container (exit code 137, reason Failed, not Evicted).

Steps to Reproduce the Problem

  1. Run a pipeline with an emptyDir workspace that has a size limit:
  - emptyDir:
      sizeLimit: 10Gi
    name: workspace-user-repo
  2. The Task executes code that exceeds the limit.

Sometimes the TaskRun fails correctly, stating that there was an eviction:

          message: 'Usage of EmptyDir volume "ws-hdl48" exceeds the limit "10Gi". '
          reason: Failed
          status: "False"

Other times it does not. In these cases the container itself contains the eviction error. This may be a race between the containers in the pod finishing and the eviction taking place.

Additional Info

  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:51:24Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.16-eks-ffeb93d", GitCommit:"52e500d139bdef42fbc4540c357f0565c7867a81", GitTreeState:"clean", BuildDate:"2022-11-29T18:41:42Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

Client version: 0.29.1
Pipeline version: v0.40.2

Task run status

status:
  completionTime: "2023-02-10T17:19:38Z"
  conditions:
  - lastTransitionTime: "2023-02-10T17:19:38Z"
    message: All Steps have completed executing
    reason: Succeeded
    status: "True"
    type: Succeeded
  podName: c4382b4f-af66-478e-b603-0b9b5a2f9127-linux-amd64-user-code-pod
  sidecars:
  - container: sidecar-ssh-key-setup-sidecar
    name: ssh-key-setup-sidecar
    terminated:
      containerID: docker://336d7a36b4598f653f94b16840879980ed2d2ff74ba50e1c3983171433d16eb3
      exitCode: 137
      finishedAt: "2023-02-10T17:20:08Z"
      reason: Error
      startedAt: "2023-02-10T17:18:10Z"
  startTime: "2023-02-10T17:18:02Z"
  steps:
  - container: step-place-tools
    imageID: 
    name: place-tools
    terminated:
      containerID: docker://084b76ebd5dd2966941e141ab83f454d6a142c12a89a685cf4fd47baf3a6c33b
      exitCode: 0
      finishedAt: "2023-02-10T17:18:17Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:17Z"
  - container: step-linux-amd64-user-code-extract-workspace-user-repo
    imageID: 
    name: linux-amd64-user-code-extract-workspace-user-repo
    terminated:
      containerID: docker://642849d43d7a5ac81d33952c6a8ecd1184cf82d74fda0ee240c53cecc5447d04
      exitCode: 0
      finishedAt: "2023-02-10T17:18:17Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:17Z"
  - container: step-linux-amd64-user-code-extract-workspace-root-cache
    name: linux-amd64-user-code-extract-workspace-root-cache
    terminated:
      containerID: docker://3dd259a891d7a5639aecb1ae5d98c8f878fa5ed1945cf05ffbd1cf9813ccd7f5
      exitCode: 0
      finishedAt: "2023-02-10T17:18:45Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:17Z"
  - container: step-user-code
    name: user-code
    terminated:
      containerID: docker://8bc929f05506f2f5ca702ff113eea0d1b84f4db2016c4b2bcb5e843ce5fb6e7d
      exitCode: 0
      finishedAt: "2023-02-10T17:18:49Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:46Z"
  - container: step-linux-amd64-user-code-persist-workspace-root-cache
    name: linux-amd64-user-code-persist-workspace-root-cache
    terminated:
      containerID: docker://5887ed9ad53114bd729ae07a1e6127438f8ec2fb157e123c8d65a5f60fd9f682
      exitCode: 0
      finishedAt: "2023-02-10T17:19:35Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:49Z"
  - container: step-linux-amd64-user-code-persist-workspace-user-repo
    name: linux-amd64-user-code-persist-workspace-user-repo
    terminated:
      containerID: docker://b769e06ea058d600cc729d1cc096a31c11ac0cddaa12960099911ff99a9b21bb
      exitCode: 0
      finishedAt: "2023-02-10T17:19:35Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:35Z"
  - container: step-linux-amd64-user-code-extract-workspace-private
    name: linux-amd64-user-code-extract-workspace-private
    terminated:
      containerID: docker://5e8a1a38767009a8b05efd46635f29bc96edd5698ccea5dd93579d132632c50a
      exitCode: 0
      finishedAt: "2023-02-10T17:19:36Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:36Z"
  - container: step-post-user-code
    name: post-user-code
    terminated:
      containerID: docker://ef423030366d101e17833e0b6fb81db300704002bb714f5d99282626466f3b70
      exitCode: 0
      finishedAt: "2023-02-10T17:19:37Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:36Z"
  - container: step-linux-amd64-user-code-persist-workspace-private
    name: linux-amd64-user-code-persist-workspace-private
    terminated:
      containerID: docker://84aa4feb2bef2e6e4fe239f987a2c69fa91dd0039ded8fcb8f85057d6c4d0965
      exitCode: 0
      finishedAt: "2023-02-10T17:19:37Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:37Z"
  - container: step-finally-exit
    name: finally-exit
    terminated:
      containerID: docker://135240c44b4059d60cf3b376c35f959ad4bb9b854d5902cc7d2b59b5d9f6df88
      exitCode: 0
      finishedAt: "2023-02-10T17:19:38Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:38Z"

Pod status & container statuses

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:06Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:17Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:17Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:02Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://336d7a36b4598f653f94b16840879980ed2d2ff74ba50e1c3983171433d16eb3
    lastState: {}
    name: sidecar-ssh-key-setup-sidecar
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://336d7a36b4598f653f94b16840879980ed2d2ff74ba50e1c3983171433d16eb3
        exitCode: 137
        finishedAt: "2023-02-10T17:20:08Z"
        reason: Error
        startedAt: "2023-02-10T17:18:10Z"
  - containerID: docker://135240c44b4059d60cf3b376c35f959ad4bb9b854d5902cc7d2b59b5d9f6df88
    lastState: {}
    name: step-finally-exit
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://135240c44b4059d60cf3b376c35f959ad4bb9b854d5902cc7d2b59b5d9f6df88
        exitCode: 0
        finishedAt: "2023-02-10T17:19:38Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:38.142Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:10Z"
  - containerID: docker://5e8a1a38767009a8b05efd46635f29bc96edd5698ccea5dd93579d132632c50a
    lastState: {}
    name: step-linux-amd64-user-code-extract-workspace-private
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://5e8a1a38767009a8b05efd46635f29bc96edd5698ccea5dd93579d132632c50a
        exitCode: 0
        finishedAt: "2023-02-10T17:19:36Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:36.131Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:09Z"
  - containerID: docker://3dd259a891d7a5639aecb1ae5d98c8f878fa5ed1945cf05ffbd1cf9813ccd7f5
    lastState: {}
    name: step-linux-amd64-user-code-extract-workspace-root-cache
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://3dd259a891d7a5639aecb1ae5d98c8f878fa5ed1945cf05ffbd1cf9813ccd7f5
        exitCode: 0
        finishedAt: "2023-02-10T17:18:45Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:17.687Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:07Z"
  - containerID: docker://642849d43d7a5ac81d33952c6a8ecd1184cf82d74fda0ee240c53cecc5447d04
    lastState: {}
    name: step-linux-amd64-user-code-extract-workspace-user-repo
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://642849d43d7a5ac81d33952c6a8ecd1184cf82d74fda0ee240c53cecc5447d04
        exitCode: 0
        finishedAt: "2023-02-10T17:18:17Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:17.346Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:07Z"
  - containerID: docker://84aa4feb2bef2e6e4fe239f987a2c69fa91dd0039ded8fcb8f85057d6c4d0965
    lastState: {}
    name: step-linux-amd64-user-code-persist-workspace-private
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://84aa4feb2bef2e6e4fe239f987a2c69fa91dd0039ded8fcb8f85057d6c4d0965
        exitCode: 0
        finishedAt: "2023-02-10T17:19:37Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:37.820Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:09Z"
  - containerID: docker://5887ed9ad53114bd729ae07a1e6127438f8ec2fb157e123c8d65a5f60fd9f682
    lastState: {}
    name: step-linux-amd64-user-code-persist-workspace-root-cache
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://5887ed9ad53114bd729ae07a1e6127438f8ec2fb157e123c8d65a5f60fd9f682
        exitCode: 0
        finishedAt: "2023-02-10T17:19:35Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:49.447Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:08Z"
  - containerID: docker://b769e06ea058d600cc729d1cc096a31c11ac0cddaa12960099911ff99a9b21bb
    lastState: {}
    name: step-linux-amd64-user-code-persist-workspace-user-repo
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://b769e06ea058d600cc729d1cc096a31c11ac0cddaa12960099911ff99a9b21bb
        exitCode: 0
        finishedAt: "2023-02-10T17:19:35Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:35.792Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:08Z"
  - containerID: docker://084b76ebd5dd2966941e141ab83f454d6a142c12a89a685cf4fd47baf3a6c33b
    lastState: {}
    name: step-place-tools
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://084b76ebd5dd2966941e141ab83f454d6a142c12a89a685cf4fd47baf3a6c33b
        exitCode: 0
        finishedAt: "2023-02-10T17:18:17Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:17.015Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:07Z"
  - containerID: docker://ef423030366d101e17833e0b6fb81db300704002bb714f5d99282626466f3b70
    lastState: {}
    name: step-post-user-code
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://ef423030366d101e17833e0b6fb81db300704002bb714f5d99282626466f3b70
        exitCode: 0
        finishedAt: "2023-02-10T17:19:37Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:36.478Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:09Z"
  - containerID: docker://8bc929f05506f2f5ca702ff113eea0d1b84f4db2016c4b2bcb5e843ce5fb6e7d
    lastState: {}
    name: step-user-code
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://8bc929f05506f2f5ca702ff113eea0d1b84f4db2016c4b2bcb5e843ce5fb6e7d
        exitCode: 0
        finishedAt: "2023-02-10T17:18:49Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:46.095Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:08Z"
  hostIP: 172.18.56.28
  initContainerStatuses:
  - containerID: docker://afcf639aac8badfbeb0bf04e64f1b904db2ab9d68b9f5f0bce59672885d6f8a5
    lastState: {}
    name: prepare
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://afcf639aac8badfbeb0bf04e64f1b904db2ab9d68b9f5f0bce59672885d6f8a5
        exitCode: 0
        finishedAt: "2023-02-10T17:18:04Z"
        reason: Completed
        startedAt: "2023-02-10T17:18:04Z"
  - containerID: docker://97783ec41d3e64e053d4615f8665b662b0d48610e8322188dfcb80371af5ec7a
    lastState: {}
    name: place-scripts
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://97783ec41d3e64e053d4615f8665b662b0d48610e8322188dfcb80371af5ec7a
        exitCode: 0
        finishedAt: "2023-02-10T17:18:05Z"
        reason: Completed
        startedAt: "2023-02-10T17:18:05Z"
  message: 'Usage of EmptyDir volume "ws-49mb5" exceeds the limit "10Gi". '
  phase: Failed
  qosClass: Burstable
  reason: Evicted
  startTime: "2023-02-10T17:18:02Z"

drewbailey avatar Feb 10 '23 17:02 drewbailey

It seems like #5646 might handle this error case. Previously, DidTaskRunFail would check all the ContainerStatuses even if pod.Status.Phase == corev1.PodFailed. Looking at the container statuses here, it seems like #5646 would now cover this case, where the eviction happens after the containers exit.

drewbailey avatar Feb 10 '23 18:02 drewbailey

@drewbailey #5646 was merged and included in release v0.45.x - would you be able to verify if this issue could be closed then? Alternatively, would you be interested in designing a test for this case?

afrittoli avatar Mar 03 '23 13:03 afrittoli

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot avatar Jun 01 '23 14:06 tekton-robot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot avatar Jul 01 '23 14:07 tekton-robot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot avatar Jul 31 '23 15:07 tekton-robot

@tekton-robot: Closing this issue.


tekton-robot avatar Jul 31 '23 15:07 tekton-robot