TaskRun reports as successful when the pod was evicted
Expected Behavior
When a pod/container is evicted, the TaskRun should fail and include a reason/message associated with the eviction.
Actual Behavior
The TaskRun reports success, or shows an exit code / message from a container (exit code 137 with reason Failed, not Evicted).
Steps to Reproduce the Problem
- Run a pipeline with an emptyDir workspace that has a size limit:
    - emptyDir:
        sizeLimit: 10Gi
      name: workspace-user-repo
- The Task executes code that writes more data to the workspace than the limit allows
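One way to drive the second step is a script that keeps writing into the workspace until it has written more than the limit. The following is a minimal sketch, not the code from the original report: the helper name `fill_directory`, the file name `filler.bin`, and the chunk size are assumptions chosen for illustration.

```python
import os
import tempfile


def fill_directory(path: str, total_bytes: int, chunk_bytes: int = 1024 * 1024) -> int:
    """Write zero-filled chunks into `path` until at least `total_bytes` are written.

    Hypothetical helper for reproducing an emptyDir sizeLimit eviction.
    Returns the number of bytes actually written (a multiple of `chunk_bytes`).
    """
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes must be > 0")
    os.makedirs(path, exist_ok=True)
    target = os.path.join(path, "filler.bin")  # assumed file name
    chunk = b"\0" * chunk_bytes
    written = 0
    with open(target, "wb") as f:
        while written < total_bytes:
            f.write(chunk)
            written += len(chunk)
        # Push the data to disk so it counts towards the volume's usage.
        f.flush()
        os.fsync(f.fileno())
    return written


if __name__ == "__main__":
    # Tiny local demo so the sketch can be run safely outside a cluster.
    with tempfile.TemporaryDirectory() as tmp:
        n = fill_directory(tmp, total_bytes=3 * 1024, chunk_bytes=1024)
        print(n, os.path.getsize(os.path.join(tmp, "filler.bin")))
```

Inside a step, this would be called with the workspace mount path and a total somewhat above the sizeLimit (for example 11 GiB against the 10Gi limit). The kubelet measures emptyDir usage periodically, so the eviction can arrive some time after the limit is crossed, which leaves room for the remaining steps to finish first.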
Sometimes the TaskRun fails correctly, stating that there was an eviction:
  message: 'Usage of EmptyDir volume "ws-hdl48" exceeds the limit "10Gi". '
  reason: Failed
  status: "False"
Other times it does not. In these cases the pod itself contains the eviction error. This may be a race between the containers in the pod finishing and the eviction taking place.
Additional Info
- Kubernetes version:
  Output of kubectl version:
  Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:51:24Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"darwin/arm64"}
  Kustomize Version: v4.5.7
  Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.16-eks-ffeb93d", GitCommit:"52e500d139bdef42fbc4540c357f0565c7867a81", GitTreeState:"clean", BuildDate:"2022-11-29T18:41:42Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
- Tekton Pipeline version:
  Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}':
  Client version: 0.29.1
  Pipeline version: v0.40.2
Task run status
status:
  completionTime: "2023-02-10T17:19:38Z"
  conditions:
  - lastTransitionTime: "2023-02-10T17:19:38Z"
    message: All Steps have completed executing
    reason: Succeeded
    status: "True"
    type: Succeeded
  podName: c4382b4f-af66-478e-b603-0b9b5a2f9127-linux-amd64-user-code-pod
  sidecars:
  - container: sidecar-ssh-key-setup-sidecar
    name: ssh-key-setup-sidecar
    terminated:
      containerID: docker://336d7a36b4598f653f94b16840879980ed2d2ff74ba50e1c3983171433d16eb3
      exitCode: 137
      finishedAt: "2023-02-10T17:20:08Z"
      reason: Error
      startedAt: "2023-02-10T17:18:10Z"
  startTime: "2023-02-10T17:18:02Z"
  steps:
  - container: step-place-tools
    imageID:
    name: place-tools
    terminated:
      containerID: docker://084b76ebd5dd2966941e141ab83f454d6a142c12a89a685cf4fd47baf3a6c33b
      exitCode: 0
      finishedAt: "2023-02-10T17:18:17Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:17Z"
  - container: step-linux-amd64-user-code-extract-workspace-user-repo
    imageID:
    name: linux-amd64-user-code-extract-workspace-user-repo
    terminated:
      containerID: docker://642849d43d7a5ac81d33952c6a8ecd1184cf82d74fda0ee240c53cecc5447d04
      exitCode: 0
      finishedAt: "2023-02-10T17:18:17Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:17Z"
  - container: step-linux-amd64-user-code-extract-workspace-root-cache
    name: linux-amd64-user-code-extract-workspace-root-cache
    terminated:
      containerID: docker://3dd259a891d7a5639aecb1ae5d98c8f878fa5ed1945cf05ffbd1cf9813ccd7f5
      exitCode: 0
      finishedAt: "2023-02-10T17:18:45Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:17Z"
  - container: step-user-code
    name: user-code
    terminated:
      containerID: docker://8bc929f05506f2f5ca702ff113eea0d1b84f4db2016c4b2bcb5e843ce5fb6e7d
      exitCode: 0
      finishedAt: "2023-02-10T17:18:49Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:46Z"
  - container: step-linux-amd64-user-code-persist-workspace-root-cache
    name: linux-amd64-user-code-persist-workspace-root-cache
    terminated:
      containerID: docker://5887ed9ad53114bd729ae07a1e6127438f8ec2fb157e123c8d65a5f60fd9f682
      exitCode: 0
      finishedAt: "2023-02-10T17:19:35Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:49Z"
  - container: step-linux-amd64-user-code-persist-workspace-user-repo
    name: linux-amd64-user-code-persist-workspace-user-repo
    terminated:
      containerID: docker://b769e06ea058d600cc729d1cc096a31c11ac0cddaa12960099911ff99a9b21bb
      exitCode: 0
      finishedAt: "2023-02-10T17:19:35Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:35Z"
  - container: step-linux-amd64-user-code-extract-workspace-private
    name: linux-amd64-user-code-extract-workspace-private
    terminated:
      containerID: docker://5e8a1a38767009a8b05efd46635f29bc96edd5698ccea5dd93579d132632c50a
      exitCode: 0
      finishedAt: "2023-02-10T17:19:36Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:36Z"
  - container: step-post-user-code
    name: post-user-code
    terminated:
      containerID: docker://ef423030366d101e17833e0b6fb81db300704002bb714f5d99282626466f3b70
      exitCode: 0
      finishedAt: "2023-02-10T17:19:37Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:36Z"
  - container: step-linux-amd64-user-code-persist-workspace-private
    name: linux-amd64-user-code-persist-workspace-private
    terminated:
      containerID: docker://84aa4feb2bef2e6e4fe239f987a2c69fa91dd0039ded8fcb8f85057d6c4d0965
      exitCode: 0
      finishedAt: "2023-02-10T17:19:37Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:37Z"
  - container: step-finally-exit
    name: finally-exit
    terminated:
      containerID: docker://135240c44b4059d60cf3b376c35f959ad4bb9b854d5902cc7d2b59b5d9f6df88
      exitCode: 0
      finishedAt: "2023-02-10T17:19:38Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:38Z"
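The timestamps in this status are consistent with the race suggested above: the TaskRun's completionTime matches the last step's finishedAt, and the sidecar is terminated later. The sketch below only does that arithmetic with values copied from the dump; the variable and function names are illustrative assumptions.

```python
from datetime import datetime, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a UTC timestamp of the form used in the status dump."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values copied from the TaskRun status above.
completion_time = parse_k8s_time("2023-02-10T17:19:38Z")     # status.completionTime
last_step_finished = parse_k8s_time("2023-02-10T17:19:38Z")  # step finally-exit
sidecar_finished = parse_k8s_time("2023-02-10T17:20:08Z")    # sidecar, exitCode 137

# Seconds between the TaskRun completing and the sidecar being terminated.
gap_seconds = (sidecar_finished - completion_time).total_seconds()
print(gap_seconds)  # 30.0
```

Exit code 137 corresponds to SIGKILL (128 + 9). The dump does not show when the eviction itself was decided, only that every step had already exited 0 and the TaskRun had a completionTime 30 seconds before the sidecar was killed.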
Pod status & container statuses
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:06Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:17Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:17Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:02Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://336d7a36b4598f653f94b16840879980ed2d2ff74ba50e1c3983171433d16eb3
    lastState: {}
    name: sidecar-ssh-key-setup-sidecar
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://336d7a36b4598f653f94b16840879980ed2d2ff74ba50e1c3983171433d16eb3
        exitCode: 137
        finishedAt: "2023-02-10T17:20:08Z"
        reason: Error
        startedAt: "2023-02-10T17:18:10Z"
  - containerID: docker://135240c44b4059d60cf3b376c35f959ad4bb9b854d5902cc7d2b59b5d9f6df88
    lastState: {}
    name: step-finally-exit
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://135240c44b4059d60cf3b376c35f959ad4bb9b854d5902cc7d2b59b5d9f6df88
        exitCode: 0
        finishedAt: "2023-02-10T17:19:38Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:38.142Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:10Z"
  - containerID: docker://5e8a1a38767009a8b05efd46635f29bc96edd5698ccea5dd93579d132632c50a
    lastState: {}
    name: step-linux-amd64-user-code-extract-workspace-private
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://5e8a1a38767009a8b05efd46635f29bc96edd5698ccea5dd93579d132632c50a
        exitCode: 0
        finishedAt: "2023-02-10T17:19:36Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:36.131Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:09Z"
  - containerID: docker://3dd259a891d7a5639aecb1ae5d98c8f878fa5ed1945cf05ffbd1cf9813ccd7f5
    lastState: {}
    name: step-linux-amd64-user-code-extract-workspace-root-cache
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://3dd259a891d7a5639aecb1ae5d98c8f878fa5ed1945cf05ffbd1cf9813ccd7f5
        exitCode: 0
        finishedAt: "2023-02-10T17:18:45Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:17.687Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:07Z"
  - containerID: docker://642849d43d7a5ac81d33952c6a8ecd1184cf82d74fda0ee240c53cecc5447d04
    lastState: {}
    name: step-linux-amd64-user-code-extract-workspace-user-repo
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://642849d43d7a5ac81d33952c6a8ecd1184cf82d74fda0ee240c53cecc5447d04
        exitCode: 0
        finishedAt: "2023-02-10T17:18:17Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:17.346Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:07Z"
  - containerID: docker://84aa4feb2bef2e6e4fe239f987a2c69fa91dd0039ded8fcb8f85057d6c4d0965
    lastState: {}
    name: step-linux-amd64-user-code-persist-workspace-private
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://84aa4feb2bef2e6e4fe239f987a2c69fa91dd0039ded8fcb8f85057d6c4d0965
        exitCode: 0
        finishedAt: "2023-02-10T17:19:37Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:37.820Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:09Z"
  - containerID: docker://5887ed9ad53114bd729ae07a1e6127438f8ec2fb157e123c8d65a5f60fd9f682
    lastState: {}
    name: step-linux-amd64-user-code-persist-workspace-root-cache
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://5887ed9ad53114bd729ae07a1e6127438f8ec2fb157e123c8d65a5f60fd9f682
        exitCode: 0
        finishedAt: "2023-02-10T17:19:35Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:49.447Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:08Z"
  - containerID: docker://b769e06ea058d600cc729d1cc096a31c11ac0cddaa12960099911ff99a9b21bb
    lastState: {}
    name: step-linux-amd64-user-code-persist-workspace-user-repo
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://b769e06ea058d600cc729d1cc096a31c11ac0cddaa12960099911ff99a9b21bb
        exitCode: 0
        finishedAt: "2023-02-10T17:19:35Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:35.792Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:08Z"
  - containerID: docker://084b76ebd5dd2966941e141ab83f454d6a142c12a89a685cf4fd47baf3a6c33b
    lastState: {}
    name: step-place-tools
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://084b76ebd5dd2966941e141ab83f454d6a142c12a89a685cf4fd47baf3a6c33b
        exitCode: 0
        finishedAt: "2023-02-10T17:18:17Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:17.015Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:07Z"
  - containerID: docker://ef423030366d101e17833e0b6fb81db300704002bb714f5d99282626466f3b70
    lastState: {}
    name: step-post-user-code
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://ef423030366d101e17833e0b6fb81db300704002bb714f5d99282626466f3b70
        exitCode: 0
        finishedAt: "2023-02-10T17:19:37Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:36.478Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:09Z"
  - containerID: docker://8bc929f05506f2f5ca702ff113eea0d1b84f4db2016c4b2bcb5e843ce5fb6e7d
    lastState: {}
    name: step-user-code
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://8bc929f05506f2f5ca702ff113eea0d1b84f4db2016c4b2bcb5e843ce5fb6e7d
        exitCode: 0
        finishedAt: "2023-02-10T17:18:49Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:46.095Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:08Z"
  hostIP: 172.18.56.28
  initContainerStatuses:
  - containerID: docker://afcf639aac8badfbeb0bf04e64f1b904db2ab9d68b9f5f0bce59672885d6f8a5
    lastState: {}
    name: prepare
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://afcf639aac8badfbeb0bf04e64f1b904db2ab9d68b9f5f0bce59672885d6f8a5
        exitCode: 0
        finishedAt: "2023-02-10T17:18:04Z"
        reason: Completed
        startedAt: "2023-02-10T17:18:04Z"
  - containerID: docker://97783ec41d3e64e053d4615f8665b662b0d48610e8322188dfcb80371af5ec7a
    lastState: {}
    name: place-scripts
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://97783ec41d3e64e053d4615f8665b662b0d48610e8322188dfcb80371af5ec7a
        exitCode: 0
        finishedAt: "2023-02-10T17:18:05Z"
        reason: Completed
        startedAt: "2023-02-10T17:18:05Z"
  message: 'Usage of EmptyDir volume "ws-49mb5" exceeds the limit "10Gi". '
  phase: Failed
  qosClass: Burstable
  reason: Evicted
  startTime: "2023-02-10T17:18:02Z"
It seems like #5646 might handle this error case. Before, DidTaskRunFail would check all the ContainerStatuses even if pod.Status.Phase == corev1.PodFailed. Looking at the container statuses here, it seems like #5646 would now cover this case, where the eviction happens after the containers exit.
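To make the distinction concrete, here is a small Python sketch of the two kinds of check the comment appears to contrast: one that looks only at step container exit codes, and one that treats a Failed pod phase as a failure first. It is not Tekton's Go implementation of DidTaskRunFail; the function names and the dictionary layout (which mirrors the pod status fields in this report) are assumptions for illustration.

```python
from typing import Any, Dict


def failed_by_containers_only(pod_status: Dict[str, Any]) -> bool:
    """Hypothetical check: fail only if a step container exited non-zero."""
    for cs in pod_status.get("containerStatuses", []):
        if not cs["name"].startswith("step-"):
            continue  # sidecars do not decide the result in this sketch
        terminated = cs.get("state", {}).get("terminated")
        if terminated is not None and terminated.get("exitCode", 0) != 0:
            return True
    return False


def failed_by_phase_or_containers(pod_status: Dict[str, Any]) -> bool:
    """Hypothetical check: a Failed pod phase is a failure regardless of exit codes."""
    if pod_status.get("phase") == "Failed":
        return True
    return failed_by_containers_only(pod_status)


# Trimmed-down version of the pod status from this report.
evicted_pod_status = {
    "phase": "Failed",
    "reason": "Evicted",
    "message": 'Usage of EmptyDir volume "ws-49mb5" exceeds the limit "10Gi". ',
    "containerStatuses": [
        {"name": "sidecar-ssh-key-setup-sidecar", "state": {"terminated": {"exitCode": 137}}},
        {"name": "step-user-code", "state": {"terminated": {"exitCode": 0}}},
        {"name": "step-finally-exit", "state": {"terminated": {"exitCode": 0}}},
    ],
}

print(failed_by_containers_only(evicted_pod_status))      # False
print(failed_by_phase_or_containers(evicted_pod_status))  # True
```

With the status from this report, the container-only check sees nothing wrong because every step exited 0 before the eviction, while the phase-aware check reports the failure and could surface the pod-level reason and message.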
@drewbailey #5646 was merged and included in release v0.45.x - would you be able to verify if this issue could be closed then? Alternatively, would you be interested in designing a test for this case?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle rotten
Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/close
Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue.