argo-workflows
EOF when many steps use the same big input artifact
I have a 1.2 GB artifact on S3.
Intermittently, some of the tasks that use the same artifact as input get one of the errors below:
Error (exit code 1): tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
or
Error (exit code 1): gzip: write: Out of memory
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
The logs also mention the max duration limit:
time="2022-09-06T02:08:48.942Z" level=info msg="Processing workflow" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.945Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=3 workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message="Error (exit code 1): tar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now" new.phase=Error new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-3794695805 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.945Z" level=info msg="node unchanged" nodeID=bitbucket-workflow-20220906t12032810007frlh-2620825050
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message="Error (exit code 1): tar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now" new.phase=Error new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-217183810 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message= new.phase=Running new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-1085538644 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message="Error (exit code 1): gzip: write: Out of memory\ntar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now" new.phase=Error new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-3703802753 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.945Z" level=info msg="node unchanged" nodeID=bitbucket-workflow-20220906t12032810007frlh-2444850475
time="2022-09-06T02:08:48.945Z" level=info msg="node unchanged" nodeID=bitbucket-workflow-20220906t12032810007frlh-2697090281
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message= new.phase=Running new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-1264146474 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.946Z" level=info msg="SG Outbound nodes of bitbucket-workflow-20220906t12032810007frlh-3571636647 are [bitbucket-workflow-20220906t12032810007frlh-2620825050]" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.946Z" level=info msg="SG Outbound nodes of bitbucket-workflow-20220906t12032810007frlh-3332293106 are [bitbucket-workflow-20220906t12032810007frlh-2697090281]" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.946Z" level=info msg="SG Outbound nodes of bitbucket-workflow-20220906t12032810007frlh-213337688 are [bitbucket-workflow-20220906t12032810007frlh-2444850475]" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.947Z" level=info msg="Max duration limit exceeded. Failing..." namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.947Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-610152142 phase Running -> Error" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.947Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-610152142 message: Max duration limit exceeded" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.947Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-610152142 finished: 2022-09-06 02:08:48.947745613 +0000 UTC" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.948Z" level=info msg="Max duration limit exceeded. Failing..." namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.948Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-2763505215 phase Running -> Error" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.948Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-2763505215 message: Max duration limit exceeded" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.948Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-2763505215 finished: 2022-09-06 02:08:48.948248846 +0000 UTC" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.949Z" level=info msg="Max duration limit exceeded. Failing..." namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.949Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-3681608794 phase Running -> Error" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.949Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-3681608794 message: Max duration limit exceeded" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.949Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-3681608794 finished: 2022-09-06 02:08:48.949245597 +0000 UTC" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
Logs from another run's 'init' container:
time="2022-09-06T02:52:23.726Z" level=info msg="Start loading input artifacts..."
time="2022-09-06T02:52:23.726Z" level=info msg="Downloading artifact: repo"
time="2022-09-06T02:52:23.726Z" level=info msg="S3 Load path: /argo/inputs/artifacts/repo.tmp, key: argo_wf_logs/2022/09/06/02/49/bitbucket-workflow-20220906t12490510007g2tv/bitbucket-workflow-20220906t12490510007g2tv-3635102771/repo.tgz"
time="2022-09-06T02:52:23.726Z" level=info msg="Creating minio client using AWS SDK credentials"
time="2022-09-06T02:52:25.220Z" level=info msg="Getting file from s3" bucket=myredactbucket endpoint=s3.amazonaws.com key=argo_wf_logs/2022/09/06/02/49/bitbucket-workflow-20220906t12490510007g2tv/bitbucket-workflow-20220906t12490510007g2tv-3635102771/repo.tgz path=/argo/inputs/artifacts/repo.tmp
time="2022-09-06T02:52:38.714Z" level=info msg="Detecting if /argo/inputs/artifacts/repo.tmp is a tarball"
time="2022-09-06T02:52:38.714Z" level=info msg="tar -xf /argo/inputs/artifacts/repo.tmp -C /argo/inputs/artifacts/repo.tmpdir"
time="2022-09-06T02:52:49.455Z" level=error msg="`tar -xf /argo/inputs/artifacts/repo.tmp -C /argo/inputs/artifacts/repo.tmpdir` failed: gzip: write: Out of memory\ntar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now\n"
time="2022-09-06T02:52:49.601Z" level=error msg="executor error: gzip: write: Out of memory\ntar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now"
time="2022-09-06T02:52:49.601Z" level=info msg="Alloc=9938 TotalAlloc=18507 Sys=29138 NumGC=5 Goroutines=4"
time="2022-09-06T02:52:49.601Z" level=fatal msg="gzip: write: Out of memory\ntar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now"
It's a fan-out workflow.
Version: 3.3.9
Have you used PodSpecPatch to increase the resource request?
@hbrewster-splunk That worked, but I think the docs should be updated to mention changing this for the init container on steps that use a big input artifact:
- name: mystep
  podSpecPatch: '{"initContainers":[{"name":"init", "resources":{"requests":{"memory": "2Gi", "cpu": "300m" },"limits":{"memory": "3Gi", "cpu": "900m" }}}]}'
  inputs:
    artifacts:
    - name: repo
      path: /repo
  container:
    image: blabla
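For anyone landing here later, this is a fuller sketch of the same workaround in a small fan-out workflow. It is only an illustration under assumptions: the workflow name, the `produce-repo` template, the `alpine:3.16` image, and the `withItems` list are hypothetical and not from the original workflow. The patch is written as a YAML block string, which should be equivalent to the JSON string above. The memory and CPU values are the ones reported to work in this thread, and you will likely need to tune them for your artifact size.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: big-artifact-fanout-   # hypothetical name
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    # Step 1: produce the artifact once (stand-in for the real 1.2 GB repo).
    - - name: produce
        template: produce-repo
    # Step 2: fan out; every item downloads and untars the same artifact.
    - - name: consume
        template: mystep
        arguments:
          artifacts:
          - name: repo
            from: "{{steps.produce.outputs.artifacts.repo}}"
        withItems: [1, 2, 3]           # hypothetical fan-out width

  - name: produce-repo                 # hypothetical producer template
    container:
      image: alpine:3.16               # assumed image
      command: [sh, -c]
      args: ["mkdir -p /repo && echo hello > /repo/file.txt"]
    outputs:
      artifacts:
      - name: repo
        path: /repo

  - name: mystep
    # Raise resources on the "init" container, which loads and untars
    # input artifacts before the main container starts.
    podSpecPatch: |
      initContainers:
      - name: init
        resources:
          requests:
            memory: 2Gi
            cpu: 300m
          limits:
            memory: 3Gi
            cpu: 900m
    inputs:
      artifacts:
      - name: repo
        path: /repo
    container:
      image: alpine:3.16               # assumed image
      command: [sh, -c]
      args: ["ls /repo"]
```

Setting `podSpecPatch` on the template keeps the larger request limited to steps that actually load the big artifact, rather than raising it for every pod in the workflow.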
Feel free to submit a PR to improve the docs.
Hi @terrytangyuan, I am new to open source and want to work on this issue. Can you assign it to me?
@2022H1030014G You can raise a PR to fix this even without being assigned.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
@2022H1030014G Are you still working on it?
If there's no PR, you can just start working on it.