Unable to load a GCS artifact that is a non-empty directory
What happened:
A workflow that attempts to load an input artifact from GCS, where the key is a non-empty directory, fails like this:
Name: test-h5rm4
Namespace: default
ServiceAccount: default
Status: Error
Message: failed to load artifacts: timed out waiting for the condition
Created: Sun Apr 26 10:07:14 +0000 (34 seconds ago)
Started: Sun Apr 26 10:07:14 +0000 (34 seconds ago)
Finished: Sun Apr 26 10:07:48 +0000 (now)
Duration: 34 seconds
STEP TEMPLATE PODNAME DURATION MESSAGE
⚠ test-h5rm4 test test-h5rm4 33s failed to load artifacts: timed out waiting for the condition
What you expected to happen:
The directory artifact is loaded recursively under the specified path and the workflow completes successfully.
How to reproduce it (as minimally and precisely as possible):
Workflow template:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test-
spec:
  entrypoint: test
  templates:
  - name: test
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
    inputs:
      artifacts:
      - name: test-dir-artifact
        path: /test_dir/
        gcs:
          bucket: test-argo-bucket
          key: test_dir
          serviceAccountKeySecret:
            name: argo-gcs-credentials
            key: serviceAccountKey
- Set up a GCS bucket with a test_dir directory that has some content.
- Run the workflow defined above.
Anything else we need to know?:
This does not fail when the directory that the workflow wants to load is empty.
Environment:
I'm using GKE.
- Argo version:
$ argo version
argo: v2.8.0-rc2+4126d22.dirty
BuildDate: 2020-04-23T22:28:06Z
GitCommit: 4126d22b6f49e347ae1a75dd3ad6f484bee30f11
GitTreeState: dirty
GitTag: v2.8.0-rc2
GoVersion: go1.13.4
Compiler: gc
Platform: linux/amd64
- Kubernetes version:
$ kubectl version -o yaml
clientVersion:
  buildDate: "2020-04-16T11:56:40Z"
  compiler: gc
  gitCommit: 52c56ce7a8272c798dbc29846288d7cd9fbae032
  gitTreeState: clean
  gitVersion: v1.18.2
  goVersion: go1.13.9
  major: "1"
  minor: "18"
  platform: linux/amd64
serverVersion:
  buildDate: "2020-02-21T18:01:40Z"
  compiler: gc
  gitCommit: 145f9e21a4515947d6fb10819e5a336aff1b6959
  gitTreeState: clean
  gitVersion: v1.14.10-gke.27
  goVersion: go1.12.12b4
  major: "1"
  minor: 14+
  platform: linux/amd64
Logs
argo get <workflowname>
Name: test-h5rm4
Namespace: default
ServiceAccount: default
Status: Error
Message: failed to load artifacts: timed out waiting for the condition
Created: Sun Apr 26 10:07:14 +0000 (9 minutes ago)
Started: Sun Apr 26 10:07:14 +0000 (9 minutes ago)
Finished: Sun Apr 26 10:07:48 +0000 (8 minutes ago)
Duration: 34 seconds
STEP TEMPLATE PODNAME DURATION MESSAGE
⚠ test-h5rm4 test test-h5rm4 33s failed to load artifacts: timed out waiting for the condition
kubectl logs <failedpodname> -c init
time="2020-04-26T10:07:15Z" level=info msg="Starting Workflow Executor" version=v2.7.5+ede163e.dirty
time="2020-04-26T10:07:15Z" level=info msg="Creating a docker executor"
time="2020-04-26T10:07:15Z" level=info msg="Executor (version: v2.7.5+ede163e.dirty, build_date: 2020-04-21T01:12:08Z) initialized (pod: default/test-h5rm4) with template:\n{\"name\":\"test\",\"arguments\":{},\"inputs\":{\"artifacts\":[{\"name\":\"test-dir-artifact\",\"path\":\"/test_dir/\",\"gcs\":{\"bucket\":\"test-argo-bucket\",\"serviceAccountKeySecret\":{\"name\":\"argo-gcs-credentials\",\"key\":\"serviceAccountKey\"},\"key\":\"test_dir\"}}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay:latest\",\"command\":[\"cowsay\"],\"args\":[\"hello world\"],\"resources\":{}}}"
time="2020-04-26T10:07:15Z" level=info msg="Start loading input artifacts..."
time="2020-04-26T10:07:15Z" level=info msg="Downloading artifact: test-dir-artifact"
time="2020-04-26T10:07:15Z" level=info msg="GCS Load path: /argo/inputs/artifacts/test-dir-artifact.tmp, key: test_dir"
time="2020-04-26T10:07:16Z" level=warning msg="Failed to download objects from GCS: download object: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: file exists"
time="2020-04-26T10:07:18Z" level=info msg="GCS Load path: /argo/inputs/artifacts/test-dir-artifact.tmp, key: test_dir"
time="2020-04-26T10:07:18Z" level=warning msg="Failed to download objects from GCS: download object: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: file exists"
time="2020-04-26T10:07:22Z" level=info msg="GCS Load path: /argo/inputs/artifacts/test-dir-artifact.tmp, key: test_dir"
time="2020-04-26T10:07:22Z" level=warning msg="Failed to download objects from GCS: download object: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: file exists"
time="2020-04-26T10:07:31Z" level=info msg="GCS Load path: /argo/inputs/artifacts/test-dir-artifact.tmp, key: test_dir"
time="2020-04-26T10:07:31Z" level=warning msg="Failed to download objects from GCS: download object: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: file exists"
time="2020-04-26T10:07:47Z" level=info msg="GCS Load path: /argo/inputs/artifacts/test-dir-artifact.tmp, key: test_dir"
time="2020-04-26T10:07:47Z" level=warning msg="Failed to download objects from GCS: download object: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: mkdir /argo/inputs/artifacts/test-dir-artifact.tmp/: file exists"
time="2020-04-26T10:07:47Z" level=error msg="executor error: timed out waiting for the condition"
time="2020-04-26T10:07:47Z" level=info msg="Alloc=5187 TotalAlloc=18517 Sys=70848 NumGC=6 Goroutines=6"
time="2020-04-26T10:07:47Z" level=fatal msg="timed out waiting for the condition"
kubectl logs <failedpodname> -c wait
Error from server (BadRequest): container "wait" in pod "test-h5rm4" is waiting to start: PodInitializing
Message from the maintainers:
If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
@whynowy GCS: seems like your domain?
I wasn't able to reproduce it.
@puchake and @hashkanna, could you please share more details about it? I guess it might be related to some kind of special object in the bucket.
My test workflow:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: input-artifact-gcs-dir-
spec:
  entrypoint: input-artifact-gcs-example
  templates:
  - name: input-artifact-gcs-example
    inputs:
      artifacts:
      - name: my-art
        path: /my_artifact
        gcs:
          bucket: test-bucket-0011234
          key: abc
          serviceAccountKeySecret:
            name: my-gcs-credentials
            key: serviceAccountKey
    container:
      image: debian:latest
      command: [sh, -c]
      args: ["ls -lR /my_artifact;"]
Logs:
$ argo logs input-artifact-gcs-dir-cv756
input-artifact-gcs-dir-cv756: /my_artifact:
input-artifact-gcs-dir-cv756: total 4
input-artifact-gcs-dir-cv756: drwx------ 3 root root 4096 May 18 07:46 123
input-artifact-gcs-dir-cv756:
input-artifact-gcs-dir-cv756: /my_artifact/123:
input-artifact-gcs-dir-cv756: total 56
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 289 May 18 07:46 Chart.yaml
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 42246 May 18 07:46 schema.yaml
input-artifact-gcs-dir-cv756: drwx------ 3 root root 4096 May 18 07:46 templates
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 230 May 18 07:46 validate.py
input-artifact-gcs-dir-cv756:
input-artifact-gcs-dir-cv756: /my_artifact/123/templates:
input-artifact-gcs-dir-cv756: total 8
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 792 May 18 07:46 ingress.yaml
input-artifact-gcs-dir-cv756: drwx------ 3 root root 4096 May 18 07:46 proxy
input-artifact-gcs-dir-cv756:
input-artifact-gcs-dir-cv756: /my_artifact/123/templates/proxy:
input-artifact-gcs-dir-cv756: total 24
input-artifact-gcs-dir-cv756: drwx------ 2 root root 4096 May 18 07:46 autohttps
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 3814 May 18 07:46 deployment.yaml
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 1817 May 18 07:46 netpol.yaml
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 336 May 18 07:46 pdb.yaml
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 401 May 18 07:46 secret.yaml
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 2312 May 18 07:46 service.yaml
input-artifact-gcs-dir-cv756:
input-artifact-gcs-dir-cv756: /my_artifact/123/templates/proxy/autohttps:
input-artifact-gcs-dir-cv756: total 28
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 457 May 18 07:46 _README.txt
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 386 May 18 07:46 configmap-nginx.yaml
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 4891 May 18 07:46 deployment.yaml
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 954 May 18 07:46 ingress-internal.yaml
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 3642 May 18 07:46 rbac.yaml
input-artifact-gcs-dir-cv756: -rw-r--r-- 1 root root 706 May 18 07:46 service.yaml
I can confirm this bug. I looked into the code and I think I know what's going on here. The GCS artifact loader lists the objects that have the given key as a prefix:
https://github.com/argoproj/argo/blob/cb3536f9d1dd64258c1c3d737bb115bdab923e58/workflow/artifacts/gcs/gcs.go#L84
It then tries to create the intermediate folders if needed:
https://github.com/argoproj/argo/blob/cb3536f9d1dd64258c1c3d737bb115bdab923e58/workflow/artifacts/gcs/gcs.go#L104
The problem arises when you pull a file from a bucket in which different files share a common prefix, like this:
- result
- result.metadata
Say we pull the key result here. It seems the loader first creates a local file result.tmp. Later, because of the TrimPrefix, the code treats result.metadata as a file .metadata in a sub-folder under result, so it ends up trying to create a folder result.tmp. But result.tmp already exists as the file we created previously, so the mkdir fails.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I ran into this issue today on v3.4.3.
init container's last words before death:
time="2022-11-23T22:36:24.808Z" level=fatal msg="artifact inputtest failed to load: mkdir /argo/inputs/artifacts/inputtest.tmp/: mkdir /argo/inputs/artifacts/inputtest.tmp/: file exists"
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
We believe this should be addressed in v3.4.5 via #10214 - Can someone please test and confirm yes/no before we close this issue?
If no, notes are helpful so maintainers can further debug and fix in the next patch release.
I don't believe this was fixed on 3.4.14. I am currently using the following artifact config:
artifacts:
  - name: output
    path: /mnt/output
    gcs:
      key: output/
    archive:
      none: {}
The argo server gives the logs:
time="2024-01-16T21:51:38.713Z" level=info msg="Get artifact file" artifactName=output namespace=argo nodeId=output-artifact-7wjdm workflowName=output-artifact-7wjdm
time="2024-01-16T21:51:38.721Z" level=info msg="Check if directory" artifactName=output duration="125.373µs" error="IsDirectory currently unimplemented for GCS" key=output/
time="2024-01-16T21:51:38.721Z" level=info msg="Efficient artifact streaming is not supported for type *gcs.ArtifactDriver: see https://github.com/argoproj/argo-workflows/issues/8489"
time="2024-01-16T21:51:38.721Z" level=info msg="GCS Load path: /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77, key: output"
time="2024-01-16T21:51:38.967Z" level=warning msg="Failed to download objects from GCS: mkdir /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77/-artifact-2qlsh/output-artifact-2qlsh-exit-success-2250722693/: mkdir /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77: not a directory"
time="2024-01-16T21:51:38.967Z" level=warning msg="Non-transient error: mkdir /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77/-artifact-2qlsh/output-artifact-2qlsh-exit-success-2250722693/: mkdir /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77: not a directory"
time="2024-01-16T21:51:38.967Z" level=warning msg="Non-transient error: mkdir /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77: not a directory"
time="2024-01-16T21:51:38.967Z" level=warning msg="Non-transient error: not a directory"
time="2024-01-16T21:51:38.967Z" level=info msg="Stream artifact" artifactName=output duration=245.728329ms error="mkdir /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77/-artifact-2qlsh/output-artifact-2qlsh-exit-success-2250722693/: mkdir /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77: not a directory" key=output/
time="2024-01-16T21:51:38.967Z" level=error msg="Artifact Server returned internal error" error="mkdir /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77/-artifact-2qlsh/output-artifact-2qlsh-exit-success-2250722693/: mkdir /tmp/hhxq94jzgm7ktgdrcqbb6vxglbnlwp77: not a directory"
time="2024-01-16T21:51:38.967Z" level=info duration=259.218784ms method=GET path=/artifact-files/argo/workflows/output-artifact-7wjdm/output-artifact-7wjdm/outputs/output size=22 status=500
If I use just:
artifacts:
  - name: output
    path: /mnt/output
then the archive gets created as a tar.gz and I can download it from the UI, but I would really like to view the images etc in the output dir directly from the UI.
@caelan-io can you please reopen or direct me to open a new bug?
Sure, re-opened. Thanks for documenting the issue on 3.4.14
Are you able to open a PR to address this issue by chance?
We started running into this issue today. We found a solution that works for us; maybe it's applicable to others.
Our situation
- The bucket we were trying to pull the artifact from had two files: my_file.csv and my_file (without an extension).
- The artifact was set to pull the file without an extension down as the input artifact to a step in the workflow.
- This is where our init container kept throwing the directory not empty error.
Our solution
- Giving my_file (the one with no extension) an extension solved the problem.
My suspicion is there is a process somewhere that is assuming files without extensions coming from GCS are directories even if that is not true.
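The workaround fits the prefix explanation given earlier in this thread: the key my_file is also a prefix of my_file.csv, so a plain prefix listing returns both objects. The sketch below contrasts that with a path-boundary match. Both function names are hypothetical, and this illustrates the idea only; it is not a description of how Argo lists objects or of what any fix changed.

```go
package main

import (
	"fmt"
	"strings"
)

// matchByPrefix returns every object name that starts with key, which is how
// a plain prefix listing behaves.
func matchByPrefix(names []string, key string) []string {
	var out []string
	for _, n := range names {
		if strings.HasPrefix(n, key) {
			out = append(out, n)
		}
	}
	return out
}

// matchByPathBoundary keeps only the object named exactly key and the objects
// below key + "/", so siblings such as "my_file.csv" are not picked up.
func matchByPathBoundary(names []string, key string) []string {
	exact := strings.TrimSuffix(key, "/")
	dirPrefix := exact + "/"
	var out []string
	for _, n := range names {
		if n == exact || strings.HasPrefix(n, dirPrefix) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	names := []string{"my_file", "my_file.csv", "other/my_file"}
	fmt.Println(matchByPrefix(names, "my_file"))       // [my_file my_file.csv]
	fmt.Println(matchByPathBoundary(names, "my_file")) // [my_file]
}
```

Renaming my_file so that it no longer prefixes my_file.csv has the same effect as the boundary check: only one object matches the key, so nothing collides at the artifact root.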