
Artifact input file permission problems

Open chr-b opened this issue 2 years ago • 3 comments

Pre-requisites

  • [X] I have double-checked my configuration
  • [X] I can confirm the issue exists when I tested with :latest
  • [ ] I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

The high level problem is as follows:

  • Workflow step A creates an output artifact
  • Workflow step B consumes the output artifact
  • When the input artifact (in B) is an entire directory instead of just a single file, the input artifact directory ends up with permissions that prevent reading it, unless the step is executed as the root user.

The problem can be reproduced with the two workflows below. Note: I added securityContext only to make the workflows reproducible. My original workflows have no securityContext, but their container images define a non-root user in the Dockerfile.

Additional context: Argo Workflows is deployed from the official Helm chart, version 0.19.0, which uses v3.4.0 by default. The problem remains when setting images.tag to latest in the Helm values.yaml file. The artifact repository is GCP GCS. The workflows are executed with a service account that has the required permissions to access GCS.

Version

v3.4.0

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

The following workflow **works** as expected:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-ok-
spec:
  entrypoint: overall
  templates:
  - name: overall
    dag:
      tasks:
        - name: step-a
          template: step-a
        - name: step-b
          template: step-b
          depends: "step-a"
          arguments:
            artifacts:
            - name: result
              from: "{{tasks.step-a.outputs.artifacts.result}}"
  - name: step-a
    outputs:
      artifacts:
        - name: result
          path: /tmp/results/a.txt
    script:
      image: debian:bullseye-slim
      command: [bash]
      source: |
        mkdir /tmp/results
        echo "abc" > /tmp/results/a.txt
  - name: step-b
    inputs:
      artifacts:
      - name: result
        path: /tmp/results/a.txt
        #mode: 0644
        #recurseMode: true
    script:
      image: debian:bullseye-slim
      command: [bash]
      source: |
        set -e
        ls -l /tmp/
        ls -l /tmp/results/
        cat /tmp/results/a.txt
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000

The following workflow fails:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-fail-
spec:
  entrypoint: overall
  templates:
  - name: overall
    dag:
      tasks:
        - name: step-a
          template: step-a
        - name: step-b
          template: step-b
          depends: "step-a"
          arguments:
            artifacts:
            - name: result
              from: "{{tasks.step-a.outputs.artifacts.result}}"
  - name: step-a
    outputs:
      artifacts:
        - name: result
          path: /tmp/results
    script:
      image: debian:bullseye-slim
      command: [bash]
      source: |
        mkdir /tmp/results
        echo "abc" > /tmp/results/a.txt
  - name: step-b
    inputs:
      artifacts:
      - name: result
        path: /tmp/results
        #mode: 0644
        #recurseMode: true
    script:
      image: debian:bullseye-slim
      command: [bash]
      source: |
        set -e
        ls -l /tmp/
        ls -l /tmp/results/
        cat /tmp/results/a.txt
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000


### Logs from the workflow controller

time="2022-09-21T17:56:57.159Z" level=info msg="Processing workflow" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:56:57.168Z" level=info msg="Updated phase  -> Running" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:56:57.168Z" level=info msg="DAG node artifact-passing-fail-dxfjp initialized Running" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:56:57.168Z" level=info msg="All of node artifact-passing-fail-dxfjp.step-a dependencies [] completed" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:56:57.168Z" level=info msg="Pod node artifact-passing-fail-dxfjp-3288406837 initialized Pending" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:56:57.184Z" level=info msg="Created pod: artifact-passing-fail-dxfjp.step-a (artifact-passing-fail-dxfjp-step-a-3288406837)" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:56:57.184Z" level=info msg="TaskSet Reconciliation" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:56:57.184Z" level=info msg=reconcileAgentPod namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:56:57.197Z" level=info msg="Workflow update successful" namespace=argo-workflows phase=Running resourceVersion=195589168 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.184Z" level=info msg="Processing workflow" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.185Z" level=info msg="Task-result reconciliation" namespace=argo-workflows numObjs=1 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.185Z" level=info msg="task-result changed" namespace=argo-workflows nodeID=artifact-passing-fail-dxfjp-3288406837 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.186Z" level=info msg="node changed" namespace=argo-workflows new.message= new.phase=Succeeded new.progress=0/1 nodeID=artifact-passing-fail-dxfjp-3288406837 old.message= old.phase=Pending old.progress=0/1 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.186Z" level=info msg="All of node artifact-passing-fail-dxfjp.step-b dependencies [step-a] completed" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.186Z" level=info msg="Pod node artifact-passing-fail-dxfjp-3238073980 initialized Pending" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.202Z" level=info msg="Created pod: artifact-passing-fail-dxfjp.step-b (artifact-passing-fail-dxfjp-step-b-3238073980)" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.203Z" level=info msg="TaskSet Reconciliation" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.203Z" level=info msg=reconcileAgentPod namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:07.221Z" level=info msg="Workflow update successful" namespace=argo-workflows phase=Running resourceVersion=195589289 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.203Z" level=info msg="Processing workflow" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.204Z" level=info msg="Task-result reconciliation" namespace=argo-workflows numObjs=2 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.204Z" level=info msg="task-result changed" namespace=argo-workflows nodeID=artifact-passing-fail-dxfjp-3238073980 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.204Z" level=info msg="node changed" namespace=argo-workflows new.message="Error (exit code 2)" new.phase=Failed new.progress=0/1 nodeID=artifact-passing-fail-dxfjp-3238073980 old.message= old.phase=Pending old.progress=0/1 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.204Z" level=info msg="node unchanged" namespace=argo-workflows nodeID=artifact-passing-fail-dxfjp-3288406837 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.205Z" level=info msg="Outbound nodes of artifact-passing-fail-dxfjp set to [artifact-passing-fail-dxfjp-3238073980]" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.205Z" level=info msg="node artifact-passing-fail-dxfjp phase Running -> Failed" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.205Z" level=info msg="node artifact-passing-fail-dxfjp finished: 2022-09-21 17:57:17.20509817 +0000 UTC" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.205Z" level=info msg="Checking daemoned children of artifact-passing-fail-dxfjp" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.205Z" level=info msg="TaskSet Reconciliation" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.205Z" level=info msg=reconcileAgentPod namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.205Z" level=info msg="Updated phase Running -> Failed" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.205Z" level=info msg="Marking workflow completed" namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.205Z" level=info msg="Checking daemoned children of " namespace=argo-workflows workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.210Z" level=info msg="cleaning up pod" action=deletePod key=argo-workflows/artifact-passing-fail-dxfjp-1340600742-agent/deletePod
time="2022-09-21T17:57:17.224Z" level=info msg="Workflow update successful" namespace=argo-workflows phase=Failed resourceVersion=195589401 workflow=artifact-passing-fail-dxfjp
time="2022-09-21T17:57:17.256Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo-workflows/artifact-passing-fail-dxfjp-step-a-3288406837/labelPodCompleted
time="2022-09-21T17:57:17.256Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo-workflows/artifact-passing-fail-dxfjp-step-b-3238073980/labelPodCompleted


### Logs from your workflow's wait container

time="2022-09-21T17:57:12.196Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-09-21T17:57:12.196Z" level=info msg="No output parameters"
time="2022-09-21T17:57:12.196Z" level=info msg="No output artifacts"
time="2022-09-21T17:57:12.196Z" level=info msg="GCS Save path: /tmp/argo/outputs/logs/main.log, key: argo-workflows/artifact-passing-fail-dxfjp/artifact-passing-fail-dxfjp-step-b-3238073980/main.log"
time="2022-09-21T17:57:12.354Z" level=info msg="Save artifact" artifactName=main-logs duration=158.361385ms error="<nil>" key=argo-workflows/artifact-passing-fail-dxfjp/artifact-passing-fail-dxfjp-step-b-3238073980/main.log
time="2022-09-21T17:57:12.354Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2022-09-21T17:57:12.354Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2022-09-21T17:57:12.369Z" level=info msg="Create workflowtaskresults 201"
time="2022-09-21T17:57:12.370Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2022-09-21T17:57:12.370Z" level=info msg="Alloc=24976 TotalAlloc=31317 Sys=34770 NumGC=5 Goroutines=10"
time="2022-09-21T17:57:02.384Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/artifacts/result.tgz
time="2022-09-21T17:57:02.384Z" level=info msg="Successfully saved file: /tmp/argo/outputs/artifacts/result.tgz"
time="2022-09-21T17:57:02.384Z" level=info msg="GCS Save path: /tmp/argo/outputs/logs/main.log, key: argo-workflows/artifact-passing-fail-dxfjp/artifact-passing-fail-dxfjp-step-a-3288406837/main.log"
time="2022-09-21T17:57:02.674Z" level=info msg="Save artifact" artifactName=main-logs duration=290.043663ms error="<nil>" key=argo-workflows/artifact-passing-fail-dxfjp/artifact-passing-fail-dxfjp-step-a-3288406837/main.log
time="2022-09-21T17:57:02.674Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2022-09-21T17:57:02.674Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2022-09-21T17:57:02.693Z" level=info msg="Create workflowtaskresults 201"
time="2022-09-21T17:57:02.694Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2022-09-21T17:57:02.694Z" level=info msg="Deadline monitor stopped"
time="2022-09-21T17:57:02.694Z" level=info msg="Alloc=23224 TotalAlloc=48300 Sys=51410 NumGC=6 Goroutines=11"

chr-b avatar Sep 21 '22 18:09 chr-b

@chr-b Can you try on v3.3.9?

sarabala1979 avatar Sep 23 '22 23:09 sarabala1979

Hi @sarabala1979 ,

The workflow artifact-passing-fail- from my examples works when using the v3.3.9 container images.

chr-b avatar Sep 24 '22 07:09 chr-b

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Oct 15 '22 22:10 stale[bot]

This problem still exists with Argo v3.4.2. It breaks every workflow that uses input artifacts and a non-root user.

chr-b avatar Oct 25 '22 08:10 chr-b

Can confirm we have the same issue. This is really blocking us.

sebltm avatar Nov 04 '22 16:11 sebltm

same issue with v3.4.3

k-ebu avatar Nov 11 '22 09:11 k-ebu

I also confirm the issue with v3.4.3, I cannot use any artifact input directory with a non-root user.

PacoDu avatar Nov 16 '22 15:11 PacoDu

This is impacting us as well. With v3.4.3 I am not able to access the artifact input directory, because it has the following permissions: drwx------ root root. No user other than root is able to access these files.
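To make the effect of drwx------ root root concrete, here is a minimal sketch of the POSIX permission check for listing a directory. The function name can_list_dir is hypothetical, and the check is simplified: it ignores supplementary groups, ACLs, and Linux capabilities, and assumes root bypasses the mode bits.

```python
def can_list_dir(mode: int, owner_uid: int, owner_gid: int, uid: int, gid: int) -> bool:
    """Return True if (uid, gid) has both read and execute on a directory.

    Simplified model: exactly one permission class (owner, group, or other)
    applies to a given user, and uid 0 is assumed to bypass the mode bits.
    """
    if uid == 0:
        return True
    if uid == owner_uid:
        bits = (mode >> 6) & 0o7   # owner class
    elif gid == owner_gid:
        bits = (mode >> 3) & 0o7   # group class
    else:
        bits = mode & 0o7          # other class
    # Listing a directory needs read (4) and execute/search (1).
    return (bits & 0o5) == 0o5


# Directory owned by root:root, accessed as runAsUser 1000 / runAsGroup 3000
# (the IDs from the reproduction workflow).
print(can_list_dir(0o700, 0, 0, 1000, 3000))
print(can_list_dir(0o755, 0, 0, 1000, 3000))
```

With mode 0o700 the non-root user falls into the "other" class, which has no bits set, so both ls and cat on the directory contents are denied.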

aneja-arun1 avatar Nov 20 '22 19:11 aneja-arun1

@aneja-arun1 @PacoDu can you uncomment the below lines and try?

        mode: 0644
        recurseMode: true

sarabala1979 avatar Nov 28 '22 18:11 sarabala1979

Hi @sarabala1979 , Adding mode and recurseMode to the input file artifact does not resolve the problem.

chr-b avatar Nov 29 '22 09:11 chr-b

Hi all, same issue with v3.4.4. I think the problem is related to #8292. That PR replaces the untar function in workflow/executor/executor.go with a Go-based implementation that creates the directories with 0o700 permissions (owner only) instead of 0o755 (read/execute allowed for group and others).

https://github.com/argoproj/argo-workflows/pull/8292/files#diff-791eed50c295312394166c66addd3b676cc50ba3126730304c1fab2d5cac7a23R836

I tested changing the directory permission to 0o755, and it works now.
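As an illustration only (this is Python, not the Go code in executor.go), the following sketch extracts a small archive and creates every directory with a caller-supplied mode, which shows how a hard-coded 0o700 would end up on the extracted artifact directory. The helper names make_archive, untar, and extracted_dir_mode are hypothetical, and the sketch assumes a Linux filesystem and a trusted archive (no path-traversal checks).

```python
import io
import os
import stat
import tarfile
import tempfile


def make_archive() -> bytes:
    """Build an in-memory .tgz holding results/ and results/a.txt."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        d = tarfile.TarInfo("results")
        d.type = tarfile.DIRTYPE
        d.mode = 0o755
        tar.addfile(d)
        data = b"abc\n"
        f = tarfile.TarInfo("results/a.txt")
        f.size = len(data)
        f.mode = 0o644
        tar.addfile(f, io.BytesIO(data))
    return buf.getvalue()


def untar(archive: bytes, dest: str, dir_mode: int) -> None:
    """Extract archive into dest, creating every directory with dir_mode."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar:
            target = os.path.join(dest, member.name)
            if member.isdir():
                os.makedirs(target, exist_ok=True)
                # Explicit chmod so the process umask does not alter the result.
                os.chmod(target, dir_mode)
            elif member.isfile():
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as out:
                    out.write(src.read())
                os.chmod(target, member.mode)


def extracted_dir_mode(dir_mode: int) -> int:
    """Extract into a temp dir and return the permission bits of results/."""
    with tempfile.TemporaryDirectory() as dest:
        untar(make_archive(), dest, dir_mode)
        return stat.S_IMODE(os.stat(os.path.join(dest, "results")).st_mode)


print(oct(extracted_dir_mode(0o700)))
print(oct(extracted_dir_mode(0o755)))
```

If the extracting process runs as root, a directory created with 0o700 is owned by root and closed to everyone else, while 0o755 leaves it listable by the non-root user of the main container.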

dcd000 avatar Dec 21 '22 12:12 dcd000

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale[bot] avatar Jan 21 '23 20:01 stale[bot]

Bump. This is still a blocking issue for us.

rab-skybrid avatar Jan 23 '23 15:01 rab-skybrid

We have the same problem. We are still waiting for a fix before updating our environment.

jrguerrero avatar Jan 24 '23 08:01 jrguerrero

Just ran into this as well when trying to read from /src as nonroot user

W
fatal: failed to read object 871b916c4754bccccdd30cad6eac717b9d896cc2: Permission denied
time="2023-01-24T15:03:08.155Z" level=info msg="sub-process exited" argo=true error="<nil>"
Error: exit status 128

iainlbc avatar Jan 24 '23 15:01 iainlbc

Same problem here.

RenePinnow avatar Feb 07 '23 08:02 RenePinnow

Same problem here. We bumped the version from 3.3.8 to 3.4.4 and it's blocking us; we may need to go back to 3.3.8.

sandeepk8s avatar Feb 09 '23 15:02 sandeepk8s

Had the same problem upgrading to 3.4.5.

For now I'm setting the securityContext in all WorkflowSpec fields to work around the issue:

spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000

Hope it helps

graillus avatar Feb 09 '23 16:02 graillus

I did some testing on v3.4.4. Using the above example, I tried only a few modes (700, 755, 644) and no mode (the default). In the end I used mode: 700 for our workflows, and it worked. Below are screenshots for each case. The problem is that we have a lot of workflows in different namespaces, and I am not sure whether there is an easy way to set a default mode for the artifact path 😞

image

sandeepk8s avatar Feb 10 '23 01:02 sandeepk8s

Thanks @graillus and @dcd000, your comments were helpful in my tests. After trying all methods, the one below worked for me. I got it working by making two changes:

  1. Controller configmap, executor section
executor: |
    image: org.com:port/argoproj/argoexec:v3.4.4
    securityContext:
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000
  2. Controller configmap, workflow specs
workflowDefaults: |
    spec:
      securityContext:
        runAsUser: 1000
        runAsNonRoot: true

sandeepk8s avatar Feb 10 '23 18:02 sandeepk8s