
[backend] Unable to create directory in Minio when using Artifacts: Permission denied

Open jmaunon opened this issue 2 years ago • 9 comments

Hi Developers

I have tried to create a simple pipeline that transfers data using the "built-in" artifacts approach, without success. It is difficult to say what is happening, but I have found similar issues in other threads.

Please, if you know of a manual patch, let us know. I see artifacts as a core solution/approach.

cc: @juliusvonkohout , @chensun

I am aware that there are some related issues, but I do not see a final solution or alternative patch. See: #6530, https://github.com/kubeflow/manifests/issues/2573, https://github.com/kubeflow/pipelines/issues/7629

Environment

  • How did you deploy Kubeflow Pipelines (KFP): https://github.com/kubeflow/manifests latest tag: v1.8.0

  • KFP version: According to the README of the manifests repo: KFP 2.0.3

  • KFP SDK version: 2.6.0

Steps to reproduce

I get a permission denied error when using Artifacts.

Snippet of code:

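from kfp import dsl
from kfp.client import Client
from kfp.dsl import Dataset, Output

# Non-root notebook image; the run fails with this base image.
@dsl.component(base_image="kubeflownotebookswg/jupyter-pytorch-full:v1.8.0-rc.0")
def download_data(test_path: Output[Dataset]):
    import torch
    from torchvision.datasets import MNIST
    from torchvision.transforms import ToTensor

    mnist_test = MNIST(".", download=True, train=False, transform=ToTensor())

    # Write the dataset to the local path backing the output artifact.
    with open(test_path.path, "wb") as f:
        torch.save(mnist_test, f)

@dsl.pipeline(
    name='mnist',
    description='Detect digits',
)
def run():
    step_1 = download_data()

# Assumes the default client settings can reach the KFP API (e.g. from a Kubeflow notebook).
client = Client()
client.create_run_from_pipeline_func(run)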

Associated logs:

failed to execute component: unable to create directory "/minio/mlpipeline/v2/artifacts/mnist/43f760f9-b638-4129-87fe-602e24076beb/download-data" for output artifact "test_path": mkdir /minio: permission denied
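The path in this log suggests that the artifact URI is translated into a local directory under /minio before the component writes to it. A minimal sketch of that translation, assuming the prefixes /minio, /gcs and /s3 for the corresponding URI schemes; `uri_to_local_path` is a hypothetical helper written for illustration, not the launcher's actual code:

```python
# Assumed mapping from object-store URI schemes to local directory prefixes.
# Only the /minio prefix is confirmed by the log above; the others are assumptions.
LOCAL_PREFIXES = {
    "minio://": "/minio/",
    "gs://": "/gcs/",
    "s3://": "/s3/",
}


def uri_to_local_path(uri: str) -> str:
    """Translate an artifact URI into the local path a component would write to."""
    for scheme, prefix in LOCAL_PREFIXES.items():
        if uri.startswith(scheme):
            return prefix + uri[len(scheme):]
    # Unknown scheme: return the URI unchanged.
    return uri
```

With this mapping, a URI such as minio://mlpipeline/v2/artifacts/mnist/... lands under /minio/mlpipeline/..., so the first directory that has to be created is /minio at the filesystem root, which is where the mkdir in the log fails.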

Expected result

The pipeline runs without issues.

Materials and Reference


Impacted by this bug? Give it a 👍.

jmaunon avatar Jan 15 '24 09:01 jmaunon

Please use the final 1.8 image, not jupyter-pytorch-full:v1.8.0-rc.0 and join the biweekly KFP meeting to discuss this.

juliusvonkohout avatar Jan 15 '24 18:01 juliusvonkohout

You should also try to update from KFP 2.0.3 to 2.0.5 first.

juliusvonkohout avatar Jan 15 '24 18:01 juliusvonkohout

Thanks for the reply, @juliusvonkohout. Here are my findings:

  • I have already tested with both jupyter-pytorch-full:v1.8.0-rc.0 and jupyter-pytorch-full:v1.8.0, and the problem persists.
  • I have not updated to 2.0.5, so I cannot confirm whether the error is fixed there (I do not think so).

For any readers: I did not understand the explanation in #6530, but:

  • If I use base_image=python:3.10, the pipeline executes without problems, apparently because the user of the Docker image is root. See the associated Dockerfiles.
  • If I use base_image=kubeflownotebookswg/jupyter-pytorch-full:v1.8.0, the pipeline raises the permission denied error, and I see in the associated Dockerfile that the user of the Docker image is not root. See the associated Dockerfile.
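The difference between the two images can be checked from inside a container with a small probe. This is only a sketch: `can_create_dir` is a hypothetical helper, and it assumes that the failure comes down to whether the current user may create a directory under a given parent (here, the filesystem root):

```python
import os


def can_create_dir(parent: str) -> bool:
    """Return True if the current user can create a directory under `parent`."""
    # Use a process-specific name so the probe does not collide with real data.
    target = os.path.join(parent, f"kfp-probe-{os.getpid()}")
    try:
        os.makedirs(target)
    except PermissionError:
        return False
    # Clean up the probe directory again.
    os.rmdir(target)
    return True
```

Calling `can_create_dir("/")` in a component would be expected to return True when the image runs as root and False when it runs as a non-root user, which matches the two observations above.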

jmaunon avatar Jan 16 '24 09:01 jmaunon

@rimolive this might be something to track for 1.9

juliusvonkohout avatar Jan 17 '24 10:01 juliusvonkohout

/assign @juliusvonkohout

zijianjoy avatar Jan 25 '24 23:01 zijianjoy

We have an open PR for that #10538.

rimolive avatar Mar 06 '24 18:03 rimolive

/assign @gregsheremeta

rimolive avatar Mar 06 '24 18:03 rimolive

@rimolive: GitHub didn't allow me to assign the following users: gregsheremeta.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

/assign @gregsheremeta

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Mar 06 '24 18:03 google-oss-prow[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 06 '24 07:05 github-actions[bot]

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar May 28 '24 07:05 github-actions[bot]

/reopen This issue still persists

majuss avatar May 28 '24 08:05 majuss

@majuss: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen This issue still persists

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

google-oss-prow[bot] avatar May 28 '24 08:05 google-oss-prow[bot]

/reopen

rimolive avatar May 28 '24 12:05 rimolive

@rimolive: Reopened this issue.

In response to this:

/reopen


google-oss-prow[bot] avatar May 28 '24 12:05 google-oss-prow[bot]

This issue occurs because Kubeflow Pipelines requires component containers to run as root, and the container you have chosen, kubeflownotebookswg/jupyter-pytorch-full:v1.8.0-rc.0, runs as non-root.

There is a PR to fix this issue by mounting emptyDir volumes at the /minio and other paths, but that will need to be reviewed:

  • https://github.com/kubeflow/pipelines/pull/10538
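To illustrate the idea behind that approach (not the PR's actual implementation, which is not shown in this thread), here is a sketch that adds emptyDir volumes to a Pod manifest held as a plain Python dict. The function name, the volume names, and the default list of paths are assumptions; only /minio is confirmed by the error above:

```python
import copy


def add_emptydir_mounts(pod: dict, mount_paths=("/minio", "/gcs", "/s3")) -> dict:
    """Return a copy of `pod` with an emptyDir volume mounted at each path."""
    patched = copy.deepcopy(pod)  # leave the caller's manifest untouched
    spec = patched.setdefault("spec", {})
    volumes = spec.setdefault("volumes", [])
    for path in mount_paths:
        # Hypothetical volume naming scheme, e.g. "artifact-root-minio".
        name = "artifact-root-" + path.strip("/")
        volumes.append({"name": name, "emptyDir": {}})
        for container in spec.get("containers", []):
            container.setdefault("volumeMounts", []).append(
                {"name": name, "mountPath": path}
            )
    return patched
```

Because the directory then already exists as a mounted volume, the component no longer has to create /minio at the filesystem root itself.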

@chensun @HumairAK @Tomcli we definitely need to prioritize fixing this issue, because it's pretty bad to have a hard requirement on root container images.

thesuperzapper avatar May 29 '24 20:05 thesuperzapper

I also want to say that the lack of securityContext support is related to this, because if we had it, it would provide a possible workaround:

  • https://github.com/kubeflow/pipelines/issues/9783

That is, if users could set the Pod securityContext, they could set runAsUser: 0 to override the UID of images which don't run as root by default.
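As a sketch of what such an override would amount to at the manifest level, the snippet below sets the Pod-level securityContext on a Pod held as a plain Python dict. `run_as_root` is a hypothetical helper; KFP did not expose a way to apply this at the time of writing, which is the point of the linked issue:

```python
import copy


def run_as_root(pod: dict) -> dict:
    """Return a copy of `pod` whose Pod-level securityContext forces UID 0."""
    patched = copy.deepcopy(pod)  # do not mutate the input manifest
    security_context = patched.setdefault("spec", {}).setdefault("securityContext", {})
    security_context["runAsUser"] = 0
    return patched
```

This only changes the UID the containers start with; it does not make the image itself root-based.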

thesuperzapper avatar May 29 '24 20:05 thesuperzapper

We're running into this now. All our end user containers run as non-root to optimize security. This is a pretty universal expectation at any security sensitive company.

droctothorpe avatar May 30 '24 18:05 droctothorpe

For anyone else running into this, we found a short-term workaround using Kyverno that's not contingent on this PR being merged. Huge shout out to @moorthy156 for implementing it lightning fast. The policy below mounts a volume at /s3; just update the mountPath to the minio or gcs path, or whatever else you need it to be.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-volume-mount-pipelineroot
spec:
  background: true
  failurePolicy: Ignore
  rules:
  - match:
      any:
      - resources:
          kinds:
          - Pod
          namespaceSelector:
            matchLabels:
              app.kubernetes.io/part-of: "kubeflow-profile"
          selector:
            matchExpressions:
            - key: pipelines.kubeflow.org/v2_component
              operator: In
              values:
              - "true"
    mutate:
      patchStrategicMerge:
        spec:
          volumes:
          - name: pipelineroot
          containers:
          - (name): main | wait
            volumeMounts:
            - mountPath: /s3
              name: pipelineroot
            env:
            - name: AWS_REGION
              value: us-east-1
    name: add-volume-mount-pipelineroot
    preconditions:
      all:
      - key: '{{ request.operation }}'
        operator: Equals
        value: CREATE

droctothorpe avatar May 30 '24 19:05 droctothorpe

Just wanted to update everyone that there is a new PR being worked on that will fix this issue:

  • https://github.com/kubeflow/pipelines/pull/10857

thesuperzapper avatar Jun 04 '24 01:06 thesuperzapper

@chensun @james-jwu @zijianjoy can we please cherry-pick https://github.com/kubeflow/pipelines/pull/10857 into the 2.2 branch, and cut a 2.2.1 release with this fix?

This is a very important issue, as it prevents non-root containers from working in pipeline steps, which stops many people adopting Kubeflow Pipelines.

thesuperzapper avatar Jul 17 '24 04:07 thesuperzapper

@chensun @james-jwu @zijianjoy can we please cherry-pick #10857 into the 2.2 branch, and cut a 2.2.1 release with this fix?

This is a very important issue, as it prevents non-root containers from working in pipeline steps, which stops many people adopting Kubeflow Pipelines.

We can also do a follow up Kubeflow 1.9.1, but one way or the other we need a new release of KFP.

juliusvonkohout avatar Jul 22 '24 11:07 juliusvonkohout

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 21 '24 07:09 github-actions[bot]

This should have been resolved by https://github.com/kubeflow/pipelines/pull/10857 in 2.3.0

/close
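For readers checking whether their deployment should already include the fix, a minimal sketch of the version comparison, assuming the KFP backend version is available as a plain MAJOR.MINOR.PATCH string (pre-release suffixes are not handled). Both function names are hypothetical:

```python
def parse_version(version: str) -> tuple:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def has_nonroot_fix(backend_version: str) -> bool:
    """True if the version is at least 2.3.0, the release named in this thread."""
    return parse_version(backend_version) >= (2, 3, 0)
```

For example, the 2.0.3 backend from the original report would not include the fix under this check.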

thesuperzapper avatar Sep 21 '24 08:09 thesuperzapper

@thesuperzapper: Closing this issue.

In response to this:

This should have been resolved by https://github.com/kubeflow/pipelines/pull/10857 in 2.3.0

/close


google-oss-prow[bot] avatar Sep 21 '24 08:09 google-oss-prow[bot]