[backend] Unable to create directory in Minio when using Artifacts: Permission denied
Hi Developers
I tried to create a simple pipeline that transfers data using the "built-in" artifacts approach, without success.
It is difficult to say what is happening, but I have found similar issues in other threads.
Please, if you know of a manual patch, let us know. I see artifacts as a core solution/approach.
cc: @juliusvonkohout , @chensun
I am aware that there are some related issues, but I do not see a final solution or alternative patch. See: #6530 , https://github.com/kubeflow/manifests/issues/2573, https://github.com/kubeflow/pipelines/issues/7629
Environment
- How did you deploy Kubeflow Pipelines (KFP): https://github.com/kubeflow/manifests latest tag: v.1.8.0
- KFP version: According to the Readme from manifest repo: KFP 2.0.3
- KFP SDK version: 2.6.0
Steps to reproduce
I get a permission denied error when using Artifacts.
Snippet of code:
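from kfp import dsl
from kfp.dsl import Dataset, Output


@dsl.component(base_image="kubeflownotebookswg/jupyter-pytorch-full:v1.8.0-rc.0")
def download_data(test_path: Output[Dataset]):
    # Imports live inside the component because its body runs in the container.
    import torch
    from torchvision.transforms import ToTensor
    from torchvision.datasets import MNIST

    # Download the MNIST test split and serialize it to the output artifact path.
    mnist_test = MNIST(".", download=True, train=False, transform=ToTensor())
    with open(test_path.path, "wb") as f:
        torch.save(mnist_test, f)


@dsl.pipeline(
    name='mnist',
    description='Detect digits',
)
def run():
    step_1 = download_data()


# `client` is assumed to be an already configured kfp.Client instance.
client.create_run_from_pipeline_func(run)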
Associated logs:
failed to execute component: unable to create directory "/minio/mlpipeline/v2/artifacts/mnist/43f760f9-b638-4129-87fe-602e24076beb/download-data" for output artifact "test_path": mkdir /minio: permission denied
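For readers trying to relate the path in this log to the artifact, here is a minimal sketch of how a `minio://` artifact URI appears to be mapped to a local path under `/minio`. It is a simplified illustration, not the SDK or launcher code: the prefix table, the function names, and the example URI (rebuilt from the log path plus the artifact name `test_path`) are all assumptions.

```python
import posixpath

# Prefix mapping assumed for this sketch (URI scheme -> local mount root).
LOCAL_PREFIXES = {
    "minio://": "/minio/",
    "s3://": "/s3/",
    "gs://": "/gcs/",
}


def local_path_for_uri(uri: str) -> str:
    """Translate an object-store URI into the local path a component writes to."""
    for scheme, prefix in LOCAL_PREFIXES.items():
        if uri.startswith(scheme):
            return prefix + uri[len(scheme):]
    raise ValueError(f"unsupported artifact URI: {uri}")


def directory_to_create(uri: str) -> str:
    """Return the parent directory that must exist before the artifact is written."""
    return posixpath.dirname(local_path_for_uri(uri))


if __name__ == "__main__":
    # Hypothetical URI reconstructed from the log line above.
    uri = ("minio://mlpipeline/v2/artifacts/mnist/"
           "43f760f9-b638-4129-87fe-602e24076beb/download-data/test_path")
    print(directory_to_create(uri))
```

Under these assumptions the first directory to be created is `/minio` itself, directly under `/`, which is where a non-root user would be refused.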
Expected result
The pipeline runs without issues.
Materials and Reference
Impacted by this bug? Give it a 👍.
Please use the final 1.8 image, not jupyter-pytorch-full:v1.8.0-rc.0 and join the biweekly KFP meeting to discuss this.
You should also try to update from KFP 2.0.3 to 2.0.5 first.
Thanks for the reply @juliusvonkohout. Here are my findings:
- I have already tested with jupyter-pytorch-full:v1.8.0-rc.0 and jupyter-pytorch-full:v1.8.0, and the problem persists.
- I have not updated to 2.0.5, so I cannot confirm whether the error is fixed (I do not think so).
For any readers: I did not understand the explanation in #6530, but:
- If I use base_image=python:3.10, the pipeline executes without problems because the user of the docker image seems to be root. See the associated Dockerfiles.
- If I use base_image=kubeflownotebookswg/jupyter-pytorch-full:v1.8.0, the pipeline raises the permission denied error, and I see in the associated Dockerfile that the user of the docker image is not root. See the associated Dockerfile.
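The root versus non-root difference described in the two bullets can be checked from inside any image with a small diagnostic like the one below. This is only a sketch for investigation: the function names are hypothetical, and it tests `/` because that is the parent in which `/minio` would have to be created.

```python
import os


def describe_user() -> str:
    """Report the effective UID and whether it is root."""
    uid = os.geteuid()
    return f"uid={uid} ({'root' if uid == 0 else 'non-root'})"


def can_create_dir(parent: str) -> bool:
    """Return True if the current user may create a directory under `parent`."""
    # Creating an entry needs write and search (execute) permission on the parent.
    return os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    print(describe_user())
    print("can create a directory under /:", can_create_dir("/"))
```

Running it with `python` inside each base image should show whether the default user of that image is allowed to create top-level directories.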
@rimolive this might be something to track for 1.9
/assign @juliusvonkohout
We have an open PR for that #10538.
/assign @gregsheremeta
@rimolive: GitHub didn't allow me to assign the following users: gregsheremeta.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/reopen This issue still persists
@majuss: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen
@rimolive: Reopened this issue.
This issue occurs because Kubeflow Pipelines requires component containers to run as root, and the image you have chosen, kubeflownotebookswg/jupyter-pytorch-full:v1.8.0-rc.0, runs as non-root.
There is a PR to fix this issue by mounting emptyDir volumes at the /minio and other paths, but that will need to be reviewed:
- https://github.com/kubeflow/pipelines/pull/10538
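To illustrate the idea of mounting emptyDir volumes at the artifact paths, here is a rough sketch that patches a Pod manifest held as a plain Python dict. It is not the code from the PR: the function name, the volume names, and the list of roots are assumptions (`/minio` comes from the error log, while `/s3` and `/gcs` are the other paths mentioned in this thread).

```python
import copy

# Assumed local artifact roots for this sketch.
ARTIFACT_ROOTS = ("/minio", "/s3", "/gcs")


def add_artifact_emptydirs(pod: dict, roots=ARTIFACT_ROOTS) -> dict:
    """Return a copy of `pod` with an emptyDir volume mounted at each root."""
    patched = copy.deepcopy(pod)  # leave the caller's manifest untouched
    spec = patched.setdefault("spec", {})
    volumes = spec.setdefault("volumes", [])
    for root in roots:
        name = "artifacts-" + root.strip("/")
        # Add the volume once, even if the function is applied repeatedly.
        if not any(v.get("name") == name for v in volumes):
            volumes.append({"name": name, "emptyDir": {}})
        # Mount it in every container that does not already use that path.
        for container in spec.get("containers", []):
            mounts = container.setdefault("volumeMounts", [])
            if not any(m.get("mountPath") == root for m in mounts):
                mounts.append({"name": name, "mountPath": root})
    return patched


if __name__ == "__main__":
    pod = {"spec": {"containers": [{"name": "main", "image": "example"}]}}
    print(add_artifact_emptydirs(pod)["spec"]["volumes"])
```

With the directory already present as a mount point, the component no longer needs permission to create `/minio` under `/`.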
@chensun @HumairAK @Tomcli we definitely need to prioritize fixing this issue, because it's pretty bad to have a hard requirement on root container images.
I also want to say that the lack of securityContext support is related to this, because if we had it, it would provide a possible workaround:
- https://github.com/kubeflow/pipelines/issues/9783
That is, if users could set the Pod securityContext, they could set runAsUser: 0 to override the UID of images which don't run as root by default.
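As a manifest-level illustration of that workaround, the sketch below sets `runAsUser: 0` in the Pod-level `securityContext` of a manifest held as a Python dict. The helper name is hypothetical, and per the linked issue KFP did not expose a way to apply this to pipeline steps at the time, so this only shows what the resulting Pod spec would contain.

```python
import copy


def run_as_root(pod: dict) -> dict:
    """Return a copy of `pod` whose Pod-level securityContext forces UID 0."""
    patched = copy.deepcopy(pod)  # do not mutate the input manifest
    security_context = patched.setdefault("spec", {}).setdefault("securityContext", {})
    security_context["runAsUser"] = 0
    return patched


if __name__ == "__main__":
    pod = {"spec": {"containers": [{"name": "main", "image": "example"}]}}
    print(run_as_root(pod)["spec"]["securityContext"])
```

Forcing root trades away the security benefit of a non-root image, which is why it is described here as a workaround rather than a fix.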
We're running into this now. All our end user containers run as non-root to optimize security. This is a pretty universal expectation at any security sensitive company.
For anyone else running into this, we found a short-term workaround using kyverno that's not contingent on this PR being merged. Huge shout out to @moorthy156 for implementing it lightning fast. Just update the mountPath to minio or gcs or whatever else you need it to be.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-volume-mount-pipelineroot
spec:
  background: true
  failurePolicy: Ignore
  rules:
    - match:
        any:
          - resources:
              kinds:
                - Pod
              namespaceSelector:
                matchLabels:
                  app.kubernetes.io/part-of: "kubeflow-profile"
              selector:
                matchExpressions:
                  - key: pipelines.kubeflow.org/v2_component
                    operator: In
                    values:
                      - "true"
      mutate:
        patchStrategicMerge:
          spec:
            volumes:
              - name: pipelineroot
            containers:
              - (name): main | wait
                volumeMounts:
                  - mountPath: /s3
                    name: pipelineroot
                env:
                  - name: AWS_REGION
                    value: us-east-1
      name: add-volume-mount-pipelineroot
      preconditions:
        all:
          - key: '{{ request.operation }}'
            operator: Equals
            value: CREATE
Just wanted to update everyone that there is a new PR being worked on that will fix this issue:
- https://github.com/kubeflow/pipelines/pull/10857
@chensun @james-jwu @zijianjoy can we please cherry-pick https://github.com/kubeflow/pipelines/pull/10857 into the 2.2 branch, and cut a 2.2.1 release with this fix?
This is a very important issue, as it prevents non-root containers from working in pipeline steps, which stops many people adopting Kubeflow Pipelines.
We can also do a follow up Kubeflow 1.9.1, but one way or the other we need a new release of KFP.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This should have been resolved by https://github.com/kubeflow/pipelines/pull/10857 in 2.3.0
/close
@thesuperzapper: Closing this issue.