training-operator

Permission denied when reading TrainJob function script when run as non-root user

astefanutti opened this issue on Jan 07 '25 • 2 comments

What happened?

Creating a TrainJob on a cluster where Pod Security Admission is configured to require containers to run as non-root:

from kubeflow.training import TrainingClient, Trainer

def train_func():
    pass

job_name = TrainingClient().train(
    runtime_ref="torch-distributed",
    trainer=Trainer(func=train_func),
)

The job fails with a permission-denied error when reading the training function script.
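The issue does not say how the SDK turns `train_func` into a script inside the container. Assuming it writes the function source to a file that the container then runs with `python <file>` (so only read permission matters), the failure is consistent with that file lacking a read bit for the non-root UID the container is started with. The sketch below illustrates that assumption only; `TRAIN_FUNC_SOURCE`, `write_script` and `readable_by_others` are hypothetical names, not part of the `kubeflow.training` SDK.

```python
import os
import stat
import tempfile

# Hypothetical training-function source, standing in for whatever the SDK
# extracts from `train_func` (for example with inspect.getsource).
TRAIN_FUNC_SOURCE = "def train_func():\n    pass\n\ntrain_func()\n"


def write_script(source: str, path: str, mode: int = 0o644) -> str:
    """Write `source` to `path` and set its permission bits explicitly.

    chmod is applied after writing so the result does not depend on the
    process umask. 0o644 (rw-r--r--) lets any UID read the file.
    """
    with open(path, "w") as f:
        f.write(source)
    os.chmod(path, mode)
    return path


def readable_by_others(path: str) -> bool:
    """Return True if the 'other' read bit is set on `path`.

    A UID that is neither the file's owner nor a member of its group
    falls back to this bit when the kernel checks read access.
    """
    return bool(os.stat(path).st_mode & stat.S_IROTH)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        script = os.path.join(d, "train_func.py")
        # Owner-only file: a different non-root UID could not read it.
        write_script(TRAIN_FUNC_SOURCE, script, mode=0o600)
        print(oct(stat.S_IMODE(os.stat(script).st_mode)), readable_by_others(script))
        # World-readable file: any UID can read it.
        write_script(TRAIN_FUNC_SOURCE, script)
        print(oct(stat.S_IMODE(os.stat(script).st_mode)), readable_by_others(script))
```

Run directly, this prints `0o600 False` and then `0o644 True`. Checking mode bits rather than calling `os.access` keeps the demo meaningful even when it runs as root, since root bypasses read-permission checks.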

What did you expect to happen?

Running the TrainJob as root should not be required, and the TrainJob should succeed when run as a non-root user.

Environment

Kubernetes version:

$ kubectl version
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.27.11+ec42b99

Training Operator Python SDK version:

$ pip show kubeflow-training
Version: 2.0.0

Impacted by this bug?

Give it a 👍. We prioritize the issues with the most 👍.

astefanutti • Jan 07 '25 14:01

Thanks for creating this @astefanutti!

/remove-label lifecycle/needs-triage
/area sdk

andreyvelich • Jan 07 '25 18:01

/assign @astefanutti since the PR is ready for review. cc @andreyvelich

varodrig • Feb 14 '25 04:02

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] • May 15 '25 05:05

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] • Jun 04 '25 05:06