HAMi
non-root users have no write permission on /tmp/vgpu/cudevshr.cache
When a program runs as a non-root user in the container, it fails because that user has no permission to write to `/tmp/vgpu/cudevshr.cache`.
Since 2bcb2c5, the permission of `/tmp/vgpu/cudevshr.cache` has been changed to 666, which solves the problem only when you run the pod as root.
If I run the pod as a non-root user, there are still errors. Here are the steps to reproduce:
- Create a pod with a non-root user:

      apiVersion: v1
      kind: Pod
      metadata:
        name: <podName>
        namespace: default
      spec:
        containers:
        - command:
          - /bin/bash
          - -c
          - while true; do sleep 10; done
          image: <your-image>
          name: <podName>
          resources:
            limits:
              nvidia.com/gpu: 1
              nvidia.com/gpumem: 2000
            requests:
              nvidia.com/gpu: 1
              nvidia.com/gpumem: 2000
          securityContext:
            runAsUser: 1024
            runAsGroup: 1024
- `kubectl exec` into the container and run `nvidia-smi`. The command will fail because the user can't create files in `/tmp/vgpu`.
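To make the failure mode concrete, here is a minimal Go sketch of the permission check involved. It assumes the directory is owned by root and that UID 1024 is neither the owner nor in the owning group, so only the "other" permission bits apply. The helper name `othersCanCreate` is hypothetical and is not part of HAMi.

```go
package main

import (
	"fmt"
	"os"
)

// othersCanCreate reports whether the "other" permission class has both the
// write and the execute (search) bit on a directory. A user who is neither
// the owner nor a member of the owning group needs both to create a file.
// Hypothetical helper for illustration only.
func othersCanCreate(dirMode os.FileMode) bool {
	const otherWriteExec = 0o003
	return dirMode.Perm()&otherWriteExec == otherWriteExec
}

func main() {
	// 0755 is what a directory ends up with when created under umask 022.
	fmt.Println(othersCanCreate(0o755)) // false
	// 0777 is the mode proposed in this issue.
	fmt.Println(othersCanCreate(0o777)) // true
}
```

This also shows why mode 666 on the cache file alone does not help: creating the file in the first place depends on the directory's bits, not the file's.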
I think there are 2 ways to solve this:
- In nvcr.go#L128, chmod `currentbundle` to `777`, because the umask is `022` and `MkdirAll()` will create dirs with `755` permission bits.
- Change `CUDA_DEVICE_MEMORY_SHARED_CACHE` in nvcr.go#L192 to a location where non-root users can create files, or even make it configurable.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This issue has not seen any activity since it was marked stale. Closing.