
non-root users have no write permission on /tmp/vgpu/cudevshr.cache

Open rnyrnyrny opened this issue 2 years ago • 1 comments

When a program runs as a non-root user, it fails because that user has no permission to write to /tmp/vgpu/cudevshr.cache in the container.

rnyrnyrny avatar Mar 11 '22 02:03 rnyrnyrny

Since 2bcb2c5, the permissions on /tmp/vgpu/cudevshr.cache have been changed to 666, which solves the problem only when you run the pod as root.

If I run the pod as a non-root user, there are still errors. Here are the steps to reproduce:

  1. Create a pod that runs as a non-root user:
apiVersion: v1
kind: Pod
metadata:
  name: <podName>
  namespace: default
spec:
  containers:
  - command:
    - /bin/bash
    - -c
    - while true; do sleep 10; done
    image: <your-image>
    name: <podName>
    resources:
      limits:
        nvidia.com/gpu: 1
        nvidia.com/gpumem: 2000
      requests:
        nvidia.com/gpu: 1
        nvidia.com/gpumem: 2000
    securityContext:
      runAsUser: 1024
      runAsGroup: 1024
  2. kubectl exec into the container and run nvidia-smi. The command fails because the user can't create files in /tmp/vgpu.
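
To illustrate why step 2 fails, here is a minimal Go sketch of the classic owner/group/other permission check applied to a directory. It assumes /tmp/vgpu is owned by root (uid 0, gid 0) with mode 755, which is my reading of the situation rather than something verified in the code. The function name canCreateIn is hypothetical, and the model deliberately ignores supplementary groups, ACLs, and capabilities.

```go
package main

import "fmt"

// canCreateIn reports whether a process with the given uid/gid may create
// files in a directory with the given permission bits and owner.
// Simplified model (an assumption for illustration): no supplementary
// groups, no ACLs, no capabilities; uid 0 is always allowed.
func canCreateIn(perm, ownerUID, ownerGID, uid, gid uint32) bool {
	if uid == 0 {
		return true
	}
	var bits uint32
	switch {
	case uid == ownerUID:
		bits = perm >> 6 // owner triplet
	case gid == ownerGID:
		bits = perm >> 3 // group triplet
	default:
		bits = perm // other triplet
	}
	// Creating a file needs both write (2) and search (1) on the directory.
	return bits&0o3 == 0o3
}

func main() {
	// Assumed ownership: root:root. The pod above runs as 1024:1024.
	fmt.Println("mode 755, uid 1024:", canCreateIn(0o755, 0, 0, 1024, 1024))
	fmt.Println("mode 777, uid 1024:", canCreateIn(0o777, 0, 0, 1024, 1024))
	fmt.Println("mode 755, uid 0:   ", canCreateIn(0o755, 0, 0, 0, 0))
}
```

Under these assumptions, the 666 mode on cudevshr.cache does not help a non-root user when the file does not exist yet, because creating it requires write permission on the directory itself.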

I think there are two ways to solve this:

  1. In nvcr.go#L128, chmod currentbundle to 777. The umask is 022, so MkdirAll() cannot create directories with more than 755 permission bits; an explicit chmod is needed afterwards.
  2. Change CUDA_DEVICE_MEMORY_SHARED_CACHE in nvcr.go#L192 to a location where non-root users can create files, or even make it configurable.
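
Both options can be sketched in Go roughly as follows. This is not the actual nvcr.go code: ensureOpenDir, resolveCachePath, and demoOpenDir are hypothetical helpers, and the override parameter stands in for whatever configuration mechanism the project might choose. The first helper relies on the fact that os.Chmod, unlike os.MkdirAll, is not filtered through the process umask.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

const (
	defaultCacheDir = "/tmp/vgpu" // location reported in this issue
	cacheFileName   = "cudevshr.cache"
)

// ensureOpenDir creates dir (and missing parents), then sets its mode
// explicitly. MkdirAll alone yields 755 under a umask of 022, even when
// 0777 is requested; Chmod is not subject to the umask.
func ensureOpenDir(dir string) error {
	if err := os.MkdirAll(dir, 0o777); err != nil {
		return err
	}
	return os.Chmod(dir, 0o777)
}

// resolveCachePath returns the cache file path. A non-empty override
// (a hypothetical configuration value) replaces the default directory.
func resolveCachePath(override string) string {
	if override != "" {
		return filepath.Join(override, cacheFileName)
	}
	return filepath.Join(defaultCacheDir, cacheFileName)
}

// demoOpenDir exercises ensureOpenDir in a throwaway temp directory and
// returns the permission bits the new directory ends up with.
func demoOpenDir() (os.FileMode, error) {
	base, err := os.MkdirTemp("", "vgpu-demo-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(base)

	dir := filepath.Join(base, "vgpu")
	if err := ensureOpenDir(dir); err != nil {
		return 0, err
	}
	info, err := os.Stat(dir)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := demoOpenDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("directory mode after chmod: %o\n", mode)
	fmt.Println("default cache path:", resolveCachePath(""))
	fmt.Println("overridden cache path:", resolveCachePath("/var/run/vgpu"))
}
```

A world-writable directory lets any user delete other users' files, so a real fix might prefer 0o777|os.ModeSticky (as /tmp itself uses) or the configurable path from option 2.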

rnyrnyrny avatar Mar 21 '22 14:03 rnyrnyrny

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] avatar Apr 17 '24 20:04 github-actions[bot]

This issue has not seen any activity since it was marked stale. Closing.

github-actions[bot] avatar May 01 '24 20:05 github-actions[bot]