nvidia-docker

Wrong GPU allocation

Open majorinche opened this issue 3 years ago • 1 comments

In the container, NVIDIA_VISIBLE_DEVICES is already set in the environment, for example:

    root@gpu-31807-9f7846d58-lkd6c:/tf# env | grep -i devices -A 3 -B 3
    LANG=C.UTF-8
    TZ=Asia/Shanghai
    HOSTNAME=gpu-31807-9f7846d58-lkd6c
    NVIDIA_VISIBLE_DEVICES=GPU-de090b95-59f5-b4f4-3f42-7702633fadd7
    GPU_31807_SERVICE_PORT_TCP_SSH_31707=22

But the output of nvidia-smi still shows 2 GPUs allocated to the container. Only after I add the following capability does the output show 1 GPU, as expected.

    securityContext:
      capabilities:
        add: ["SYS_RESOURCE"]

So is SYS_RESOURCE somehow related to GPU resources?
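
To check the mismatch described above from inside the container, a minimal sketch could compare the UUIDs in NVIDIA_VISIBLE_DEVICES with the GPUs that `nvidia-smi -L` lists. It assumes the variable holds a comma-separated list of GPU UUIDs, as in the output above (index values, `all`, and `none` are not handled), and the function names are hypothetical.

```python
import os
import re
import shutil
import subprocess

# `nvidia-smi -L` prints one line per GPU, ending in "(UUID: GPU-...)".
UUID_RE = re.compile(r"UUID:\s*(GPU-[0-9a-fA-F-]+)")


def parse_visible_devices(value):
    """Split NVIDIA_VISIBLE_DEVICES into a set of entries (assumed to be UUIDs)."""
    return {item.strip() for item in value.split(",") if item.strip()}


def parse_smi_uuids(listing):
    """Extract the GPU UUIDs from the text printed by `nvidia-smi -L`."""
    return set(UUID_RE.findall(listing))


def unexpected_gpus(env_value, listing):
    """Return UUIDs that nvidia-smi shows but the environment did not request."""
    return sorted(parse_smi_uuids(listing) - parse_visible_devices(env_value))


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; nothing to compare")
    else:
        env_value = os.environ.get("NVIDIA_VISIBLE_DEVICES", "")
        listing = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        ).stdout
        print("requested: ", env_value or "(unset)")
        print("unexpected:", unexpected_gpus(env_value, listing) or "none")
```

Running this once with and once without the SYS_RESOURCE capability would show exactly which extra GPU becomes visible, rather than only the count.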

majorinche • Jan 25 '22 08:01

Is this in Kubernetes (that looks like part of a PodSpec yaml)? If so, this may help answer your question: https://docs.google.com/document/d/1zy0key-EL6JH50MZgwg96RPYxxXXnVUdxLZwGiyqLd8/edit

klueska • Jan 26 '22 07:01