nvidia-docker
GPU allocation is wrong
In the container, we already have an env setting for NVIDIA_VISIBLE_DEVICES, like:
root@gpu-31807-9f7846d58-lkd6c:/tf# env | grep -i devices -A 3 -B 3
LANG=C.UTF-8
TZ=Asia/Shanghai
HOSTNAME=gpu-31807-9f7846d58-lkd6c
NVIDIA_VISIBLE_DEVICES=GPU-de090b95-59f5-b4f4-3f42-7702633fadd7
GPU_31807_SERVICE_PORT_TCP_SSH_31707=22
But in the output of nvidia-smi, we can still see 2 GPUs allocated. Only after I add the following capability does the output show 1 GPU as expected:
securityContext:
  capabilities:
    add: ["SYS_RESOURCE"]
So does SYS_RESOURCE have some relation to the GPU resources?
Is this in Kubernetes? (That looks like part of a PodSpec YAML.) If so, this may help answer your question: https://docs.google.com/document/d/1zy0key-EL6JH50MZgwg96RPYxxXXnVUdxLZwGiyqLd8/edit