Requesting zero GPUs allocates all GPUs

Open dhague opened this issue 6 years ago • 33 comments

The README.md states:

WARNING: if you don't request GPUs when using the device plugin with NVIDIA images all the GPUs on the machine will be exposed inside your container.

I discovered a workaround for this, which is to set the environment variable NVIDIA_VISIBLE_DEVICES to none in the container spec.
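For example, a container spec applying this workaround could look roughly like the following (a sketch; the pod name and image are placeholders):

# Sketch: pod that requests no GPUs and opts out of GPU exposure explicitly.
apiVersion: v1
kind: Pod
metadata:
  name: no-gpu-pod                    # hypothetical name
spec:
  containers:
    - name: app
      image: nvidia/cuda:10.0-base    # illustrative CUDA-based image
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "none"               # workaround: keep the runtime from exposing all GPUs
      # no nvidia.com/gpu resource request, so no GPU should be visible in the container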

With a resource request for nvidia.com/gpu: 0 this environment variable should be set automatically.

dhague avatar Jul 10 '18 11:07 dhague

With a resource request for nvidia.com/gpu: 0 this environment variable should be set automatically.

Currently, the device plugin doesn't have the ability to inject env vars into pods.

However, you can implement this feature with a mutating admission webhook. Just write a small web server that mutates the env var to the value you want. I think it's not so difficult. (Actually, I did the same thing in our cluster.)
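As a rough sketch (assuming the target container already defines an env array; otherwise the webhook has to create it), the JSONPatch such a webhook returns, base64-encoded in its AdmissionReview response with patchType: JSONPatch, could look like this in YAML form:

# Hypothetical patch produced by the mutating webhook.
- op: add
  path: /spec/containers/0/env/-      # append to the first container's env list
  value:
    name: NVIDIA_VISIBLE_DEVICES
    value: none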

everpeace avatar Aug 17 '18 05:08 everpeace

Does it mean that if I have two containers, both requesting nvidia.com/gpu: 0, then they could share the GPU?

yukuo78 avatar Oct 13 '18 05:10 yukuo78

@yukuo78 basically yes, this is equivalent to the node-selector trick to share GPUs, as described in: https://github.com/kubernetes/kubernetes/issues/52757#issuecomment-410419952

Check the follow-ups on that thread for more information.

thomasjungblut avatar Oct 13 '18 13:10 thomasjungblut

@dhague it is a prerequisite that both nvidia.com/gpu: 0 and the NVIDIA_VISIBLE_DEVICES env var are set together, isn't it? Recently, when only nvidia.com/gpu: 0 is set, the related pod scheduled on a GPU node will likely fail with "OutOfnvidia.com/gpu"; its status looks like:

status:
  message: 'Pod Node didn''t have enough resource: nvidia.com/gpu, requested: 0, used:
    1, capacity: 0'
  phase: Failed
  reason: OutOfnvidia.com/gpu
  startTime: "2019-05-09T03:05:49Z"

Davidrjx avatar May 17 '19 03:05 Davidrjx

@everpeace could you share your custom Admission Webhook code, especially the part that mutates NVIDIA_VISIBLE_DEVICES?

aebischers avatar Aug 14 '19 14:08 aebischers

I've tested this. If we add NVIDIA_VISIBLE_DEVICES=none to pod.spec.containers[*].env for a pod that requests 1 GPU via the Kubernetes container resource requests, the environment list seen when nvidia-container-runtime is executed will be (the order is important):

NVIDIA_VISIBLE_DEVICES=GPU-xxx-xxx-xxx-xxx-xxx
NVIDIA_VISIBLE_DEVICES=none

And nvidia-container-runtime may take the last one to decide which devices to mount, which results in no devices being available in the container, which is not what was expected. This behavior depends on the version of nvidia-container-runtime-hook (recently renamed to nvidia-container-toolkit) you use; please refer to this.
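For illustration, the conflicting setup described above looks roughly like this (a sketch; the name and image tag are placeholders):

# Sketch: the spec-level env and the env injected by the device plugin both set
# NVIDIA_VISIBLE_DEVICES, and the runtime hook sees the two values shown above.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-conflict-demo             # hypothetical name
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:10.0-base    # illustrative CUDA image
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "none"               # spec-level override
      resources:
        limits:
          nvidia.com/gpu: 1           # the device plugin injects NVIDIA_VISIBLE_DEVICES=GPU-<uuid>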

Cas-pian avatar Nov 02 '19 03:11 Cas-pian

@Cas-pian did you mean two pods set up respectively with NVIDIA_VISIBLE_DEVICES=none and nvidia.com/gpu: 1 on the same node?

Davidrjx avatar Nov 05 '19 13:11 Davidrjx

@Davidrjx no, I just found a bug in nvidia-container-runtime-hook (nvidia-container-toolkit): multiple NVIDIA_VISIBLE_DEVICES envs are not handled, which results in GPUs not being mounted as expected.

Step 1: I use a CUDA image that has the env NVIDIA_VISIBLE_DEVICES=all to start a pod (without setting resources.requests for a GPU); all GPUs then get mounted into the container. This makes k8s-device-plugin useless and breaks the environment of pods that do use resources.requests for GPUs.

Step 2: To fix the problem in step 1, I add NVIDIA_VISIBLE_DEVICES=none to pod.spec.containers[*].env to disable the default value of NVIDIA_VISIBLE_DEVICES in the image, but then no GPU is mounted into the pod at all, even when I request a GPU via resources.requests!

And finally, I found it's not a good design to use the same mechanism (the env NVIDIA_VISIBLE_DEVICES) for both single-node GPU allocation and cluster GPU allocation, because CUDA images are made for single-node usage; it would be better to use different mechanisms (e.g. different envs). @flx42

Cas-pian avatar Nov 06 '19 09:11 Cas-pian

@Cas-pian oh, now I understand what you mean.

Davidrjx avatar Nov 06 '19 10:11 Davidrjx

I wrote a Kubernetes Mutating Admission Webhook called gpu-admission-webhook to handle this case. It sets NVIDIA_VISIBLE_DEVICES to "none" if you do not request a GPU. It also deletes environment variables that would cause issues or bypass this constraint.

ktarplee avatar May 12 '20 16:05 ktarplee

After reading the documentation about NVIDIA_VISIBLE_DEVICES, I advise you to set void instead of none.

From the doc:

nvidia-container-runtime will have the same behavior as runc (i.e. neither GPUs nor capabilities are exposed)
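Applied to a container spec, that would mean something like (a sketch, same shape as the earlier workaround):

env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: "void"                     # behaves like plain runc: neither GPUs nor capabilities exposed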

XciD avatar Mar 25 '21 13:03 XciD

I've tried to set:

        resources:
          limits:
            nvidia.com/gpu: 0

My idea is to have multiple pods on the same node sharing a single GPU. But it looks like in that case, the app in the container does not utilise the GPU at all. What am I missing?

orkenstein avatar Jun 28 '21 11:06 orkenstein

This is no longer an issue if you have the following lines in your /etc/nvidia-container-runtime/config.toml

accept-nvidia-visible-devices-envvar-when-unprivileged = false
accept-nvidia-visible-devices-as-volume-mounts = true 

And you deploy the nvidia-device-plugin with the values

compatWithCPUManager: true
deviceListStrategy: volume-mounts

ktarplee avatar Jun 28 '21 11:06 ktarplee

@ktarplee thanks for the clue! Regarding /etc/nvidia-container-runtime/config.toml: I have a container built on top of tensorflow/tensorflow:1.14.0-gpu-py3 but see no config.toml. Where should it be edited?

orkenstein avatar Jun 28 '21 11:06 orkenstein

Needs to be set on the host, not inside a container.

Here’s a link to the details: https://docs.google.com/document/d/1zy0key-EL6JH50MZgwg96RPYxxXXnVUdxLZwGiyqLd8/edit

klueska avatar Jun 28 '21 11:06 klueska

@orkenstein the config file mentioned is installed on every host along with the NVIDIA Container Toolkit / NVIDIA Docker.

elezar avatar Jun 28 '21 11:06 elezar

Thanks @klueska @elezar. I'm not sure how to do that on GCloud. Should I tweak nvidia-installer somehow?

orkenstein avatar Jun 29 '21 11:06 orkenstein

@orkenstein does that mean that you're not using the NVIDIA Device Plugin to allow GPU usage on GCloud, but something else instead?

(Could you provide a link to the nvidia-installer you mention?)

elezar avatar Jun 29 '21 12:06 elezar

@elezar drivers gets installed like this: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#installing_drivers

orkenstein avatar Jun 29 '21 13:06 orkenstein

@orkenstein GKE does not (currently) use the NVIDIA device plugin nor the NVIDIA container toolkit. Which means that the suggestion by @ktarplee is not applicable to you.

elezar avatar Jun 29 '21 14:06 elezar

@orkenstein GKE does not (currently) use the NVIDIA device plugin nor the NVIDIA container toolkit. Which means that the suggestion by @ktarplee is not applicable to you.

Ah, okay. What should I do then?

orkenstein avatar Jun 29 '21 18:06 orkenstein

This is unfortunately not something that I can help with. You could try posting your request at https://github.com/GoogleCloudPlatform/container-engine-accelerators/issues (that repository contains the device plugin used on GKE systems).

elezar avatar Jun 30 '21 08:06 elezar

This is no longer an issue if you have the following lines in your /etc/nvidia-container-runtime/config.toml

accept-nvidia-visible-devices-envvar-when-unprivileged = false
accept-nvidia-visible-devices-as-volume-mounts = true 

And you deploy the nvidia-device-plugin with the values

compatWithCPUManager: true
deviceListStrategy: volume-mounts

Thanks for this solution. However, I'm deploying https://github.com/NVIDIA/gpu-operator to my k3s cluster with a docker backend, using gpu-operator to install the container runtime. Is it possible to inject this configuration into the helm deployment?

sjdrc avatar Nov 23 '21 23:11 sjdrc

@sjdrc Currently it's not possible to set these parameters through the gpu-operator Helm deployment, as the toolkit container doesn't support configuring them yet. We will look into adding this support. Meanwhile, they need to be added manually to the /usr/local/nvidia/toolkit/.config/config.toml file. The device-plugin settings, however, can be configured through the --set devicePlugin.env[0].name=DEVICE_LIST_STRATEGY --set devicePlugin.env[0].value="volume-mounts" parameters during operator install. The compatWithCPUManager setting is already the default in the gpu-operator deployment.
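In values-file form, that device-plugin setting is roughly (a sketch equivalent to the --set flags above):

# values.yaml sketch for the gpu-operator chart, mirroring the --set flags above
devicePlugin:
  env:
    - name: DEVICE_LIST_STRATEGY
      value: "volume-mounts"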

shivamerla avatar Nov 24 '21 00:11 shivamerla

Thanks for your prompt reply.

So just to clarify, I should configure /usr/local/nvidia/toolkit/.config/config.toml on the host, and by setting volume-mounts, the device plugin will use the host configuration?

I do not have that file, but I do have /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml

sjdrc avatar Nov 24 '21 01:11 sjdrc

Hey, I'm still having issues getting this working.

  1. Should the config changes go into /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml? This file is present on my host, but /usr/local/nvidia/toolkit/.config/config.toml is not.
  2. Which section of the config file do these changes go in? I have a top-level section, [nvidia-container-cli], and [nvidia-container-runtime].
  3. How can I make these changes persist? Every time I restart k3s the file content gets reverted.

sjdrc avatar Dec 03 '21 01:12 sjdrc

Adding a bit more information about my setup process (from clean)

sjdrc avatar Dec 03 '21 05:12 sjdrc

Hey, I'm still having issues getting this working.

  1. Should the config changes go into /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml? This file is present on my host, but /usr/local/nvidia/toolkit/.config/config.toml is not.

Sorry, /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml is the right location of this file.

  2. Which section of the config file do these changes go in? I have a top-level section, [nvidia-container-cli], and [nvidia-container-runtime].

You need to add those lines as global params.

disable-require = false
accept-nvidia-visible-devices-envvar-when-unprivileged = false
accept-nvidia-visible-devices-as-volume-mounts = true 

[nvidia-container-cli]
  environment = []
  ldconfig = "@/run/nvidia/driver/sbin/ldconfig.real"
  load-kmods = true
  path = "/usr/local/nvidia/toolkit/nvidia-container-cli"
  root = "/run/nvidia/driver"

[nvidia-container-runtime]

  3. How can I make these changes persist? Every time I restart k3s the file content gets reverted.

I think this was because they were not added as global params.

shivamerla avatar Dec 03 '21 18:12 shivamerla

I'm still running into issues.

Steps to reproduce

  1. Install Ubuntu Server 20.04
  2. Install Docker
curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker
  3. Blacklist nouveau
cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nvidia-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
  4. Disable AppArmor: sudo apt remove --assume-yes --purge apparmor
  5. Install k3s with the --docker flag
  6. helm install --version 1.9.0 --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --set devicePlugin.env[0].name=DEVICE_LIST_STRATEGY --set devicePlugin.env[0].value="volume-mounts"
  7. Add globally to /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml
accept-nvidia-visible-devices-envvar-when-unprivileged = false
accept-nvidia-visible-devices-as-volume-mounts = true
  8. Reboot

Result

nvidia-device-plugin-validator is giving an error and refusing to start:

Error: failed to start container "plugin-validation": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: device error: /var/run/nvidia-container-devices: unknown device: unknown

sjdrc avatar Dec 06 '21 22:12 sjdrc