isaac_ros_visual_slam

Docker Error - Unknown or Invalid Runtime Name: Nvidia

Open YuminosukeSato opened this issue 2 years ago • 17 comments

I am encountering a runtime error with Docker when trying to use the Nvidia runtime. This issue arises despite having a successful output with an initial Docker command and making subsequent edits to the Docker configuration.

Steps to Reproduce

  1. Run the following Docker command, which executes successfully:

    sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
    
  2. Edit /etc/docker/daemon.json as follows:

    {
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        },
        "default-runtime": "nvidia"
    }
    
  3. After making these changes, attempt to execute a script with the command:

    scripts/run_dev.sh ~/workspaces/isaac_ros-dev/
    

    This results in the following error:

    docker: Error response from daemon: unknown or invalid runtime name: nvidia.
    

Expected Behavior

The Docker container should recognize the Nvidia runtime without errors, especially since the initial command runs without issues.

Actual Behavior

The system throws an error stating "unknown or invalid runtime name: nvidia" when trying to run a script that utilizes Docker with the Nvidia runtime.

Environment

  • Docker version: 24.0.7, build afdd53b
  • Operating System: Ubuntu 22.04

Attempts to Resolve

  • Verified that the initial Docker command runs successfully.
  • Checked the syntax and paths in the daemon.json file.
  • Searched for similar issues in forums and GitHub Issues.

Request for Help

Could anyone provide insights or suggest potential solutions to resolve this runtime error? Any advice or guidance would be greatly appreciated.

YuminosukeSato avatar Nov 15 '23 05:11 YuminosukeSato

It looks like you may not have nvidia-container-toolkit installed. See here for instructions on how to install on your x86_64 system running Jammy.
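
For reference, a typical apt-based install on Jammy looks roughly like the sketch below; these commands follow NVIDIA's public Container Toolkit install guide at the time of writing, so treat the repository URLs and keyring path as subject to change:

# Add NVIDIA's apt repository and signing key (keyring path per the current install guide)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the nvidia runtime in /etc/docker/daemon.json, and restart Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker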

hemalshahNV avatar Nov 17 '23 01:11 hemalshahNV

We installed nvidia-container-toolkit and then started Docker, but we still get this error.

YuminosukeSato avatar Nov 17 '23 05:11 YuminosukeSato

I am experiencing the same issue; nvidia-container-toolkit is also installed.

solix avatar Nov 17 '23 08:11 solix

I am experiencing the same issue; nvidia-container-toolkit is also installed.

weirdsim14 avatar Nov 22 '23 03:11 weirdsim14

We're looking into this but haven't been able to reproduce this yet with the same OS and Docker version. We're still running a few more experiments on freshly provisioned machines to see if we can narrow it down.

Our theory is that the setup instructions for nvidia-container-toolkit differ from what our machine provisioning scripts do (listed below):

# Install Nvidia Docker runtime
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -y nvidia-container-runtime
sudo systemctl restart docker

sudo gpasswd -a $USER docker
sudo usermod -a -G docker $(whoami)
newgrp docker
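
After the restart, one generic way to confirm that the daemon actually registered the nvidia runtime (not part of the provisioning script above) is to inspect docker info:

# Lists the runtimes the daemon currently knows about; "nvidia" should appear here
sudo docker info --format '{{json .Runtimes}}'
# Or, without a format template:
sudo docker info | grep -i runtimes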

hemalshahNV avatar Nov 30 '23 04:11 hemalshahNV

Hi,

Is there any update regarding this issue? I'm experiencing the same on Ubuntu 22.04, Docker v4.30.0.

mrlreable avatar May 24 '24 14:05 mrlreable

I was facing the same issue... SOLVED by following the steps below.

Editing the file /etc/docker/daemon.json to include:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

and then running:

sudo systemctl daemon-reload
sudo systemctl restart docker

The error stops showing and we are able to see the GPUs inside the containers when we run:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Prior to all this, we followed this tutorial (the NVIDIA Container Toolkit instructions). However, it did not require editing the file as described above.

sid-isq avatar May 28 '24 10:05 sid-isq

The previous solution did not solve my problem. My original daemon.json was:

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

I changed it to the one above and it did not solve the problem. I have already installed nvidia-container-toolkit. I am using Ubuntu 22.04.3 LTS.

EmanuelCastanho avatar May 28 '24 11:05 EmanuelCastanho

Happened to run across this thread, so will give my experience:

I had the same problem a couple of weeks ago, also with Ubuntu 22.04. I had Docker installed via snap, and that caused some of the paths to be different from what the NVIDIA tools expect. I'm sure it should be fixable for the snap installation as well, but for me the easiest solution was to remove Docker entirely and reinstall it via apt-get as instructed in the Docker guides. I tried to make it work with the snap version but quickly ran out of patience and decided to just reinstall Docker entirely.

So if you haven't already, you might want to check how your docker is installed.
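
If it helps, a quick way to tell how Docker was installed (assuming snapd and dpkg are available on the system):

snap list docker 2>/dev/null    # prints an entry if Docker came from snap
dpkg -l docker-ce 2>/dev/null   # prints an entry if Docker came from Docker's apt repository (Ubuntu's docker.io package would show under dpkg -l docker.io instead)
command -v docker               # /snap/bin/docker indicates the snap package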

tanelikor avatar May 28 '24 12:05 tanelikor

(Quoting tanelikor's comment above about reinstalling Docker via apt instead of snap.)

This solution worked for me. Thank you very much. I had Docker installed on the host machine (Windows Docker Desktop). I did not need to uninstall it; instead I installed Docker in WSL Ubuntu and kept Docker Desktop closed so WSL wouldn't use it, which solved the nvidia runtime problem. I don't know yet whether having both Docker installations will have other side effects, but the problem would also be solved by having Docker installed in WSL only.
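
For anyone in a similar WSL setup, the docker CLI can show which engine it is actually talking to. docker context ls and docker context show are standard commands; "desktop-linux" is the context name Docker Desktop typically registers:

docker context ls      # the active context (marked with *) shows which engine the CLI uses
docker context show    # prints just the active context name, e.g. "default" or "desktop-linux"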

caio-swdev avatar Jul 17 '24 14:07 caio-swdev

I faced the same problem with the NVIDIA Container Toolkit installed. What helped me was reconfiguring the runtime with sudo nvidia-ctk runtime configure --runtime=docker and then restarting the daemon with sudo systemctl restart docker.

It worked for me. Manual editing of /etc/docker/daemon.json and then restarting the daemon didn't, for some reason.
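
For reference, the full sequence from the comment above; nvidia-ctk runtime configure writes the nvidia runtime entry into /etc/docker/daemon.json itself, which avoids hand-editing mistakes:

# Let the toolkit write the runtime entry into /etc/docker/daemon.json, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker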

evstigneevnm avatar Jul 26 '24 10:07 evstigneevnm

I encountered the same problem and successfully solved it. My environment is "Win10 + WSL + Docker Desktop". The daemon.json file exists not only at /etc/docker/daemon.json but also in the Windows host directory: C:\Users\XXX\.docker\daemon.json.

So, edit the file C:\Users\XXX\.docker\daemon.json to include:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

and then run in WSL:

sudo systemctl daemon-reload
sudo systemctl restart docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

I hope this can help you.

hipop avatar Aug 08 '24 07:08 hipop

I ran into this corner case several times:

env:

  • Ubuntu 24.04
  • Docker (CE): Docker version 27.0.2, build 912c1dd
  • CUDA: 12.x
  • Toolkit: 1_1.13.5-1
  • Installation: install-guide

Bad case

If one enables a non-root user to use Docker like this:

sudo groupadd docker
sudo gpasswd -a $USER docker

and manually configures /etc/docker/daemon.json like this:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
  1. After sudo systemctl restart docker, one may:
  • Get the correct output with sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi;
  • Get the docker: Error response from daemon: unknown or invalid runtime name: nvidia. error message with docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi (non-root user).
  2. If one follows the Rootless mode section in the install-guide:
  • If one used sudo groupadd docker & sudo gpasswd -a $USER docker to set up rootless mode:
    • Then the first step will fail (there should be no $HOME/.config/docker/daemon.json or $HOME/.config/docker).
    • One should set up $HOME/.config/docker/daemon.json manually (copying /etc/docker/daemon.json might work);
    • After this, one may get the following error message when executing sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place:
No help topic for 'config'

How to fix it

  1. Check your rootless configuration file. The manual uses $HOME/.config/docker/daemon.json, but you may find it somewhere else, e.g. $HOME/.docker/daemon.json. In my environment it is the latter.
  2. Restart the rootless dockerd: systemctl --user restart docker
  3. If sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place fails, just add no-cgroups = true in the [nvidia-container-cli] section of /etc/nvidia-container-runtime/config.toml:
[nvidia-container-cli]
# some .....
no-cgroups = true

⚠️ Failing to set up Step 3 leads to a legacy OCI runtime error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown.
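
Assuming Step 3 is in place, a rough rootless verification looks like this (note: no sudo, since the rootless daemon runs as your user; commands as used elsewhere in this thread):

systemctl --user restart docker
docker info --format '{{json .Runtimes}}'                       # "nvidia" should be listed
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi   # should print the GPU table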

Hints for potential reasons

  1. sudo docker and docker use different Docker Root Dirs and configurations (see the sketch below);
  2. For rootless users, the configuration file path may differ from the one in the install-guide.
  3. If nvidia-ctk misbehaves, users should adjust /etc/nvidia-container-runtime/config.toml manually.
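
A quick way to see hint 1 in action is to compare what the two daemons report; DockerRootDir is a standard docker info field, and the example paths are typical defaults, not guaranteed:

sudo docker info --format '{{.DockerRootDir}}'   # rootful daemon, e.g. /var/lib/docker
docker info --format '{{.DockerRootDir}}'        # rootless daemon, e.g. ~/.local/share/docker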

yfyang86 avatar Aug 12 '24 14:08 yfyang86