isaac_ros_visual_slam
Docker Error - Unknown or Invalid Runtime Name: Nvidia
I am encountering a runtime error with Docker when trying to use the Nvidia runtime. This issue arises despite having a successful output with an initial Docker command and making subsequent edits to the Docker configuration.
Steps to Reproduce
- Run the following Docker command, which executes successfully:
  sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
- Edit /etc/docker/daemon.json as follows:
  {
      "runtimes": {
          "nvidia": {
              "path": "nvidia-container-runtime",
              "runtimeArgs": []
          }
      },
      "default-runtime": "nvidia"
  }
- After making these changes, attempt to execute a script with the command:
  scripts/run_dev.sh ~/workspaces/isaac_ros-dev/
  This results in the following error:
  docker: Error response from daemon: unknown or invalid runtime name: nvidia.
Expected Behavior
The Docker container should recognize the Nvidia runtime without errors, especially since the initial command runs without issues.
Actual Behavior
The system throws an error stating "unknown or invalid runtime name: nvidia" when trying to run a script that utilizes Docker with the Nvidia runtime.
Environment
- Docker version: 24.0.7, build afdd53b
- Operating System: Ubuntu 22.04
Attempts to Resolve
- Verified that the initial Docker command runs successfully.
- Checked the syntax and paths in the daemon.json file (a quick check is sketched below).
- Searched for similar issues in forums and GitHub Issues.
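A quick way to sanity-check the edited daemon.json and to see which runtimes the running daemon actually registered (assuming python3 is available and Docker Engine is managed by systemd):
# Validate the JSON syntax of the edited file
python3 -m json.tool /etc/docker/daemon.json
# List the runtimes the daemon has registered (should include "nvidia")
docker info --format '{{json .Runtimes}}'
Remember to restart the daemon (sudo systemctl restart docker) after any edit so the change is picked up.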
Request for Help
Could anyone provide insights or suggest potential solutions to resolve this runtime error? Any advice or guidance would be greatly appreciated.
It looks like you may not have nvidia-container-toolkit installed. See here for instructions on how to install on your x86_64 system running Jammy.
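For reference, the apt-based install in the linked toolkit instructions looks roughly like the following at the time of writing (the repository URLs and keyring path below are taken from those docs from memory; follow the linked page for the authoritative, current steps):
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Registers the nvidia runtime in /etc/docker/daemon.json
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker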
We have installed nvidia-container-toolkit and then started docker, but we get this error.
I am experiencing the same issue; nvidia-container-toolkit is also installed.
We're looking into this but haven't been able to reproduce this yet with the same OS and Docker version. We're still running a few more experiments on freshly provisioned machines to see if we can narrow it down.
Our theory is that the setup instructions for nvidia-container-toolkit differ from what our machine provisioning scripts do (listed below; a quick verification step is sketched after the script):
# Install Nvidia Docker runtime
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -y nvidia-container-runtime
sudo systemctl restart docker
sudo gpasswd -a $USER docker
sudo usermod -a -G docker $(whoami)
newgrp docker
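After the restart, a quick way to confirm whether the nvidia runtime actually got registered with the daemon (not part of the provisioning script, just a check):
# The output should list an entry named "nvidia"
docker info | grep -i runtimes
# The binary that the runtime entry points to should resolve on PATH
which nvidia-container-runtime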
Hi,
is there any update regarding this issue? I'm experiencing the same issue on Ubuntu 22.04 with Docker v4.30.0.
I was facing the same issue... SOLVED by following the steps below.
Editing the file /etc/docker/daemon.json to include:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
and then running:
sudo systemctl daemon-reload
sudo systemctl restart docker
The error stops showing and we are able to see the GPUs inside the containers when we run:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Prior to all this, we followed this tutorial (the NVIDIA container toolkit instructions). However, that tutorial does not tell you to edit the file as described above.
The previous solution did not solve my problem.
My original daemon.json was:
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
I changed it to the one above, but that did not solve the problem. I have already installed nvidia-container-toolkit. I am using Ubuntu 22.04.3 LTS.
Happened to run across this thread, so will give my experience:
I had the same problem a couple weeks ago, also with Ubuntu 22.04. I had docker installed via snap, and that caused some of the paths to be different than what the Nvidia tools expect. I'm sure it should be fixable for the snap installation as well, but for me the easiest solution was to remove docker entirely and re-install it via apt-get as instructed here in docker guides. I tried to make it work with the snap version but quickly ran out of patience and decided to just reinstall docker entirely.
So if you haven't already, you might want to check how your docker is installed.
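If it helps, a quick way to check whether Docker came from snap or from apt on Ubuntu (plain shell commands, nothing project-specific):
# A docker entry here means the snap package is installed
snap list docker
# A docker-ce or docker.io entry here means the apt packages are installed
dpkg -l | grep -E 'docker-ce|docker\.io'
# snap installs resolve under /snap/bin, apt installs under /usr/bin
which docker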
This solution worked for me, thank you very much. I had Docker installed on the host machine (Windows Docker Desktop). I did not need to uninstall it; instead I installed Docker in WSL Ubuntu and kept Docker Desktop closed so WSL wouldn't use it, which solved the nvidia runtime problem. I don't know yet whether having both Docker installations has other side effects, but the problem is solved by using only the Docker installed in WSL.
I faced the same problem even with the NVIDIA Container Toolkit installed. What helped me was reconfiguring the runtime:
sudo nvidia-ctk runtime configure --runtime=docker
and then restart the daemon:
sudo systemctl restart docker
It worked for me. Manual editing of /etc/docker/daemon.json and then restarting the daemon didn't, for some reason.
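If it helps anyone compare, you can inspect what nvidia-ctk actually wrote (assuming the default config location):
# The runtime entry added by nvidia-ctk, for comparison with a manual edit
sudo cat /etc/docker/daemon.json
# Toolkit CLI version, useful when reporting issues
nvidia-ctk --version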
I encountered the same problem and successfully solved it.
My environment is "Win10 + wsl + Docker Desktop".
The daemon.json file exists not only at /etc/docker/daemon.json but also in the Windows host directory: C:\Users\XXX\.docker\daemon.json.
So, edit the file C:\Users\XXX\.docker\daemon.json to include:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
and then run the following in WSL:
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
I hope this can help you.
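One extra check that may help in a WSL + Docker Desktop setup is confirming which daemon the docker CLI inside WSL is actually talking to (docker context ls and docker info are standard Docker CLI commands; the exact context names depend on your installation):
# With Docker Desktop's WSL integration enabled, the active context usually
# points at the Desktop engine rather than a dockerd inside the distribution
docker context ls
docker info --format '{{.OperatingSystem}}'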
I kept running into the following corner case:
env:
- Ubuntu 24.04
- Docker (CE): 27.0.2, build 912c1dd
- CUDA: 12.x
- Toolkit: 1_1.13.5-1
- Installation: install-guide
Bad case
If one enables a non-root user to use Docker like this:
sudo groupadd docker
sudo gpasswd -a $USER docker
and manually configures /etc/docker/daemon.json like this:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
- After sudo systemctl restart docker, one may:
  - Get the right feedback with sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi;
  - Get the docker: Error response from daemon: unknown or invalid runtime name: nvidia. error message with docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi (non-root user).
- If one follows the Rootless mode section in the install-guide:
  - If one used sudo groupadd docker & sudo gpasswd -a $USER docker to set up the rootless mode, then the first step will fail (there should be no $HOME/.config/docker/daemon.json or $HOME/.config/docker).
  - One should set up $HOME/.config/docker/daemon.json manually (well, copying /etc/docker/daemon.json might work);
  - After this, one may get the following error message when executing sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place:
    No help topic for 'config'
How to fix it
- Check your rootless configuration file. The manual uses $HOME/.config/docker/daemon.json, but you may find it in another place, such as $HOME/.docker/daemon.json. In my environment it is the latter.
- Restart the rootless dockerd: systemctl --user restart docker
- If sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place fails, just add no-cgroups = true in the [nvidia-container-cli] section of /etc/nvidia-container-runtime/config.toml:
[nvidia-container-cli]
# some .....
no-cgroups = true
⚠️ Failing to set up Step 3 leads to a legacy OCI runtime error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown.
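To confirm the setting actually landed in the file (a simple check, assuming the default config path):
# Should print a no-cgroups = true line inside the [nvidia-container-cli] section
grep -n 'no-cgroups' /etc/nvidia-container-runtime/config.toml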
Hints for potential reasons
- sudo docker vs docker use a different Docker Root Dir and configuration (a quick comparison is sketched below);
- For rootless users, the configuration file path may be different from the install-guide;
- nvidia-ctk misbehaves; users should adjust /etc/nvidia-container-runtime/config.toml manually.
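To see the first hint concretely, compare what the root daemon and the daemon behind the non-root CLI report (standard docker CLI commands; the outputs differ if a rootless daemon or a different context is in use):
# Data root used by each daemon
sudo docker info --format '{{.DockerRootDir}}'
docker info --format '{{.DockerRootDir}}'
# Runtimes registered by each daemon
sudo docker info --format '{{json .Runtimes}}'
docker info --format '{{json .Runtimes}}'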