[Bug] - NVIDIA UnRaid - Unable to recognize GPU
**Existing Resources**
- [x] Please search the existing issues for related problems
- [x] Consult the product documentation: Docs
- [x] Consult the FAQ: FAQ
- [x] Consult the Troubleshooting Guide: Guide
- [x] Reviewed existing training videos: Youtube
**Describe the bug**
In the UnRaid installation wizard the GPU is not recognized.
I've updated the installation template, including:
- in Extra Parameters: `--runtime=nvidia`
- as variable: `NVIDIA_VISIBLE_DEVICES=all`
- as variable: `NVIDIA_DRIVER_CAPABILITIES=all`
and `nvidia-smi` works:
```
# docker exec kasm nvidia-smi
Wed Mar 20 11:59:36 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.07              Driver Version: 550.40.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro P400                    Off |   00000000:AF:00.0 Off |                  N/A |
| 56%   54C    P0             N/A /  N/A  |       0MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```
but `drm_info` doesn't include `/dev/dri/card1`, my GPU device:
```
# docker exec kasm ./gpuinfo.sh
{"/dev/dri/card0":"MGA G200 SE"}
```
I've tried to force this card during the installation process (with a hardcoded mod of this script that outputs `{"/dev/dri/card1":"NVIDIA P400"}` and `{"/dev/dri/card1":"Quadro P400"}`), but after the installation was done I was unable to start any workspace. I get this error:

```
error gathering device information while adding custom device "/dev/dri/renderD129": no such file or directory
```
Full log:
```
Error during Create request for Server(a89aa3ec-ede1-4152-8a43-1dc99cb1950b) : (Exception creating Kasm: Traceback (most recent call last):
  File "docker/api/client.py", line 268, in _raise_for_status
  File "requests/models.py", line 1021, in raise_for_status
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.44/containers/f683e85b8fb6c257831f3a664eac0adc36d1ccfcd8f63075d69f732c88c9765f/start

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "__init__.py", line 573, in post
  File "provision.py", line 1871, in provision
  File "provision.py", line 1863, in provision
  File "docker/models/containers.py", line 818, in run
  File "docker/models/containers.py", line 404, in start
  File "docker/utils/decorators.py", line 19, in wrapped
  File "docker/api/container.py", line 1111, in start
  File "docker/api/client.py", line 270, in _raise_for_status
  File "docker/errors.py", line 31, in create_api_error_from_http_exception
docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.44/containers/f683e85b8fb6c257831f3a664eac0adc36d1ccfcd8f63075d69f732c88c9765f/start: Internal Server Error ("error gathering device information while adding custom device "/dev/dri/renderD129": no such file or directory")
)
```
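For context, DRM render-node minors start at 128, so `/dev/dri/card1` is conventionally paired with `/dev/dri/renderD129`. The pairing is not guaranteed, though: drivers without render support (likely the onboard MGA G200 here) may expose no render node at all, which would leave the NVIDIA card as `card1` but `renderD128`. A small Python sketch of the conventional numbering (the helper is hypothetical, stdlib only):

```python
# Conventional DRM numbering: primary nodes /dev/dri/card<N> count from 0,
# render nodes /dev/dri/renderD<M> count from minor 128. The
# card<N> <-> renderD<128+N> pairing only holds when every card's driver
# exposes a render node, so verify against /dev/dri on the real host.

def conventional_render_node(card_index: int) -> str:
    """Render node conventionally paired with /dev/dri/card<card_index>."""
    return f"/dev/dri/renderD{128 + card_index}"

print(conventional_render_node(1))  # /dev/dri/renderD129, the node in the error
```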
The device is not present in the kasm_agent container:
```
# docker exec kasm_agent ls /dev/dri/card1
ls: cannot access '/dev/dri/card1': No such file or directory
# docker exec kasm_agent ls /dev/dri/renderD129
ls: cannot access '/dev/dri/renderD129': No such file or directory
```
But I can find it in /proc:
```
# docker exec kasm_agent cat /proc/driver/nvidia/gpus/0000\:af\:00.0/information
Model:           Quadro P400
IRQ:             304
GPU UUID:        GPU-226266ed-48f0-0e03-4d64-780bc2e08ccb
Video BIOS:      86.07.8f.00.02
Bus Type:        PCIe
DMA Size:        47 bits
DMA Mask:        0x7fffffffffff
Bus Location:    0000:af:00.0
Device Minor:    0
GPU Excluded:    No
```
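As an aside, that `information` file is plain `Key: Value` text, so it is easy to script checks against it. A minimal Python sketch (sample data hardcoded, so it assumes no NVIDIA hardware; `parse_nvidia_info` is a hypothetical helper):

```python
# Parse the "Key: Value" text found in
# /proc/driver/nvidia/gpus/<bus-id>/information (sample hardcoded here).
sample = """\
Model:           Quadro P400
IRQ:             304
Bus Location:    0000:af:00.0
Device Minor:    0
"""

def parse_nvidia_info(text: str) -> dict:
    """Split each line on the first ':' and strip whitespace."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value:
            info[key.strip()] = value.strip()
    return info

info = parse_nvidia_info(sample)
print(info["Model"])         # Quadro P400
print(info["Bus Location"])  # 0000:af:00.0
```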
**To Reproduce**
Steps to reproduce the behavior:
- Add the Kasm app from the UnRaid installation
- Open the GPU select
- No NVIDIA card is listed
**Expected behavior**
Be able to use the NVIDIA card on Kasm/UnRaid.
**Workspaces Version**
1.15

**Workspaces Installation Method**
UnRaid
**Workspace Server Information** (output of the following commands):
```
uname -a
Linux fe5d658a8112 6.1.74-Unraid #1 SMP PREEMPT_DYNAMIC Fri Feb 2 11:06:32 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
```
```
cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```
```
sudo docker info
Client: Docker Engine - Community
 Version:    25.0.4
 Context:    default
 Debug Mode: false
 Plugins:
  compose: Docker Compose (Docker Inc.)
    Version:  v2.5.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose

Server:
 Containers: 9
  Running: 8
  Paused: 0
  Stopped: 1
 Images: 9
 Server Version: 25.0.4
 Storage Driver: fuse-overlayfs
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.74-Unraid
 Operating System: Ubuntu 22.04.2 LTS (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 88
 Total Memory: 251.5GiB
 Name: fe5d658a8112
 ID: d62537f3-97b0-482e-a489-4e00a573cd4c
 Docker Root Dir: /opt/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
```
```
sudo docker ps | grep kasm
4feeb4b6b2cf   kasmweb/nginx:1.25.3       "/docker-entrypoint.…"   15 hours ago   Up 14 hours                               80/tcp, 0.0.0.0:6333->6333/tcp   kasm_proxy
261d67c5ccc3   kasmweb/agent:1.15.0       "/bin/sh -c '/usr/bi…"   15 hours ago   Up 14 hours (healthy)                     4444/tcp                         kasm_agent
ad3e62cd7871   kasmweb/share:1.15.0       "/bin/sh -c '/usr/bi…"   15 hours ago   Up 14 hours (healthy)                     8182/tcp                         kasm_share
b1f718129357   kasmweb/kasm-guac:1.15.0   "/dockerentrypoint.sh"   15 hours ago   Up 16 seconds (health: starting)                                           kasm_guac
6150582c13bb   kasmweb/api:1.15.0         "/bin/sh -c '/usr/bi…"   15 hours ago   Up 14 hours (healthy)                     8080/tcp                         kasm_api
a95638e0e39a   kasmweb/manager:1.15.0     "/bin/sh -c '/usr/bi…"   15 hours ago   Up 14 hours (healthy)                     8181/tcp                         kasm_manager
bdfc0ef3df36   redis:5-alpine             "docker-entrypoint.s…"   15 hours ago   Up 14 hours                               6379/tcp                         kasm_redis
8436c39024bc   postgres:12-alpine         "docker-entrypoint.s…"   15 hours ago   Up 14 hours (healthy)                     5432/tcp                         kasm_db
```
**Additional context**
I'd like to try adding this to my boot modprobe config:
```
cat /boot/config/modprobe.d/nvidia.conf
options nvidia-drm modeset=1
options nvidia-drm fbdev=1
```
but that requires shutting down the server, which is not something I can do easily.
I've fixed it by running this:
```
docker exec -ti kasm nvidia-ctk runtime configure --runtime=docker
docker restart kasm
```
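For reference, `nvidia-ctk runtime configure --runtime=docker` works by registering the NVIDIA runtime in the Docker daemon's `daemon.json` (which is why the `kasm` container is restarted afterwards). The resulting entry looks roughly like this; treat it as an illustration, not the literal output, since exact keys and the runtime path can vary by Container Toolkit version:

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime"
    }
  }
}
```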
and updating the Chrome workspace's "Docker Run Config Override (JSON)" with this configuration:
```json
{
  "device_requests": [
    {
      "capabilities": [
        [
          "gpu"
        ]
      ],
      "count": -1,
      "device_ids": null,
      "driver": "",
      "options": {}
    }
  ],
  "devices": [
    "/dev/dri/card1:/dev/dri/card1:rwm",
    "/dev/dri/renderD128:/dev/dri/renderD128:rwm"
  ],
  "environment": {
    "KASM_EGL_CARD": "/dev/dri/card1",
    "KASM_RENDERD": "/dev/dri/renderD128"
  },
  "hostname": "kasm"
}
```
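Each entry in "devices" follows Docker's `host_path:container_path:cgroup_permissions` format. If it helps, the override can also be built programmatically; a small stdlib-only Python sketch (the `device_entry` helper is hypothetical):

```python
import json

# Build a Kasm "Docker Run Config Override (JSON)" fragment programmatically.
# Device strings use Docker's host_path:container_path:permissions format.
def device_entry(host_path, container_path=None, perms="rwm"):
    return f"{host_path}:{container_path or host_path}:{perms}"

override = {
    "devices": [
        device_entry("/dev/dri/card1"),
        device_entry("/dev/dri/renderD128"),
    ],
    "environment": {
        "KASM_EGL_CARD": "/dev/dri/card1",
        "KASM_RENDERD": "/dev/dri/renderD128",
    },
}

# Serialize to the JSON pasted into the workspace settings.
print(json.dumps(override, indent=2))
```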
But I have a black screen, and at least Chrome doesn't start.
> But I have a black screen, and at least Chrome doesn't start.
Remove your Docker run config override and replace it with:
```json
{ "environment": { "NVIDIA_DRIVER_CAPABILITIES": "all" } }
```
I think you had it. I had to scrounge around to figure out what the issues were, but step 1 is:
Add the variables to the container:
- Variables: `NVIDIA_DRIVER_CAPABILITIES=all` and `NVIDIA_VISIBLE_DEVICES=all` (or a specific GPU ID for visible devices)
- Argument: `--runtime=nvidia`
Command: `docker exec -ti kasm nvidia-ctk runtime configure --runtime=docker` (as long as the container name is kasm; run it from the CLI of the host, or alternatively run `nvidia-ctk runtime configure --runtime=docker` within the container).
Set the Docker run config JSON to:
```json
{ "environment": { "NVIDIA_DRIVER_CAPABILITIES": "all" } }
```