docker-nvidia-glx-desktop
Run desktop on Windows Subsystem for Linux
Thanks for these excellent resources.
Is the GLX one working OK at the moment? With the latest Docker and an up-to-date card (3080 Ti), running on a Windows host with --gpus all or --gpus 1 produces some faults: on startup, /proc/driver/nvidia/version is no longer present, so DRIVER_VERSION can't be found. If you then manually set DRIVER_VERSION to e.g. 510.47.03, the NVIDIA installer fails because the following files are preloaded into the container: libnvidia-ml.so.1, libcuda.so.1, libnvcuvid.so.1, libnvidia-encode.so.1, libnvidia-opticalflow.so.1.
Finally, if you hack around all of those so that it runs, the nvidia-xconfig command currently in the script gives 'no screens found' (though I suspect at this point any xconfig command would do the same).
Hi,
You have stated that you are running a Windows host. Does this mean that you are using the Windows Subsystem for Linux, or actually starting a container on Windows? The latter is impossible because this is a Linux container, and the former is untested but might be possible (I have no such setup).
then the nvidia installer fails because the following files are preloaded into the container : libnvidia-ml.so.1, libcuda.so.1, libnvcuvid.so.1, libnvidia-encode.so.1, libnvidia-opticalflow.so.1
This is not a big issue; the installation normally completes correctly even when this happens.
if you hack all those so it runs then the nvidia-xconfig command currently there gives 'no screens found' (though I suspect at this point any xconfig command would do the same)
nvidia-xconfig is unfortunately unavailable in containers, so I have written a custom script in entrypoint.sh that uses nvidia-smi instead of nvidia-xconfig to generate xorg.conf. This has already been worked around.
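For reference, the gist of that workaround can be sketched as follows: query the GPU's PCI bus ID with nvidia-smi and convert it into the decimal BusID form that xorg.conf expects. The function name and exact field handling here are illustrative, not the actual entrypoint.sh code:

```shell
# Illustrative sketch (not the real entrypoint.sh): convert nvidia-smi's
# hexadecimal PCI bus ID (e.g. "00000000:09:00.0") into the decimal
# "PCI:bus:device:function" form used by the BusID option in xorg.conf.
format_busid() {
  raw="$1"
  bus="$(echo "$raw" | cut -d: -f2)"                # hex bus, e.g. "09"
  dev="$(echo "$raw" | cut -d: -f3 | cut -d. -f1)"  # hex device, e.g. "00"
  fn="$(echo "$raw" | cut -d. -f2)"                 # function, e.g. "0"
  printf 'PCI:%d:%d:%d\n' "0x${bus}" "0x${dev}" "0x${fn}"
}

# On a real system the raw ID would come from:
#   nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader
format_busid "00000000:09:00.0"   # -> PCI:9:0:0
```

The resulting string can then be written into the Device section of xorg.conf so the X server binds to the correct GPU.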
is the glx one working ok at the moment ? It seems with the latest docker and up to date card (3080TI) when running on a windows host with --gpus all or --gpus 1 there are some faults
Could you post the logs which are located in /tmp inside the container (docker exec) to troubleshoot further?
I was trying it as a Linux container on Windows (Docker Desktop in Linux container mode with the WSL2 backend). Can you advise why that's impossible (or whether it should work)? I am also trying some things with docker-ce on an Ubuntu WSL2 host, which still gives 'no screens found' for Xorg but seems to get a better OpenGL result (with Windows running an X server instead).
Windows Subsystem for Linux might work, but the NVIDIA container runtime must be configured properly. However, this is totally untested territory and I don't know where to start. I would really like to see this work, though...
Could you post the full .log files in /tmp after starting the container then executing a container shell process with docker exec?
And also, please DO NOT start an X server with Windows Subsystem for Linux. The container has to start its own X server automatically instead.
Any luck? I was unable to prepare a setup using NVIDIA GPUs and WSL yet...
This will be addressed when possible. Please hold tight. Meanwhile, https://github.com/ehfd/docker-nvidia-egl-desktop adequately covers this use case.
If anyone succeeded in using either desktops with WSL, please share your experiences.
Ah, god. Did not have time.
It has been on my personal to-do list the whole time, and I will try to look into it.
It's going to become a bit easier because I eliminated the CUDA runtime in the containers.
I am also interested in using this with WSL 2; as of now, it doesn't seem to be able to connect to a screen. noVNC works but has an awful framerate.
This is like the hundredth time I said this and I know that I'm going to procrastinate again, but I'll try my best to make it work.
@ehfd The EGL docker works like a charm 😀
I'm going to run some tests to see if I can make it work with Xorg instead of Xvfb (I understand that this is what really differentiates one from the other). But if I can help your attempts in any other way, I am at your disposal.
Could someone post their:
nvidia-smi --version
nvidia-smi
nvidia-smi --query-gpu=driver_version --format=csv,noheader
Outputs inside WSL as soon as possible?
# Install NVIDIA userspace driver components including X graphic libraries
if ! command -v nvidia-xconfig >/dev/null 2>&1; then
  # Driver version is provided by the kernel through the container toolkit
  export DRIVER_ARCH="$(dpkg --print-architecture | sed -e 's/arm64/aarch64/' -e 's/armhf/32bit-ARM/' -e 's/i.*86/x86/' -e 's/amd64/x86_64/' -e 's/unknown/x86_64/')"
  if [ -z "${DRIVER_VERSION}" ]; then
    # If the kernel driver version is available, prioritize it
    if [ -f "/proc/driver/nvidia/version" ]; then
      export DRIVER_VERSION="$(head -n1 </proc/driver/nvidia/version | awk '{for(i=1;i<=NF;i++) if ($i ~ /^[0-9]+\.[0-9\.]+/) {print $i; exit}}')"
    # Otherwise, use the NVML version for compatibility with Windows Subsystem for Linux
    elif command -v nvidia-smi >/dev/null 2>&1; then
      export DRIVER_VERSION="$(nvidia-smi --version | grep 'NVML version' | cut -d: -f2 | tr -d ' ')"
    else
      echo "Failed to find NVIDIA GPU driver version. You might not be using the NVIDIA container toolkit. Exiting."
      exit 1
    fi
  fi
fi
I've edited the script to use the NVML version (i.e. the userspace library version) when the kernel driver version is unavailable. Hopefully, this fixes the driver installation with WSL. Please test.
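As a quick illustration of that fallback, the NVML line can be parsed out of a captured nvidia-smi --version output like this. The sample string below only mimics the tool's output format for demonstration; on a real system you would pipe nvidia-smi --version directly:

```shell
# Extract the NVML version from `nvidia-smi --version`-style output,
# mirroring the fallback pipeline in entrypoint.sh.
nvml_version_from() {
  echo "$1" | grep 'NVML version' | cut -d: -f2 | tr -d ' '
}

# Sample text mimicking the output format (an assumption for illustration)
sample='NVIDIA-SMI version  : 555.42.03
NVML version        : 555.42
DRIVER version      : 555.85
CUDA Version        : 12.5'

nvml_version_from "$sample"   # -> 555.42
```

Note that the NVML line carries only two components (555.42), not the full installer version (555.42.03), which is relevant to the discussion below.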
Hi
$ nvidia-smi --version
NVIDIA-SMI version : 555.42.03
NVML version : 555.42
DRIVER version : 555.85
CUDA Version : 12.5
$ nvidia-smi
Sun Jun 23 14:54:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3080 Ti On | 00000000:09:00.0 On | N/A |
| 53% 46C P0 89W / 350W | 2710MiB / 12288MiB | 17% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 401 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
555.85
...:~$ nvidia-smi --version
NVIDIA-SMI version : 550.76.01
NVML version : 550.76
DRIVER version : 552.22
CUDA Version : 12.4
...:~$ nvidia-smi
Sun Jun 23 21:43:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01 Driver Version: 552.22 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
Segmentation fault
...:~$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
552.22
...:~$ nvidia-smi
Mon Jun 24 15:05:49 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
Segmentation fault
Windows:
nvidia-smi
Mon Jun 24 15:08:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 3000 Ada Gene... WDDM | 00000000:01:00.0 Off | Off |
| N/A 51C P3 10W / 43W | 0MiB / 8188MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Looks like the issue is quite complicated. 555.42.03 isn't even in https://download.nvidia.com/XFree86/Linux-x86_64/, and NVML truncates the last component of the version.
https://github.com/NVIDIA/nvidia-container-toolkit/issues/563
I feel there isn't much I can do here right now. I have opened an issue for the root cause.
docker-nvidia-egl-desktop will still work.
I've enabled the NVIDIA_DRIVER_VERSION option, and things could possibly work out if you set NVIDIA_DRIVER_VERSION one step down from what nvidia-smi shows (for example, if nvidia-smi shows 550.76.01, set 550.76; if it shows 555.42.03, set 555.42.02).
However, working behavior is not guaranteed.
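A small helper (hypothetical, not part of the repository) can compute that stepped-down value from the full version nvidia-smi reports:

```shell
# Hypothetical helper: step a driver version's patch component down by one,
# dropping it entirely when it reaches zero, as suggested above for
# NVIDIA_DRIVER_VERSION (e.g. 550.76.01 -> 550.76, 555.42.03 -> 555.42.02).
# Assumes a three-component xxx.xx.xx input.
decrement_patch() {
  base="${1%.*}"
  patch="$(echo "${1##*.}" | sed 's/^0*//')"   # strip leading zeros (base 10)
  if [ -z "$patch" ] || [ "$patch" -le 1 ]; then
    echo "$base"
  else
    printf '%s.%02d\n' "$base" "$((patch - 1))"
  fi
}

decrement_patch 550.76.01   # -> 550.76
decrement_patch 555.42.03   # -> 555.42.02
```

If the stepped-down version is still absent from the download archive, further manual steps down may be needed, as working behavior is not guaranteed here.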