WSL2 + Docker + OpenGL + NVIDIA not working (uses llvmpipe)

Open riv-rbush opened this issue 4 years ago • 30 comments

Summary

I am running ROS GUI applications like RViz and Gazebo through a docker container on WSL2. The OpenGL renderer is not selecting my NVIDIA GTX 1050 card and uses llvmpipe (CPU) instead.

My system:

  • Windows 11 Beta Preview
  • Latest WSL2 kernel with Ubuntu 20.04
  • Latest Docker Desktop for Windows
  • Latest NVIDIA GPU driver for WSL2 CUDA support

Note: "latest" refers to the updates available on 7 October 2021; I don't have version numbers to hand.

Steps taken to fix so far

The OpenGL renderer does find my NVIDIA card outside of a docker container on WSL2 (on the host). I have replicated the same issue after multiple reinstalls and when using docker-ce instead of Docker Desktop. On a native Ubuntu 20.04 boot, the container's OpenGL renderer is correctly set to my NVIDIA card.

Expected Behaviour

RViz, Gazebo, GLXGears, glmark2 should all render with 3D hardware acceleration on the NVIDIA GPU.

riv-rbush avatar Oct 07 '21 09:10 riv-rbush

@robertjbush which image is being used? Note that for OpenGL capabilities, the NVIDIA_DRIVER_CAPABILITIES environment variable should include graphics or be set to all. See https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#driver-capabilities
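
As a rough sketch of the check described above, the helper below tests whether a capabilities string contains graphics or all. The function name has_graphics_capability is made up for illustration and is not part of any NVIDIA tool; it only inspects the string and says nothing about whether the libraries were actually mounted.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a NVIDIA_DRIVER_CAPABILITIES value
# would enable OpenGL, i.e. it lists "graphics" or is set to "all".
has_graphics_capability() {
    caps=$1
    # Wrap in commas so each entry can be matched as a whole word.
    case ",$caps," in
        *,all,*|*,graphics,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside a container this could be called as has_graphics_capability "$NVIDIA_DRIVER_CAPABILITIES" || echo "graphics capability not requested".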

elezar avatar Oct 07 '21 12:10 elezar

@elezar I believe I have tried those steps because:

  1. The image is a custom one, based on the ros-noetic image.
  2. The NVIDIA card is used for OpenGL rendering in the docker container on a native Ubuntu 20.04 install
  3. However it doesn't work with the same image on WSL2
  4. I have tried various NVIDIA images and running glxgears, glxinfo and glmark2
  5. NVIDIA_DRIVER_CAPABILITIES is set as you suggested

Are you, or anyone else within NVIDIA corporation, successfully running NVIDIA OpenGL rendering within a docker container on a WSL2 host?

riv-rbush avatar Oct 08 '21 07:10 riv-rbush

Hi @robertjbush thanks for the additional information.

It may be that @rboissel will be able to provide some additional insight here.

elezar avatar Oct 08 '21 08:10 elezar

One thing to note is that the graphics libraries are mounted from the host system, meaning that these need to be installed. Do glxgears, glxinfo, or glmark2 work in "native" WSL2 using the NVIDIA card?

Could you enable the debug option in the nvidia-container-cli section of the /etc/nvidia-container-runtime/config.toml file by uncommenting it?

The generated /var/log/nvidia-container-toolkit.log will contain information as to which libraries are not being located in this case.
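
A minimal sketch of that uncommenting step, assuming the option appears in the file as a line starting with "#debug =". The function name enable_toolkit_debug is hypothetical. Note that it uncomments every commented debug line in the file, so a commented debug entry under [nvidia-container-runtime] would be enabled as well.

```shell
#!/bin/sh
# Hypothetical helper: uncomment all '#debug = ...' lines in the given
# config file, leaving every other line untouched.
enable_toolkit_debug() {
    config=$1
    tmp=$(mktemp) || return 1
    # Strip the leading '#' (and surrounding blanks) only from debug lines.
    sed 's/^[[:space:]]*#[[:space:]]*\(debug[[:space:]]*=\)/\1/' "$config" > "$tmp" &&
        cat "$tmp" > "$config"   # write back in place, keeping file permissions
    status=$?
    rm -f "$tmp"
    return $status
}
```

It would need to be run as root against /etc/nvidia-container-runtime/config.toml, and it is worth keeping a backup copy of the file first.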

elezar avatar Oct 08 '21 08:10 elezar

One thing to note is that the graphics libraries are mounted from the host system, meaning that these need to be installed. Do glxgears, glxinfo, or glmark2 work in "native" WSL2 using the NVIDIA card?

Yes they do.

I'll work on the second part of your post now.

riv-rbush avatar Oct 08 '21 08:10 riv-rbush

@elezar I'm not getting those logs. This is my config.toml:

disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"
[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"

This is at the end of my Docker Desktop JSON file

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

Some version info:

  • Microsoft Windows [Version 10.0.22000.194]
  • WSL2 Kernel 5.10.60.1
  • Docker Desktop (Windows): 4.1.0 (69386)

Can you provide any insight on this:

Are you, or anyone else within NVIDIA corporation, successfully running NVIDIA OpenGL rendering within a docker container on a WSL2 host?

riv-rbush avatar Oct 08 '21 09:10 riv-rbush

@robertjbush what command line are you using to launch the container? Since nvidia is not set as the default runtime in your docker config, you would need to specify the runtime:

docker run --rm -ti --runtime=nvidia <image> nvidia-smi

Alternatively, specifying the --gpus flag should also ensure that the nvidia-container-toolkit is used to make the required modifications to the container when it is created.
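
To illustrate the point about the default runtime, here is a small sketch that looks for a "default-runtime": "nvidia" entry in a Docker daemon JSON file and prints the flag that would otherwise be needed. The function name runtime_flag is hypothetical, and matching JSON with grep is only a rough heuristic, not a real parser.

```shell
#!/bin/sh
# Hypothetical helper: print "--runtime=nvidia" unless the given daemon
# config already sets nvidia as the default runtime.
runtime_flag() {
    if grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1" 2>/dev/null; then
        echo ""                   # nvidia is already the default; no flag needed
    else
        echo "--runtime=nvidia"   # must be passed explicitly to docker run
    fi
}
```

With the config posted earlier in this thread, which registers the nvidia runtime but does not make it the default, this prints --runtime=nvidia.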

While looking for documentation w.r.t. WSL support, I also found: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#features-not-yet-supported which lists OpenGL-interop as unsupported.

elezar avatar Oct 08 '21 11:10 elezar

@elezar I've used the runtime and --gpus flag with no success.

While looking for documentation w.r.t. WSL support, I also found: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#features-not-yet-supported which lists OpenGL-interop as unsupported.

But OpenGL is used by so many applications. Why would this not be supported when it is available on the host?

riv-rbush avatar Oct 08 '21 12:10 riv-rbush

@elezar Is there a forum to request new features?

riv-rbush avatar Oct 13 '21 10:10 riv-rbush

@robertjbush let me ping someone to find out where that limitation comes from, as it may be related to WSL2 (although I recall reading that this now has better support for Linux graphics applications). If this is only due to the NVIDIA Container Toolkit, I will create a ticket to track getting this added.

elezar avatar Oct 13 '21 10:10 elezar

@elezar WSL2 does indeed have better support for GPU graphics rendering. I can run OpenGL applications and use NVIDIA hardware to render them. But it isn't possible from a docker container when the host is WSL2 (the same container does use the NVIDIA GPU for rendering on a pure Ubuntu 20.04 install).

riv-rbush avatar Oct 13 '21 10:10 riv-rbush

I have pinged @rboissel to have a look at the ticket. He has a better grasp on the WSL2 specifics and where the noted limitations come from.

elezar avatar Oct 13 '21 11:10 elezar

@elezar @rboissel Good news in part:

  1. I've been testing accelerated OpenGL through containers in WSL2. I used the Dockerfile from Microsoft's recent commit ac6221b.
  2. I also managed to get RViz and ROS (Robot Operating System) to use accelerated OpenGL.
  3. However, the meshes (STLs) do not display when using the NVIDIA drivers.

Any ideas why this may happen?

riv-rbush avatar Oct 26 '21 11:10 riv-rbush

@robertjbush I'm having the same issue. GPU is working for compute in a docker container but not for OpenGL. I've tried environment variables such as LIBGL_ALWAYS_INDIRECT and NVIDIA_DRIVER_CAPABILITIES without success. I've also tried the dockerfiles from ac6221b. Were any other changes required to enable the GPU for graphics?

System Specs:

  • Windows 10 Pro, Version 21H2, Build 19044.1320, Windows Feature Experience Pack 120.2212.3920.0
  • WSL2: Linux FARWELL 5.10.16.3-microsoft-standard-WSL2 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Docker Desktop 4.1.1 (69879)
  • Windows Driver: 510.06_quadro_win11_win10-dch_64bit_international.exe

Thanks.

bejota avatar Oct 27 '21 21:10 bejota

@bejota OpenGL acceleration in WSL2 only works in Windows 11.

onomatopellan avatar Oct 27 '21 21:10 onomatopellan

Dang. I knew someone was going to say that.

bejota avatar Oct 27 '21 21:10 bejota

Anyone tested RViz and meshes using accelerated OpenGL in WSL?

riv-rbush avatar Nov 01 '21 08:11 riv-rbush

Anyone tested RViz and meshes using accelerated OpenGL in WSL?

I am facing the exact same problem except I am not running the ROS stuff (or Rviz) from a container (I hope this is still relevant therefore).

Like many people before me, I had the issue that the 3D rendering was not done by the GPU. That meant that RViz got very slow once the models got a bit bigger. However, at that time the meshes were displaying.

So I upgraded to Windows 11 and did everything necessary to force the 3D rendering onto the GPU (NVIDIA GTX 1050 Ti). The GPU now does the rendering, except the meshes do not get displayed. The frames from TF, on the other hand, do get displayed. [image]

tgaspar avatar Nov 02 '21 09:11 tgaspar

@tgaspar @elezar I have this exact problem.

riv-rbush avatar Nov 04 '21 14:11 riv-rbush

Friendly ping to anyone who's had this problem and solved it?

riv-rbush avatar Nov 18 '21 11:11 riv-rbush

Issue is being tracked in https://github.com/microsoft/wslg/issues/554

onomatopellan avatar Nov 18 '21 12:11 onomatopellan

@bejota OpenGL acceleration in WSL2 only works in Windows 11.

I'm on Windows 11, but I am trying to run fully hardware-accelerated apps from Docker.

The GPU (RTX 2060 Max-Q) is working in docker containers for compute, but I'm sure GUI apps are not being hardware accelerated.

I am facing the same issue, where things like WebGL are not working because the GL renderer is set to llvmpipe.

glxgears outputs +600fps

WSL2 Ubuntu 20.04 glxinfo | grep OpenGL

OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 2060 with Max-Q Design)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 21.0.3
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 21.0.3
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 21.0.3
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

Docker container glxinfo | grep OpenGL

OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 7.0, 128 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.3.6
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 18.3.6
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 18.3.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
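
The difference between the two outputs above comes down to the "OpenGL renderer string" line. As a sketch, the helper below reads glxinfo output on stdin and labels it. The function name renderer_kind is hypothetical, and treating every renderer other than llvmpipe or softpipe as "hardware" is a simplifying assumption.

```shell
#!/bin/sh
# Hypothetical helper: classify the renderer reported in glxinfo output
# (read from stdin) as "software", "hardware" or "unknown".
renderer_kind() {
    # Take the text after the first "OpenGL renderer string:" line.
    renderer=$(sed -n 's/^OpenGL renderer string:[[:space:]]*//p' | head -n 1)
    case "$renderer" in
        "") echo "unknown" ;;                       # no renderer line found
        *llvmpipe*|*softpipe*) echo "software" ;;   # Mesa CPU rasterizers
        *) echo "hardware" ;;                       # assumption: anything else is a GPU
    esac
}
```

For example, glxinfo | renderer_kind would be expected to print software for the container output above and hardware for the WSL2 host output.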

This is my script to run gpu accelerated containers

docker run -it --rm --gpus 'all,"capabilities=compute,graphics,utility,video,display"' --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-e DISPLAY \
-e WAYLAND_DISPLAY \
-e XDG_RUNTIME_DIR \
-e PULSE_SERVER \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /mnt/wslg:/mnt/wslg \
-v $(pwd)/app:/app \
registry/image \
command

If I play a 60 FPS YouTube video in Chromium, it plays well but is sometimes choppy, and the NVIDIA GPU load only goes up when the window is displayed on the external monitor. I am pretty sure it is CPU rendering, given the high CPU load while the video is playing.

Trying any WebGL content reports the following error: [image]

moracabanas avatar Nov 27 '21 07:11 moracabanas

@moracabanas I think you are missing these:

-e LD_LIBRARY_PATH=/usr/lib/wsl/lib
-v /usr/lib/wsl:/usr/lib/wsl

Take a look at the samples.
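
Before adding those mounts, it can help to confirm that the paths exist on the WSL2 side. This is only a sketch: missing_paths is a hypothetical helper name, and the path list in the usage note is taken from the commands posted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print each argument that does not exist as a
# file, directory or device node; print nothing if all are present.
missing_paths() {
    for p in "$@"; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}
```

For example, missing_paths /usr/lib/wsl/lib /mnt/wslg /tmp/.X11-unix /dev/dxg lists whichever of those paths is absent, so an empty result suggests the mounts at least have something to bind to.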

onomatopellan avatar Nov 27 '21 12:11 onomatopellan

@moracabanas I think you are missing these:

-e LD_LIBRARY_PATH=/usr/lib/wsl/lib
-v /usr/lib/wsl:/usr/lib/wsl

Take a look at the samples.

Thank you for your suggestion. I tried the new configuration based on the WSLg docker run ... examples you mentioned.

But I am still not getting GPU-accelerated OpenGL, as glxinfo | grep OpenGL shows:

glxinfo | grep OpenGL
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 7.0, 128 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.3.6
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 18.3.6
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 18.3.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

Chrome still reports WebGL as unsupported and blacklisted.

I tried Blender and it runs fine, but you can feel there is no GPU acceleration at all.

This is my image launcher script for testing now:

docker run -it --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /mnt/wslg:/mnt/wslg \
-v /usr/lib/wsl:/usr/lib/wsl \
--device=/dev/dxg \
-e LD_LIBRARY_PATH=/usr/lib/wsl/lib \
-e DISPLAY=$DISPLAY \
-e WAYLAND_DISPLAY=$WAYLAND_DISPLAY \
-e XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR \
-e PULSE_SERVER=$PULSE_SERVER \
-v $(pwd)/app:/app \
<repo/image:tag> \
bash

moracabanas avatar Nov 28 '21 18:11 moracabanas

@moracabanas

Mesa 18.3.6

You also need to install Mesa 21.x inside the container.
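
A quick way to check this from inside the container is to pull the Mesa major version out of the glxinfo output. The sketch below uses hypothetical helper names (mesa_major, mesa_at_least), and the threshold of 21 is simply the version mentioned in this comment, passed in as a parameter.

```shell
#!/bin/sh
# Hypothetical helper: read glxinfo output on stdin and print the Mesa
# major version found on the "OpenGL version string" line.
mesa_major() {
    sed -n 's/^OpenGL version string:.* Mesa \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if the Mesa major version read from stdin
# is at least the number given as the first argument.
mesa_at_least() {
    major=$(mesa_major)
    [ -n "$major" ] && [ "$major" -ge "$1" ]
}
```

For example, glxinfo | mesa_at_least 21 || echo "Mesa is older than 21.x" could be run inside the container.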

onomatopellan avatar Nov 29 '21 01:11 onomatopellan

@moracabanas

Mesa 18.3.6

You also need to install Mesa 21.x inside the container.

I've been trying for hours to install Mesa on a distro other than Ubuntu, and I give up.

Do you have any advice on how to update or install Mesa, e.g. a docker image that already includes it? I don't want to compile it, because in my experience compiling software from source takes a day and mostly ends in errors. I also don't know what I am doing in the process beyond copy-pasting scripts.

Things I've tried already:

sudo add-apt-repository ppa:kisak/kisak-mesa
sudo apt update
sudo apt upgrade

This does not work because the PPA only supports Ubuntu and has no candidate for my Debian-based (buster/bullseye) docker image.

moracabanas avatar Nov 30 '21 14:11 moracabanas

@moracabanas On Debian bullseye you need to add the deb http://http.us.debian.org/debian/ testing non-free contrib main line to your /etc/apt/sources.list and run sudo apt update && sudo apt upgrade -y after that.
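
As a sketch of that step for use in an image build script, the helper below appends a line to a sources file only if it is not already present, so repeated builds do not duplicate it. The function name add_apt_source is hypothetical. Keep in mind that pulling from testing and then upgrading moves many packages beyond the release the image was based on.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line
# already exists there.
add_apt_source() {
    line=$1
    file=$2
    # -x matches the whole line, -F treats it as a fixed string.
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

It could be called as add_apt_source 'deb http://http.us.debian.org/debian/ testing non-free contrib main' /etc/apt/sources.list, followed by the apt update and apt upgrade commands from the comment above.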

onomatopellan avatar Nov 30 '21 15:11 onomatopellan

I updated my image with that and now I get: glxinfo | grep OpenGL

OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 2060 with Max-Q Design)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 21.2.5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 21.2.5
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 21.2.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

Thank you so much, I am testing this now!

moracabanas avatar Nov 30 '21 18:11 moracabanas

Everything is working as expected now. WebGL is working solidly in my docker image!

[image]

The odd thing now is that I get ~700 fps in glxgears with llvmpipe but only ~70 fps with Mesa 21.x.

moracabanas avatar Nov 30 '21 19:11 moracabanas

@moracabanas glxgears is somewhat outdated. Try es2gears from the mesa-utils-extra package instead.

onomatopellan avatar Nov 30 '21 19:11 onomatopellan