Pangolin icon indicating copy to clipboard operation
Pangolin copied to clipboard

Replace GLX with EGL for X11

Open christian-rauch opened this issue 6 years ago • 30 comments

This PR replaces GLX by EGL for the X11 windowing and fixes potential issues like: https://github.com/stevenlovegrove/Pangolin/issues/74 https://github.com/stevenlovegrove/Pangolin/issues/242 https://github.com/stevenlovegrove/Pangolin/issues/260. Example programs like SimpleDisplay work already.

Before continuing working on this: Will replacing GLX by EGL be potentially merged? This will remove the GLX dependency but adds and dependency on EGL (which is also required by Android).

EDIT: With NVIDIA drivers > 384 (390, 396), the window content does not update and stays black.

Fixes https://github.com/stevenlovegrove/Pangolin/issues/782.

christian-rauch avatar Jul 23 '18 22:07 christian-rauch

This is great! If we can make this reliable enough, I would be very keen to merge. Especially if we can support the use case of a frame buffer without window.

stevenlovegrove avatar Aug 01 '18 05:08 stevenlovegrove

I posted the issue with the blank window with certain nvidia drivers with a minimal example at the nvidia forum: https://devtalk.nvidia.com/default/topic/1037901/x11-with-egl-produces-black-window-for-driver-versions-gt-384 . Since this issue also appears in the official mesa example, I somehow suspect that this is a driver issue. The other issue that shared EGL contexts are not supported. I.e. at the moment the program simply exits when trying to access a context from another thread. This might be related to the X11+GLX context sharing issue https://github.com/stevenlovegrove/Pangolin/issues/393.

When this works, going to windowless should be easy. We could even separate the EGL context creation into a class that is used by X11, Wayland and windowless.

christian-rauch avatar Aug 01 '18 10:08 christian-rauch

The X11+EGL window and context creation now works in a threaded environment. However, I needed to add an new method RemoveCurrent to the WindowInterface to unset the current context. When starting a new thread, you need to call BindToContext() to set the context in this thread, and call RemoveCurrent() to unset it. I added a new example HelloPangolinThreads that demonstrates the procedure. The threading patches also work on the previous X11+GLX version. I included them here because they are the only way I could make threading with X11+EGL working. But I can create another dedicated PR for the threading patches if requiered.

christian-rauch avatar Aug 06 '18 19:08 christian-rauch

So did you manage to work around the NVidia issue that you references?

stevenlovegrove avatar Aug 08 '18 06:08 stevenlovegrove

The nvidia driver issue is unrelated to Pangolin, it also appears for the minimal EGL example that I posted in the nvidia forum https://devtalk.nvidia.com/default/topic/1037901/x11-with-egl-produces-black-window-for-driver-versions-gt-384 .

It seems to only occur with the nvidia drivers > 384 from the official CUDA repository for Ubuntu. E.g. it is working on other distributions or using the nvidia installation script (see post by matthewcmatl in the linked nvidia issue). I have not tested this issue in many other configurations. It would be nice if someone with Ubuntu 18.04 could test if this occures with the drivers from the official package repo and the nvidia CUDA repo.

christian-rauch avatar Aug 08 '18 10:08 christian-rauch

Okay, I'm excited to merge this, but I guess I should wait until the Ubuntu package is fixed.

stevenlovegrove avatar Aug 12 '18 02:08 stevenlovegrove

I can confirm that the X11+EGL feature is working now with nvidia driver versions 390.48 (Ubuntu 18.04 repo) and 410.48 (CUDA repo).

christian-rauch avatar Sep 26 '18 10:09 christian-rauch

Are there any downsides to this implementation, or should I now consider this for merge?

stevenlovegrove avatar Sep 27 '18 04:09 stevenlovegrove

I am using this branch for quite a while now on Ubuntu 14.04 and 18.04 with Nvidia drivers from the CUDA repo. I think it would be good if this branch could be tested by others in different configurations to see if this issue with black windows still exists.

christian-rauch avatar Sep 27 '18 12:09 christian-rauch

@stevenlovegrove Any idea why the build server is using CMake version 3.12.4? The default on xenial (the CI setting) is version 3.5.1. Even bionic is using cmake 3.10.2.

The newer cmake seems to be incompatible with the OpenGL settings in xenial:

CMake Error at /usr/local/cmake-3.12.4/share/cmake-3.12/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find OpenGL (missing: OPENGL_opengl_LIBRARY)

christian-rauch avatar Dec 05 '18 17:12 christian-rauch

@stevenlovegrove The Nvidia driver issues are gone and they now play nicely with EGL. Do you want to merge this now?

christian-rauch avatar Jul 27 '21 14:07 christian-rauch

Do you have any interest in rebasing this on master after my big refactor?

stevenlovegrove avatar Aug 27 '21 03:08 stevenlovegrove

I have a branch refactor_x11_egl that rebases this on top of 1658218dfbe56a3994e081324802bb354b12742e. After #669 has been fixed, I will rebase this on master.

christian-rauch avatar Aug 30 '21 14:08 christian-rauch

@stevenlovegrove This works now on Intel and Nvidia. Can you have a look at this again?

christian-rauch avatar Apr 19 '22 00:04 christian-rauch

@stevenlovegrove I think this is ready. Can you check my PR?

christian-rauch avatar May 27 '22 09:05 christian-rauch

~~@stevenlovegrove Do you know why the GitHub CI does not launch for this PR (and others)? Do we need an on: [pull_request] in the workflow?~~

Never mind, I just added this to enable the CI on PRs.

christian-rauch avatar May 27 '22 09:05 christian-rauch

Great - I'll try to find a few people using Linux to test it. Volunteers appreciated!

stevenlovegrove avatar May 31 '22 23:05 stevenlovegrove

Great - I'll try to find a few people using Linux to test it. Volunteers appreciated!

Thanks. If possible, this should be tested on different GPU/driver configurations. I can usually only test on a single Nvidia and Intel GPU.

christian-rauch avatar Jun 01 '22 09:06 christian-rauch

Any luck finding guinea pigs for testing this? :-)

christian-rauch avatar Jun 30 '22 19:06 christian-rauch

Not yet! Maybe @strasdat or @mp3guy ?

stevenlovegrove avatar Aug 23 '22 19:08 stevenlovegrove

@christian-rauch, @stevenlovegrove I just build it okay on my machine with NVIDIA gpu. What tests do I need to run?

strasdat avatar Aug 23 '22 21:08 strasdat

HelloPangolin and SimplePlot work as expected...

strasdat avatar Aug 23 '22 21:08 strasdat

@christian-rauch, @stevenlovegrove I just build it okay on my machine with NVIDIA gpu. What tests do I need to run?

Can you run the demos and check if they render correctly? Maybe also do some window managing things like resizing, maximising, and minimisng.

christian-rauch avatar Aug 23 '22 21:08 christian-rauch

Finally got a chance to try out this branch on Ubuntu 22.04 with NVidia 515 driver and I get a segfault on close for every sample. Stacktrace looks like this each time:

1   ??                                                                                                                                                                                         0x7ffff61c0550 
2   XCloseDisplay                                                                                                                                                                              0x7ffff713b852 
3   pangolin::X11Display::~X11Display                                                                                                                                   X11Window.h       53   0x7ffff76de21d 
4   __gnu_cxx::new_allocator<pangolin::X11Display>::destroy<pangolin::X11Display>                                                                                       new_allocator.h   168  0x7ffff76f494e 
5   std::allocator_traits<std::allocator<pangolin::X11Display>>::destroy<pangolin::X11Display>                                                                          alloc_traits.h    535  0x7ffff76f3bd1 
6   std::_Sp_counted_ptr_inplace<pangolin::X11Display, std::allocator<pangolin::X11Display>, (__gnu_cxx::_Lock_policy)2>::_M_dispose                                    shared_ptr_base.h 528  0x7ffff76f2e45 
7   std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release                                                                                                       shared_ptr_base.h 168  0x7ffff7ecb642 
8   std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count                                                                                                    shared_ptr_base.h 705  0x7ffff7ec85eb 
9   std::__shared_ptr<pangolin::X11Display, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr                                                                                  shared_ptr_base.h 1154 0x7ffff76dea80 
10  std::shared_ptr<pangolin::X11Display>::~shared_ptr                                                                                                                  shared_ptr.h      122  0x7ffff76deaca 
11  pangolin::X11Window::~X11Window                                                                                                                                     display_x11.cpp   333  0x7ffff76dab11 
12  pangolin::X11Window::~X11Window                                                                                                                                     display_x11.cpp   333  0x7ffff76dab40 
13  std::default_delete<pangolin::WindowInterface>::operator()                                                                                                          unique_ptr.h      85   0x7ffff7ecc5a6 
14  std::_Sp_counted_deleter<pangolin::WindowInterface *, std::default_delete<pangolin::WindowInterface>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose shared_ptr_base.h 442  0x7ffff7ef271e 
15  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release                                                                                                       shared_ptr_base.h 168  0x7ffff7ecb642 
16  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count                                                                                                    shared_ptr_base.h 705  0x7ffff7ec85eb 
17  std::__shared_ptr<pangolin::WindowInterface, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr                                                                             shared_ptr_base.h 1154 0x7ffff7ecc5dc 
18  std::shared_ptr<pangolin::WindowInterface>::~shared_ptr                                                                                                             shared_ptr.h      122  0x7ffff7ef4676 
19  pangolin::PangolinGl::~PangolinGl                                                                                                                                   pangolin_gl.cpp   47   0x7ffff7ef40dc 
20  __gnu_cxx::new_allocator<pangolin::PangolinGl>::destroy<pangolin::PangolinGl>                                                                                       new_allocator.h   168  0x7ffff7ef30f8 
... <More>                                                                                                                                                                                                    

Basically, it doesn't like that X11 Display handle. Does this happen for you too? I verified that this doesn't happen on master.

Adding a breakpoint, I can verify that the handle isn't getting closed twice - it segfaults the first time through that line.

    ~X11Display() {
        XCloseDisplay(display);
    }

Any ideas?

stevenlovegrove avatar Sep 23 '22 07:09 stevenlovegrove

Any ideas?

Since you are on Ubuntu 22.04 and this is the first Ubuntu using Wayland by default, is this running on "native" X11 / Xorg or via XWayland?

I cannot reproduce this on Ubuntu 20.04 with NVIDIA 515.65.01 on native X11 / Xorg.

christian-rauch avatar Sep 23 '22 07:09 christian-rauch

Hmmm. I dist-upgraded from 18 -> 20, and then 20 -> 22 so I probably have a lot of cruft around. I have a bunch of x11 libs installed like x11 common. Not sure how I would know if I'm using 'native' x11 or not? I just used what Pangolin found, but perhaps it found something odd. I'll see if I can take another look tomorrow. Thanks!

stevenlovegrove avatar Sep 23 '22 08:09 stevenlovegrove

Not sure how I would know if I'm using 'native' x11 or not?

Check "Settings" -> "About" -> "Windowing System". This will say "X11" when you use X11 natively, or "Wayland" when you run Wayland (the default since 22.04). In the latter case, X11 clients will use "XWayland" on Wayland.

It may take a while until I can test Ubuntu 22.04 on Nvidia hardware. Would be good to get more feedback on this configuration.

christian-rauch avatar Sep 26 '22 11:09 christian-rauch

Check "Settings" -> "About" -> "Windowing System". This will say "X11" when you use X11 natively, or "Wayland" when you run Wayland (the default since 22.04). In the latter case, X11 clients will use "XWayland" on Wayland.

It says 'X11', so native. Checking the version, I have:

> xdpyinfo | grep version
version number: 11.0
X.Org version: 1.21.1.3

Haven't had time to look into it further yet.

stevenlovegrove avatar Sep 28 '22 03:09 stevenlovegrove

Hi all, I tried this PR in a Ubuntu 22.04 docker with X11. HelloPangolin works as expected.

This PR helps me to solve my issue (libtorch+Pangolin segfault, see #884). Thank you very much.

The only thing I think is a little bit verbose is that I have to link explicitly OpenGL::EGL in CMakeLists.txt. I expect to link it implicitly through linking to pango_opengl, as for ${GLEW_LIBRARY}. See

https://github.com/christian-rauch/Pangolin/blob/5bf16e811704f98e12393f2a4ac5f5918e1f8fff/components/pango_opengl/CMakeLists.txt#L57C51-L57C66

Here is my current CMakeLists.txt

find_package(OpenGL REQUIRED COMPONENTS OpenGL EGL)
target_link_libraries(pangolin_libtorch_test
  PUBLIC
  "${TORCH_LIBRARIES}"
  ${Pangolin_LIBRARIES}
  OpenGL::EGL)

The EGL is explicitly added here. If I don't do this, I will have this error:

CMake Error at CMakeLists.txt:33 (add_executable):
  Target "pangolin_libtorch_test" links to target "OpenGL::EGL" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?

fwcore avatar Jul 25 '23 08:07 fwcore

Thanks to this MR, save me from OpenGL errors

EXing avatar Mar 19 '24 11:03 EXing