habitat-sim icon indicating copy to clipboard operation
habitat-sim copied to clipboard

Update to Magnum with windowed EGL support; better static plugin linking for Python

Open mosra opened this issue 3 years ago • 9 comments

Motivation and Context

A bugreport I received last week resulted in a nice cleanup of platform-dependent GL context creation, conincidentally also fixing something we've been talking about last Thursday:

  • There's now a MAGNUM_TARGET_EGL option, replacing MAGNUM_TARGET_HEADLESS. It's because EGL is no longer just for headless (or GLES) use cases, but can now also be used to create GL context in the windowed GLFW (or SDL) apps.
  • As a consequence, it's no longer needed to switch between GLX (if we want the GUI viewer) and EGL (if we want GPU selection for the headless mode) on Linux, EGL can be used for both the GUI and the headless case. Which means there's no longer a need to rebuild everything when enabling/disabling the GUI viewer, which should also result in significant reduction of CI build times. Stability- and distro-support-wise, EGL used to be a "new, experimental and buggy" option few years ago, but I think it's safe to use nowadays. Testing welcome, of course, especially with older Ubuntu LTS versions.
  • The --headless option of setup.py thus now only enables or disables the GUI viewer, nothing else. It should probably get renamed to something better, but I didn't want to break everything for everyone, so I didn't. (I also forgot most of what I knew about Habitat's buildsystem, and I don't know anything about Conda or Python packaging, so I don't feel competent enough to do such changes.)
  • Nothing changes on macOS, there it's still CGL like it was before.

Apart from that, this introduces a new & cleaner way of linking static plugins to Magnum's binaries:

  • The StbTrueTypeFont plugin is now linked directly to the Magnum Python module via a new MAGNUM_PYTHON_BINDINGS_STATIC_PLUGINS option, resolving the text rendering TODO from #1853. Thus it's enough to just import magnum to have the plugin available from magnum.text.FontManager, no need to fiddle with any dlopen flags anymore.

  • In a similar spirit, the Basis Compression Tool could be replaced by adding just the following to src/cmake/dependencies.cmake, getting rid of the CMake 3.13 requirement:

    if(BUILD_BASIS_COMPRESSOR)
        set(MAGNUM_IMAGECONVERTER_STATIC_PLUGINS
            Magnum::AnyImageImporter
            Magnum::AnyImageConverter
            MagnumPlugins::StbImageImporter
            MagnumPlugins::StbImageConverter
            MagnumPlugins::BasisImageConverter CACHE STRING "" FORCE)
    endif()
    

    Let me know if you want that change as well -- I didn't do that to avoid breaking existing workflows, the magnum-imageconverter executable would no longer be in src/tools/imageconverter after that.

  • And, the same could be used to build an in-tree version of the magnum-scenecoverter utility equipped with all needed plugins so you can have the asset processing capabilities directly inside Habitat instead of using my prebuilt binaries. On the other hand, this would serve maybe a 1% use case, while adding an ever-growing list of large dependencies (some of which might not even be possible to build as CMake subprojects).

Regarding the glTF conversion / export I'm working, this brings some more bits (~2kLOC of additions for magnum-sceneconverter, in particular), but the final piece that brings everything together is still being cooked. But again, unless you'd like to have it integrated directly in Habitat as shown above, it'll get to you in a form of prebuilt binaries.

Finally, recently Magnum also received a contribution allowing parallel GL shader compilation. If used correctly, it allows the driver to spread shader compilation over more CPU cores instead of going sequentially. It's mainly useful on the web on Windows, where ANGLE performs time-consuming GLSL -> HLSL / DXIL transpiling. But it's not really a problem with native GL on Linux / macOS, so I only adapted the code to API deprecations related to this feature, without actually doing anything to enable async compilation.

How Has This Been Tested

I might have missed something in the build scripts and CI setups here, please point that out if something feels fishy. Oh and any additional testing is welcome, especially from people using the --headless builds. Thanks!

mosra avatar Sep 12 '22 18:09 mosra

Wow, this would remove the need for a lot of the CONDA binaries we need to build since we can actually remove --headless option. What about the GPU selection using CUDA_VISIBLE_DEVICES in CUDA mode, does that still work? Seems like it's not compiling at the moment.

Skylion007 avatar Sep 12 '22 19:09 Skylion007

It does, I just messed up a define :P

mosra avatar Sep 12 '22 19:09 mosra

Completely agree with @Skylion007! We should use this opportunity to simplify our multiple conda binaries.

dhruvbatra avatar Sep 12 '22 23:09 dhruvbatra

Before I merge, can you explicitly confirm that the GUI viewer, now running on EGL instead of GLX, works for you on Linux? This is quite a significant change and I don't want to be a build-breaker :)

mosra avatar Sep 13 '22 08:09 mosra

Before I merge, can you explicitly confirm that the GUI viewer, now running on EGL instead of GLX, works for you on Linux?

Confirm the C++ viewer works for me. However, I'm getting the following error with the python viewer:

Fatal glibc error: malloc assertion failure in _int_malloc: (unsigned long) (size) >= (unsigned long) (nb)

aclegg3 avatar Sep 13 '22 16:09 aclegg3

Fatal glibc error: malloc assertion failure in _int_malloc: (unsigned long) (size) >= (unsigned long) (nb)

This smells like corrupted memory. Some return_value_policy probably not handled correctly in the magnum-bindings perhaps (@mosra)?

Skylion007 avatar Sep 13 '22 18:09 Skylion007

@mosra Looked through the new python bindings, had some minor performance nits: https://github.com/mosra/magnum-bindings/pull/16

Skylion007 avatar Sep 13 '22 18:09 Skylion007

There isn't really anything in this PR that would introduce any new bindings executed in this case, so this looks rather strange. @aclegg3 if you change the new MAGNUM_TARGET_EGL option to be explicitly OFF in src/cmake/dependencies.cmake, does that fix the crash? If yes, then it might be the python side mixing up EGL and GLX for some reason, if not then I have no idea. A backtrace of the crash could help a lot if you can get it.

Unfortunately I won't be able to look into this myself until I have a non-broken Linux machine again (hopefully Monday?).

mosra avatar Sep 14 '22 19:09 mosra

Now that I can, I rebuilt all my everything with MAGNUM_TARGET_EGL enabled, and it worked for Python bindings too, no crashes -- at least in Magnum's examples. So there doesn't seem to be anything directly in the bindings that would cause a crash in relation to EGL. (I still have to check with Habitat itself, didn't manage to restore everything needed for it yet.)

If you have time, can you try with the disabled option and/or give me the crash backtrace? Cross-confirming if the python viewer crashes/works on a different machine would be useful as well (@Skylion007? thank you), to rule out system-specific issues.

mosra avatar Sep 19 '22 16:09 mosra

If you have time, can you try with the disabled option and/or give me the crash backtrace? Cross-confirming if the python viewer crashes/works on a different machine would be useful as well (@Skylion007? thank you), to rule out system-specific issues.

I re-tried this with the current version, no custom setup, and everything is working as expected now. :tada:

aclegg3 avatar Sep 27 '22 18:09 aclegg3

Great, let's merge this.

Skylion007 avatar Sep 27 '22 18:09 Skylion007

Oh? No crashes? EGL just working? That whole nightmare, just gone? Amazing!!

I'm too afraid to press the merge button myself, please do that for me :sweat_smile:

mosra avatar Sep 27 '22 19:09 mosra