Linux: Native Crash lwjgl2 OpenJDK 11.0.5

Open MeFisto94 opened this issue 5 years ago • 10 comments

This was originally discovered by @jayfella, but I don't think we've been tracking it officially yet, thus:

Just trying to run the simplest example (here app.start()) using lwjgl2 fails as follows:

Jan. 11, 2020 3:11:12 NACHM. com.jme3.system.JmeDesktopSystem initialize
INFORMATION: Running on jMonkeyEngine 3.3.0-beta1
 * Branch: HEAD
 * Git Hash: bd1b6d2
 * Build Date: 2019-12-22
Inconsistency detected by ld.so: dl-lookup.c: 111: check_match: Assertion `version->filename == NULL || ! _dl_name_match_p (version->filename, map)' failed!

However, contrary to the claim that this should work with 11.0.4, I've seen reports about 11.0.3 here.

Either way, this is unfortunate, as 11.0.5 is the version currently available and I think it will replace Java 8 soon. I am not sure if we can do anything about it, but it should be tracked so that people who encounter this know about it.

Edit: https://github.com/lattera/glibc/blob/master/elf/dl-lookup.c#L111 this is why it fails, and I am uncertain how openJDK could break that other than by changing the way they invoke glibc to load symbols.

Can anyone manage to find a tracking bug for openJDK? Maybe they don't know it as everyone is just downgrading to openJDK8?

MeFisto94 avatar Jan 11 '20 14:01 MeFisto94

@stephengold @jayfella @pspeed42 @riccardobl and who else it may concern:

My Investigation is complete, so here's a "quick" wrap-up for you guys: LWJGL2 crashes on Ubuntu's JVM 9-11+ only. It is a native crash. Installing any other JVM (e.g. AdoptOpenJDK) works. Due to Ubuntu's popularity it's actually a rather urgent issue.

Technical Details

LWJGL2 was compiled against Java 8 or lower JNI libraries. It links only against `libjawt`, which contains the code to load up AWT. In Java 9 and later, the `SUNWprivate_1.1` version label that LWJGL2's natives expect on that library is gone. This is where this story mostly ends for the OpenJDK guys: their recommendation is to just recompile LWJGL2 against a more recent Java.

Now why does this only affect Ubuntu, and why does the JVM crash instead of throwing, say, an UnsatisfiedLinkError? The reason is that this is coupled with another bug: on AdoptOpenJDK the missing version symbol is ignored and the method is still loaded. On Ubuntu, libc (the "implementation" of the C language) asserts with the above message.
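For contrast, here is a minimal sketch of the failure mode one would normally expect. When the JVM itself rejects a library, Java code gets an `UnsatisfiedLinkError` that it can catch. The library name below is made up and assumed not to exist. The ld.so assertion from this issue aborts the process inside the dynamic loader instead, so a handler like this never gets a chance to run.

```java
// Sketch only: "jme_issue_demo_missing" is a hypothetical library name that is
// assumed not to exist anywhere on java.library.path.
final class LinkFailureDemo {

    /** Tries to load a native library and reports how the attempt ended. */
    static String tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return "loaded";
        } catch (UnsatisfiedLinkError e) {
            // The JVM noticed the problem itself and raised a Java-level error.
            return "UnsatisfiedLinkError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryLoad("jme_issue_demo_missing"));
    }
}
```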

Why? For AdoptOpenJDK, Debian, etc., libjawt.so still contains plenty of version symbols, namely GLIBC_2.2.5 and others. Ubuntu seems to take a different approach and strips these symbols to get smaller files.

This means that only on Ubuntu, libjawt.so doesn't have the version symbol table at all, which is why I suspect that libc asserts. It tries to load a versioned dependency from a file not having any version symbols at all.
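To see which situation a given JVM is in, a rough check like the following could help. It is only a sketch: the class and method names are mine, and a plain byte search for the label text is a heuristic, not a real parse of the ELF version tables (`readelf -V` on the file is the proper tool). It locates `libjawt.so` below `java.home` and reports whether the file mentions `SUNWprivate_1.1` at all.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, not part of jME or LWJGL.
final class JawtVersionProbe {

    /** Naive byte search: true if the ASCII text occurs anywhere in data. */
    static boolean containsAscii(byte[] data, String text) {
        byte[] needle = text.getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) continue outer;
            }
            return true;
        }
        return false;
    }

    /** Looks for libjawt.so anywhere below the given JVM home directory. */
    static Optional<Path> findLibJawt(Path javaHome) throws IOException {
        try (Stream<Path> files = Files.walk(javaHome)) {
            return files
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals("libjawt.so"))
                    .findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        Path javaHome = Paths.get(System.getProperty("java.home"));
        System.out.println("JVM: " + System.getProperty("java.vendor")
                + " " + System.getProperty("java.version"));
        Optional<Path> lib = findLibJawt(javaHome);
        if (!lib.isPresent()) {
            System.out.println("No libjawt.so found under " + javaHome);
            return;
        }
        boolean hasLabel =
                containsAscii(Files.readAllBytes(lib.get()), "SUNWprivate_1.1");
        System.out.println(lib.get() + " mentions SUNWprivate_1.1: " + hasLabel);
    }
}
```

Running it once per installed JVM gives a quick side-by-side of vendor, version and whether the label is present.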

I've reported this to the libc guys in February but didn't receive any solution since then. I did poke the guys once and basically the response was that they didn't have the time/motivation yet to extract the Ubuntu JVM and lwjgl to inspect the behavior and that I should provide a self-contained test case.

If I did that, I could just as well debug the problem myself. But I didn't want to go to the trouble of re-creating the JVM's call stack, building libc in debug mode, or whatever else would have been required for this. I also figured that a patch to libc would take at least a year to reach Ubuntu.

Furthermore what we're doing there is "borderline unsupported" anyway and the fix to delete the SUNWprivate label would be very trivial.

Thus, I've forked lwjgl2 at https://github.com/MeFisto94/lwjgl and made it build with GitHub Actions (I've taken the post-Java 8 patches from Debian).

My Proposal is to try to merge the existing lwjgl2 jar with the .so created with my fork and try to see if that works on Ubuntu as expected and then switch the engine's lwjgl2 dependency to this repository. Details like if we should provide org.jmonkeyengine.lwjgl2 or something can be discussed later.
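The "merge the existing jar with the new .so" step could look roughly like this. It is a sketch under assumptions: the class name is mine, and the jar path, entry name and replacement file are passed in as arguments because I don't want to hard-code names here. It copies a jar entry by entry and substitutes the bytes of one entry.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of jME or LWJGL.
final class NativeJarPatcher {

    /**
     * Copies sourceJar to targetJar, substituting the bytes of one entry.
     * Returns true if the entry was found and replaced.
     */
    static boolean replaceEntry(Path sourceJar, Path targetJar,
                                String entryName, byte[] replacement) throws IOException {
        boolean replaced = false;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(sourceJar));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(targetJar))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // Fresh entry object so size and CRC are recomputed on write.
                out.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(entryName)) {
                    out.write(replacement);
                    replaced = true;
                } else {
                    copy(in, out);
                }
                out.closeEntry();
            }
        }
        return replaced;
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: NativeJarPatcher <in.jar> <out.jar> <entry> <new-file>");
            return;
        }
        boolean ok = replaceEntry(Paths.get(args[0]), Paths.get(args[1]),
                args[2], Files.readAllBytes(Paths.get(args[3])));
        System.out.println(ok ? "entry replaced" : "entry not found, jar copied unchanged");
    }
}
```

Note that rewriting entries like this invalidates the signature of a signed jar, so that would need checking before shipping anything built this way.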

Also before recommending to just delete lwjgl2, please consider:

  • Using LWJGL3 for the SDK would be illegal (the spec requires running on the main thread) and will crash on Mac OS X. This is what happened to jay's plugins/editor and what will happen to all the IntelliJ plugins or whatever else is around. Furthermore, this will affect ALL guys writing their own kind-of-launchers as well as using something like launch4j, probably even the JavaFX packaging.

  • Using LWJGL3 on Mac OS X for anything else is problematic (one needs to pass -XstartOnFirstThread to the JVM, because otherwise the native main thread != the Java main thread).

  • While lwjgl2 may be old and have some messy code/bugs, lwjgl3 changed a lot of things which make apps work unreliably. Specifically, text input on non-US layouts is "broken" or badly designed.

  • The heavy lifting has already been done by me, so it's just about thorough testing and packaging.
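To illustrate the macOS point from the list above: the HotSpot launcher flag on macOS is spelled `-XstartOnFirstThread`, and a launcher could at least warn when it is missing. This is a sketch only. The class name is made up, and it assumes the flag shows up in the JVM's reported input arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of jME or LWJGL.
final class FirstThreadCheck {

    /** True if the OS name looks like macOS and the flag is absent from the JVM arguments. */
    static boolean missingFirstThreadFlag(String osName, List<String> jvmArgs) {
        boolean isMac = osName.toLowerCase(Locale.ROOT).contains("mac");
        return isMac && !jvmArgs.contains("-XstartOnFirstThread");
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (missingFirstThreadFlag(System.getProperty("os.name"), jvmArgs)) {
            System.err.println("macOS detected without -XstartOnFirstThread; "
                    + "an LWJGL3/GLFW window is likely to fail to open.");
        } else {
            System.out.println("No first-thread problem detected.");
        }
    }
}
```

This only detects the problem. It does not help plugin hosts or custom launchers that do not control the JVM arguments, which is the case the list is worried about.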

MeFisto94 avatar Apr 07 '20 12:04 MeFisto94

@MeFisto94 thank you for the work in this.

Personally I feel like it would be a bad idea to start maintaining our own version of lwjgl2. Our issues with lwjgl3 mostly do not really stem from 3 itself, but more from the fact that Apple decided to stop OpenGL support. For the SDK and other platforms, the main thread issue seems to be related solely to Apple.

First, I guess it may be time to just drop Apple and LWJGL2 support and move the related LWJGL2 code into the contrib repo, and if people want to build OS X apps, then they can use lwjgl2.

Or we are going to be stuck attempting to maintain lwjgl2, which is a massive undertaking. It is just a matter of time before another deal breaker issue pops up in the library.

It is an unfortunate situation. I guess the last option is to beg the lwjgl team to do 'one last fix' but seeing as nothing has been touched on it in years, I doubt they would.

tlf30 avatar Apr 07 '20 12:04 tlf30

lwjgl3 has other problems but they are really glfw problems, I guess. Key events issues, threading issues, etc. all seem to come back to glfw.

It's these reasons that keep me on lwjgl2, personally. If the biggest problem with lwjgl2 is "not compiled with correct Java", I'm even sort of on board with compiling it ourselves. To me "it's old" is not reason enough on its own to not use something.

pspeed42 avatar Apr 07 '20 15:04 pspeed42

Had this bumped back to my consciousness by @MeFisto94's listed issue above. It's less "Compiled with wrong Java" and more that OpenJDK's releases have broken binary compatibility with every other JDK/JRE out there, including fully passing builds.

I'd be inclined to move testing to AdoptOpenJDK platforms, deprecate use of the binaries from Oracle and the OpenJDK project, and call it good, rather than adding another native build step that then needs to be supported.

Sailsman63 avatar Jun 14 '21 02:06 Sailsman63

I see a lot of potential benefits to forking LWJGL2 and building/maintaining it ourselves. I've recently taken similar steps with JBullet, stack-alloc, ASM, and j-ogg-all, so it seems I fall into Paul's camp on this issue.

stephengold avatar Jun 14 '21 03:06 stephengold

My concern would be making sure that natives compiled with the Oracle/OpenJDK JVMs also work with AdoptOpenJDK. Otherwise, you'd have to maintain fully separate build stacks depending on which JRE supplier the user is working with.

Sailsman63 avatar Jun 14 '21 17:06 Sailsman63

If it's difficult to build native libraries that work with both JVMs, shouldn't LWJGL v3 have the same issue? (And Minie, of course.)

stephengold avatar Jun 14 '21 18:06 stephengold

That's a good point. One thing that seems to be missing with the lwjgl2 build that @MeFisto94 has linked is that there's no clear way to launch a basic test, and some of the test classes are erroring out, so I'm just not sure what we're getting.

Sailsman63 avatar Jun 14 '21 20:06 Sailsman63

So you guys are on the wrong track here. It is not really related to the JDK (besides being post Java 8), but more about the libc version. Don't quote me on the details, but the bottom line is: lwjgl2 is linked against a versioned symbol in AWT_GetAWT(?) of lib(j?)awt.so, which was called something like sun private and thus was removed with Java 9.

Now it all depends on the handling and the related stuff. I think Debian still had the symbol and Ubuntu did not, or something? Either way, AdoptOpenJDK had a different version of libc when building, I think.

I also think that an old version just ignored the missing symbol or something, then there is the version that fails at the assertion, which is definitely a bug, because it's not a clean handling. I think I thought that I had seen a more recent java version which just refuses to load the library, I think that's what @Sailsman63 can confirm.

Also, I made some more progress on building the Maven artifacts. There are problems here and there, but coming back to my initial idea of just swapping out the natives: that may be a good idea, and the zip should already contain the .so file. It's just a lot more difficult on a Gradle build system, which is why we may think about adding that hack directly into jme3-lwjgl or hosting a public Maven release (if we even can, we're not lwjgl).

MeFisto94 avatar Jun 15 '21 21:06 MeFisto94

^ My understanding was that it boiled down to this glibc bug. Situations where the runtime tries to load multiple native libraries in quick succession (such as lwjgl2, or older versions of jogamp JOGL) can lead to data races.

The JREs that show the issue seem to have statically linked a version of glibc that has the bug, whereas those that do not are linking a different version of glibc. The newer versions of OpenJDK/Oracle builds seem to catch the data race and error out rather than actually link a fixed version of glibc.

Sailsman63 avatar Jun 16 '21 16:06 Sailsman63

I can confirm the lwjgl 2.9.4 release on @MeFisto94's fork does resolve both this issue (Inconsistency detected by ld.so: dl-lookup.c: 111) and https://github.com/jMonkeyEngine/jmonkeyengine/issues/1215 (UnsatisfiedLinkError: libjawt.so: version 'SUNWprivate_1.1' not found) in my tests.

Ali-RS avatar Jan 05 '23 07:01 Ali-RS