datafusion-comet

Plugin can fail to initialize native library and hide the root cause

Open andygrove opened this issue 1 year ago • 3 comments

Describe the bug

I tried running Comet in k8s, and got an error initializing NativeBase, but no root cause was given.

24/10/07 22:42:11 WARN CometSparkSessionExtensions: Comet extension is disabled because of error when loading native lib. Falling back to Spark
java.lang.NoClassDefFoundError: Could not initialize class org.apache.comet.NativeBase

I added a try/catch around the static initialization block and found the root cause:

java.lang.UnsatisfiedLinkError: /tmp/libcomet-216577115507522405.so: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.35' not found (required by /tmp/libcomet-216577115507522405.so)
    at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
    at java.base/java.lang.ClassLoader$NativeLibrary.load(Unknown Source)
    at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(Unknown Source)
    at java.base/java.lang.ClassLoader.loadLibrary0(Unknown Source)
    at java.base/java.lang.ClassLoader.loadLibrary(Unknown Source)
    at java.base/java.lang.Runtime.load0(Unknown Source)
    at java.base/java.lang.System.load(Unknown Source)
    at org.apache.comet.NativeBase.bundleLoadLibrary(NativeBase.java:126)
    at org.apache.comet.NativeBase.load(NativeBase.java:92)
    at org.apache.comet.NativeBase.<clinit>(NativeBase.java:54)

Steps to reproduce

No response

Expected behavior

I would like Comet to show me the root cause

Additional context

No response

andygrove avatar Oct 07 '24 22:10 andygrove

This error occurs because the Comet native library was built on a system with glibc 2.35, and the system running the code has an older version of glibc.

glibc is not statically linked into the native library, so when libcomet is loaded by the running process, the dynamic linker tries to resolve the glibc symbol versions that libcomet requires. Since the required version is not found, the call to System.loadLibrary (or System.load) fails with an UnsatisfiedLinkError.

On *nix, you can check the glibc version on your system by running ldd --version.

In this case the message

java.lang.UnsatisfiedLinkError: /tmp/libcomet-216577115507522405.so: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.35' not found (required by /tmp/libcomet-216577115507522405.so)

is as close to the root cause as possible (and is about as user friendly as a unix error message gets. :) )

parthchandra avatar Oct 07 '24 22:10 parthchandra

> is as close to the root cause as possible (and is about as user friendly as a unix error message gets. :) )

That error is fine. The issue was that Comet was hiding this error, so I had to add debug logging to discover that this was happening. This issue is about improving the error reporting when the plugin fails to load the native library.
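To illustrate why the root cause can disappear, here is a minimal, self-contained sketch of the general JVM behaviour. It is not Comet's actual code: `InitFailureDemo`, `Hiding`, `Reporting`, and `loadNative` are hypothetical names, and the native load failure is simulated. When a static initializer throws, the first access to the class surfaces the original error, and every later access only reports `NoClassDefFoundError: Could not initialize class ...`. If the first error is swallowed somewhere, only the second form reaches the log. One possible approach, shown in `Reporting`, is to catch the failure in the static block and keep it so callers can log it.

```java
// Hypothetical demo; class and method names are illustrative, not Comet's.
final class InitFailureDemo {

    // Stand-in for a failing native load (simulated so the demo runs anywhere).
    static void loadNative() {
        throw new UnsatisfiedLinkError("simulated: version `GLIBC_2.35' not found");
    }

    // Variant 1: the static initializer lets the error escape.
    // After the first failure the JVM marks the class as unusable.
    static final class Hiding {
        static {
            loadNative();
        }

        static void ping() {}
    }

    // Variant 2: the static initializer records the failure instead,
    // so the root cause stays available for later reporting.
    static final class Reporting {
        private static final Throwable LOAD_FAILURE;

        static {
            Throwable failure = null;
            try {
                loadNative();
            } catch (Throwable t) {
                failure = t;
            }
            LOAD_FAILURE = failure;
        }

        static boolean isLoaded() {
            return LOAD_FAILURE == null;
        }

        static Throwable loadFailure() {
            return LOAD_FAILURE;
        }
    }

    // Touches Hiding and returns whatever was thrown, or null on success.
    static Throwable touchHiding() {
        try {
            Hiding.ping();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    public static void main(String[] args) {
        System.out.println("first access:  " + touchHiding());
        System.out.println("second access: " + touchHiding());
        if (!Reporting.isLoaded()) {
            System.out.println("recorded root cause: " + Reporting.loadFailure());
        }
    }
}
```

With the second variant, a caller such as the session extension could log `loadFailure()` together with its stack trace when it decides to fall back to Spark, rather than only the `NoClassDefFoundError`.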

andygrove avatar Oct 07 '24 23:10 andygrove

Yes, I understood that :) I added the additional context because the UnsatisfiedLinkError message is also not very friendly, and it might help users understand why the library could not be loaded.
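As a rough sketch of how that extra context could be surfaced automatically, the helper below pulls the required glibc version out of an UnsatisfiedLinkError message and builds a friendlier hint. `GlibcHint` and its methods are hypothetical and not part of Comet. The regular expression assumes the loader message has the same shape as the one quoted in this issue, and any other message is passed through unchanged.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Comet.
final class GlibcHint {

    // Matches loader text such as: version `GLIBC_2.35' not found
    private static final Pattern REQUIRED_GLIBC =
        Pattern.compile("version `GLIBC_([0-9]+(?:\\.[0-9]+)*)' not found");

    // Returns the glibc version named in the message, if there is one.
    static Optional<String> requiredGlibc(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = REQUIRED_GLIBC.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Builds a hint for glibc mismatches; other errors keep their original message.
    static String describe(Throwable t) {
        String message = t.getMessage();
        return requiredGlibc(message)
            .map(v -> "Native library requires glibc " + v
                + " or newer; compare with `ldd --version` on this host. "
                + "Original error: " + message)
            .orElse(String.valueOf(message));
    }
}
```

Calling `GlibcHint.describe(e)` from the place where the load failure is logged would keep the original message intact while pointing the user at the glibc mismatch.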

parthchandra avatar Oct 07 '24 23:10 parthchandra

Here is another related issue that I am currently running into:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.comet.package$

andygrove avatar Oct 10 '24 21:10 andygrove