rocksdb icon indicating copy to clipboard operation
rocksdb copied to clipboard

Auto-detection of musl assumes shell access to call ldd

Open DRoppelt opened this issue 2 years ago • 4 comments

We are using rocksdb through kafka-streams on an alpine container. The version streams pulls is 6.19.3

After last years AdoptOpenJDK's switch to Eclipse Temurin, they have also completely migrated from java on alpine via glibc to musl native (see https://github.com/adoptium/containers/issues/72)

During this switch, we observed that rocksdb needed some libs, first we installed libstdc++ and then also added gcompat. We assumed that rocksdb was one of the usecases where we needed to restore a glibc compatibility layer

# before installing libstdc++
## java.lang.UnsatisfiedLinkError: /tmp/librocksdbjni16704160909693629288.so: Error loading shared library libstdc++.so.6: No such file or directory (needed by /tmp/librocksdbjni16704160909693629288.so)
# when installing libstdc++, the next thing is missing:
## java.lang.UnsatisfiedLinkError: /tmp/librocksdbjni2530667842657717075.so: Error loading shared library ld-linux-x86-64.so.2

We then later encountered segfaults at runtime at nonpredictable times

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ff28fd33174, pid=1, tid=47
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.14.1+1 (11.0.14.1+1) (build 11.0.14.1+1)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (11.0.14.1+1, mixed mode, tiered, compressed oops, serial gc, linux-amd64)
# Problematic frame:
# C  [libstdc++.so.6+0xd6174]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /app/core.1)
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#

Eventuelly we came across this line of code https://github.com/facebook/rocksdb/blob/main/java/src/main/java/org/rocksdb/util/Environment.java#L13 coming from this PR https://github.com/facebook/rocksdb/pull/3143

In that thread it was discussed a bit that actually detecting musl vs glibc was tricky and the actual approach changed a few times.

Here is the issue we are facing: on a container environment, the container might be additionally locked down such as restricting access the sh to prevent anyone remoting into a container that is meant to run only the JVM.

Since about end of last year, the adoption rate of actual native musl + JVM is much higher thanks to https://openjdk.java.net/jeps/386 which was backport even to java 8.

IMO I think that this issue will come up more often relatively soon. The adoptium of native musl for jdk8 was as recent as 08.03. for adoptium (https://github.com/adoptium/containers/pull/167).

Expected behavior

musl detection to work in a restricted container environment

Actual behavior

library that assumes glibc will be used instead, potentionally leading to segfaults at runtime

Steps to reproduce the behavior

➜  ~ docker run -it --entrypoint sh eclipse-temurin:17.0.3_7-jdk-alpine
/ # jshell
May 06, 2022 5:28:46 AM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
|  Welcome to JShell -- Version 17.0.3
|  For an introduction type: /help intro

jshell> new ProcessBuilder("/usr/bin/env", "sh", "-c", "ldd /usr/bin/env | grep -q musl").start()
$1 ==> Process[pid=90, exitValue=0]

jshell>
/ # rm /bin/sh
/ # jshell
|  Welcome to JShell -- Version 17.0.3
|  For an introduction type: /help intro

jshell> new ProcessBuilder("/usr/bin/env", "sh", "-c", "ldd /usr/bin/env | grep -q musl").start()
$1 ==> Process[pid=190, exitValue=127]

jshell>

Proposal

Add an additional check that also checks for the existance of /lib/libc.musl-x86_64.so.1 via new File("/lib/libc.musl-x86_64.so.1").exists() such as

    try {
      if(new File("/lib/libc.musl-x86_64.so.1").exists()) {
        MUSL_LIBC = true;
      }
      else {
        final Process p = new ProcessBuilder("/usr/bin/env", "sh", "-c", "ldd /usr/bin/env | grep -q musl").start();
        MUSL_LIBC = p.waitFor() == 0;
      }

    } catch (final IOException | InterruptedException e) {
      MUSL_LIBC = false;
    }

or

offer a fallback where one can generally circumvent this auto-detection via a defined ENV. I think that this would only be reasonable if the catch() MUSL_LIBC=false would output some log that would hint at a failed detection that can be overriden via an env?

DRoppelt avatar May 06 '22 05:05 DRoppelt

Thanks @DRoppelt for reporting. I see you have a proposal, would you like to contribute a PR?

riversand963 avatar May 10 '22 15:05 riversand963

@riversand963 I would like to, yes 👍

DRoppelt avatar May 10 '22 16:05 DRoppelt

@riversand963 PR submitted

DRoppelt avatar May 10 '22 20:05 DRoppelt

Hi, I think this issue is related to my open issue https://github.com/facebook/rocksdb/issues/10651

I have tested the PR but doesn't fix the Arm Alpine Linux issue.

davidmelia avatar Sep 12 '22 12:09 davidmelia

@davidmelia libstdc++.so.6: cannot open shared object file: No such file or directory hints that the required OS depdency libstdc++ is missing. As alpine is pretty bare-bones, it has to be added.

if the wrong architecture is loaded, you would see Error loading shared library ld-linux-x86-64.so.2 (for instance Linux instead of Alpine)

In my ticket there is no explicit reproduceable build that shows that I add libstdc. A apk add libstdc++ should do the trick in combination with the PR. Or at least moving from libstdc++.so.6: cannot open shared object file: No such file or directory to error loading shared library...

DRoppelt avatar Oct 13 '22 20:10 DRoppelt

@riversand963 , @ajkr there has not been progress regarding a review or discussing the approach

I have been mulling over the changes and they might be a bit too narrow minded to tackle the alpine issue specifically.

The changes introduce a) file based detection over shell result b) add an option for explicit user input for the picked architecture (but specific for alpine only).

I was thinking that we might want to do only b) and that for all archs. So keeping the shell-based detection, but allowing an opt-out from it via allowing a user-defined architecture. Maybe a ROCKSDB_ENVIRONMENT variable to pick any of the supported archs? (pr is specifically checking ROCKSDB_ENVIRONMENT_MUSL:true/false).

Or doing the filebased detection change, but extending ROCKSDB_ENVIRONMENT_MUSL to ROCKSDB_ENVIRONMENT also?

I would do the PR for either, let me know what you think

DRoppelt avatar Oct 13 '22 20:10 DRoppelt

@DRoppelt @adamretter Can we use a system property to indicate musl vs. non-musl when building JARs for a particular libc, e.g., "rocksdbjni-7.6.0-linux64.jar" and "rocksdbjni-7.6.0-linux64-musl.jar"?

ajkr avatar Oct 24 '22 20:10 ajkr

@ajkr Sure - see https://github.com/facebook/rocksdb/pull/10889

adamretter avatar Oct 27 '22 17:10 adamretter

as #10889 is closed as merged https://github.com/facebook/rocksdb/commit/781a387488b1b32b674f555bb3f7a9bdc96fbd42, I am closing this

@adamretter @ajkr I am not familiar with the release process, could you please let me know with which version this is going to be released with? I assume there is no backports to 6.X

DRoppelt avatar Nov 06 '22 10:11 DRoppelt