RethinkDB unable to start successfully on Alpine Linux in LX or Docker

Open davefinster opened this issue 8 years ago • 3 comments

It appears that RethinkDB cannot run successfully on Alpine (musl libc) based zones, whether they are run as standard LX zones or as Docker containers. RethinkDB running in a Debian Jessie container (as the official RethinkDB image does) or in an LX Debian Jessie zone (with UUID 5ff8fd8a-0ca8-11e6-b8bd-cf9c395fb29d) does not exhibit this issue and runs successfully.

An example Docker image exhibiting this behaviour can be found at docketbook/rethinkdb-alpine:2.3.2. This image works as expected when run using an official Docker Toolbox installation on my local machine.

strace output from both the Alpine and Debian instances can be found in the gist linked here. The gist also contains a file called alpine-debug-firstrun.log with the strace output for the first run of RethinkDB, which crashes as described later. https://gist.github.com/davefinster/3f78b06e60bac0d3883f2657a7b75259

A compiled version of RethinkDB for Alpine (for testing in LX) can be downloaded here: https://dl.dropboxusercontent.com/u/227463/rethinkdb-2.3.2-r0.apk.

I did note that in the Alpine log several calls, including stat(), fail with ENOSYS, whereas the same calls return ENOENT on Debian. I tried the DTrace commands for detecting unimplemented system calls described at https://wiki.smartos.org/display/DOC/LX+Branded+Zones, but there were no hits.
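To make the ENOSYS versus ENOENT comparison easier to repeat on the full logs, a rough tally of failing calls per errno can help. The following is a minimal sketch, not part of the original report: the `tally_errors` helper and its regular expression are hypothetical, and they assume the `pid call(args) = -1 ERRNO (...)` line layout seen in the excerpts in this issue. Lines such as `<... execve resumed>` are skipped.

```python
import re
from collections import Counter

# Matches lines such as:
#   19837 stat("/data", 0x7fffffeff728) = -1 ENOSYS (Function not implemented)
# Assumes strace's "-f" layout with a leading PID (as in this issue's excerpts).
FAILED_CALL = re.compile(r"^\d+\s+(\w+)\(.*\)\s+=\s+-1\s+(E[A-Z0-9]+)\b")


def tally_errors(lines):
    """Count (syscall, errno name) pairs for calls that returned -1."""
    counts = Counter()
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines copied from the Alpine and Debian excerpts in this issue.
sample = [
    '19837 stat("/data", 0x7fffffeff728)     = -1 ENOSYS (Function not implemented)',
    '19837 stat(".", 0x7fffffeff7b8)         = -1 ENOSYS (Function not implemented)',
    '59763 stat("/usr/bin/uname", 0x7fffffeff7b0) = -1 ENOENT (No such file or directory)',
    '59763 rt_sigaction(SIGINT, NULL, {SIG_DFL, [], 0}, 8) = 0',
]
counts = tally_errors(sample)
for (syscall, err), n in sorted(counts.items()):
    print(f"{syscall:<16} {err:<8} {n}")
```

To run it against a whole log, pass an open file object instead of `sample`.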

I've also captured some core dumps: First Run Crash Dump, gcore of the parent while running, and gcore of the child while running.

Alpine (line 902 start)

<... execve resumed> )            = -1 ENOSYS (Function not implemented)
19837 arch_prctl(ARCH_SET_FS, 0x7ffffee88da8) = -1 ENOSYS (Function not implemented)
19837 set_tid_address(0x7ffffee88de0)   = -1 ENOSYS (Function not implemented)
19837 mprotect(0x7ffffee87000, 4096, PROT_READ) = -1 ENOSYS (Function not implemented)
19837 mprotect(0x7fffff2c1000, 16384, PROT_READ) = -1 ENOSYS (Function not implemented)
19837 getuid()                          = 4294967258
19837 brk(0)                            = -1 ENOSYS (Function not implemented)
19837 brk(0x2000)                       = -1 ENOSYS (Function not implemented)
19837 getpid()                          = -1 ENOSYS (Function not implemented)
19837 rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7ffffec48225}, 0x7fffffeff570, 8) = -1 ENOSYS (Function not implemented)
19837 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7ffffec48225}, 0x7fffffeff570, 8) = -1 ENOSYS (Function not implemented)
19837 getppid()                         = -1 ENOSYS (Function not implemented)
19837 uname(0x7fffffeff912)             = -1 ENOSYS (Function not implemented)
19837 stat("/data", 0x7fffffeff728)     = -1 ENOSYS (Function not implemented)
19837 stat(".", 0x7fffffeff7b8)         = -1 ENOSYS (Function not implemented)
19837 rt_sigaction(SIGINT, NULL, 0x7fffffeff5d0, 8) = -1 ENOSYS (Function not implemented)
19837 rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], 0, 8) = -1 ENOSYS (Function not implemented)
19837 rt_sigaction(SIGINT, {0x7fffff04780c, ~[RTMIN RT_1 RT_2], SA_RESTORER, 0x7ffffec48225}, NULL, 8) = -1 ENOSYS (Function not implemented)
19837 rt_sigaction(SIGQUIT, NULL, 0x7fffffeff5d0, 8) = -1 ENOSYS (Function not implemented)
19837 rt_sigaction(SIGQUIT, {SIG_IGN, ~[RTMIN RT_1 RT_2], SA_RESTORER, 0x7ffffec48225}, NULL, 8) = -1 ENOSYS (Function not implemented)
19837 rt_sigaction(SIGTERM, NULL, 0x7fffffeff5d0, 8) = -1 ENOSYS (Function not implemented)
19837 rt_sigaction(SIGTERM, {SIG_DFL, ~[RTMIN RT_1 RT_2], SA_RESTORER, 0x7ffffec48225}, NULL, 8) = -1 ENOSYS (Function not implemented)
19837 stat("/usr/local/sbin/uname", 0x7fffffeff3c8) = -1 ENOSYS (Function not implemented)
19837 stat("/usr/local/bin/uname", 0x7fffffeff3c8) = -1 ENOSYS (Function not implemented)
19837 stat("/usr/sbin/uname", 0x7fffffeff3c8) = -1 ENOSYS (Function not implemented)
19837 stat("/usr/bin/uname", 0x7fffffeff3c8) = -1 ENOSYS (Function not implemented)
19837 stat("/sbin/uname", 0x7fffffeff3c8) = -1 ENOSYS (Function not implemented)
19837 stat("/bin/uname", 0x7fffffeff3c8) = -1 ENOSYS (Function not implemented)
19837 rt_sigprocmask(SIG_BLOCK, ~[], 0x7fffffeff3d0, 8) = -1 ENOSYS (Function not implemented)
19837 fork()                            = -1 ENOSYS (Function not implemented)
19837 rt_sigprocmask(SIG_SETMASK, ~[BUS KILL SEGV STOP RTMIN RT_1 RT_2], 0, 8) = -1 ENOSYS (Function not implemented)
19838 read(0, 0x7ffffec85850, 140737487303632) = -1 ENOSYS (Function not implemented)
19838 gettid()                          = -1 ENOSYS (Function not implemented)
19838 rt_sigprocmask(SIG_SETMASK, ~[BUS KILL SEGV STOP RTMIN RT_1 RT_2], 0, 8) = -1 ENOSYS (Function not implemented)
19838 rt_sigaction(SIGQUIT, {SIG_DFL, ~[RTMIN RT_1 RT_2], SA_RESTORER, 0x7ffffec48225}, NULL, 8) = -1 ENOSYS (Function not implemented)
19838 execve(NULL, [0], [/* 0 vars */]) = -1 ENOSYS (Function not implemented)
19838 arch_prctl(ARCH_SET_FS, 0x7ffffee88da8) = -1 ENOSYS (Function not implemented)
19838 set_tid_address(0x7ffffee88de0)   = -1 ENOSYS (Function not implemented)
19838 mprotect(0x7ffffee87000, 4096, PROT_READ) = -1 ENOSYS (Function not implemented)
19838 mprotect(0x7fffff2c1000, 16384, PROT_READ) = -1 ENOSYS (Function not implemented)

As a reference, the actual uname binary is located at /bin/uname
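One detail in the Alpine excerpt may be worth checking: getuid(), which cannot fail on Linux, is shown returning 4294967258. Read as a signed 32-bit integer, that value is -38, and 38 is ENOSYS on Linux x86_64. This is consistent with, though it does not prove, the tracer reporting -ENOSYS for every call rather than each call genuinely failing. The arithmetic, as a small sketch (the constant name `LINUX_ENOSYS` is only illustrative):

```python
import ctypes

# ENOSYS on Linux x86_64; hard-coded because errno values differ by platform.
LINUX_ENOSYS = 38

# Return value strace printed for getuid() in the Alpine excerpt.
raw = 4294967258

# Reinterpret the unsigned 32-bit value as a signed 32-bit integer.
as_signed = ctypes.c_int32(raw).value

print(as_signed)                   # signed interpretation of the raw value
print(-as_signed == LINUX_ENOSYS)  # whether it is the negated ENOSYS code
```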

Debian (line 1717 start)

59763 rt_sigaction(SIGINT, NULL, {SIG_DFL, [], 0}, 8) = 0
59763 rt_sigaction(SIGINT, {0x7fffff211fd0, ~[RTMIN RT_1], SA_RESTORER, 0x7ffffea350e0}, NULL, 8) = 0
59763 rt_sigaction(SIGQUIT, NULL, {SIG_DFL, [], 0}, 8) = 0
59763 rt_sigaction(SIGQUIT, {SIG_DFL, ~[RTMIN RT_1], SA_RESTORER, 0x7ffffea350e0}, NULL, 8) = 0
59763 rt_sigaction(SIGTERM, NULL, {SIG_DFL, [], 0}, 8) = 0
59763 rt_sigaction(SIGTERM, {SIG_DFL, ~[RTMIN RT_1], SA_RESTORER, 0x7ffffea350e0}, NULL, 8) = 0
59763 stat("/usr/local/sbin/uname", 0x7fffffeff7b0) = -1 ENOENT (No such file or directory)
59763 stat("/usr/local/bin/uname", 0x7fffffeff7b0) = -1 ENOENT (No such file or directory)
59763 stat("/usr/sbin/uname", 0x7fffffeff7b0) = -1 ENOENT (No such file or directory)
59763 stat("/usr/bin/uname", 0x7fffffeff7b0) = -1 ENOENT (No such file or directory)
59763 stat("/sbin/uname", 0x7fffffeff7b0) = -1 ENOENT (No such file or directory)
59763 stat("/bin/uname", {st_mode=S_IFREG|0755, st_size=31240, ...}) = 0
59763 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fffff0109d0) = 59764

A separate side issue is that the first time RethinkDB launches on Alpine, it crashes with the following output:

/data # rethinkdb
Recursively removing directory /data/rethinkdb_data/tmp
Initializing directory /data/rethinkdb_data
Running rethinkdb 2.3.2 (GCC 5.3.0)...
Running on Linux 3.13.0 x86_64
Loading data from directory /data/rethinkdb_data
Version: rethinkdb 2.3.2 (GCC 5.3.0)
error: Error in src/arch/io/disk.cc at line 620:
error: Guarantee failed: [abs_res != nullptr]  (errno 2 - No such file or directory) Failed to determine absolute path for '/data/rethinkdb_data/metadata'
error: Backtrace:
error: Fri May 20 04:11:13 2016
error: Exiting.
Trace/breakpoint trap (core dumped)

This error is only produced if the metadata file does not exist. The first run appears to make enough forward progress to create the file, which avoids the error on subsequent runs.
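The errno 2 in the guarantee message matches how realpath(3) behaves when the target does not exist yet. Whether src/arch/io/disk.cc actually calls realpath(3) at that line is an assumption here; the source was not checked. The sketch below only demonstrates the libc behaviour, calling realpath through ctypes on a Unix-like system. The `c_realpath` helper and the temporary directory are illustrative stand-ins for /data/rethinkdb_data.

```python
import ctypes
import errno
import os
import tempfile

# Load the C library already linked into the interpreter (Unix-like systems only).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.realpath.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
_libc.realpath.restype = ctypes.c_void_p  # a NULL return is seen as None


def c_realpath(path):
    """Call realpath(3); return (resolved_path, 0) or (None, errno)."""
    buf = ctypes.create_string_buffer(4096)  # PATH_MAX on Linux
    ctypes.set_errno(0)
    if _libc.realpath(os.fsencode(path), buf) is None:
        return None, ctypes.get_errno()
    return os.fsdecode(buf.value), 0


with tempfile.TemporaryDirectory() as data_dir:
    metadata = os.path.join(data_dir, "metadata")
    missing_result = c_realpath(metadata)   # file has not been created yet
    open(metadata, "w").close()
    present_result = c_realpath(metadata)   # file now exists

print(missing_result[1] == errno.ENOENT)
print(present_result[1] == 0)
```

If the same pattern holds in RethinkDB, resolving the path before the file is created would fail on the first run and succeed afterwards, which fits the behaviour described above.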

I have replicated this issue in JPC and the compute node that this was tested against is running platform image 20160428T170015Z.

davefinster, May 20 '16 04:05

Thanks, I filed OS-5426 for this.

jjelinek, May 20 '16 11:05

Sorry it took so long to get to this issue; I was fixing the ptrace problems you reported here first. I tried running the RethinkDB build that is available at the link above, and it seems to work fine for me.

# rethinkdb
Recursively removing directory /home/jerry/rethinkdb_data/tmp
Initializing directory /home/jerry/rethinkdb_data
Running rethinkdb 2.3.2 (GCC 5.3.0)...
Running on Linux 4.1.20 x86_64
Loading data from directory /home/jerry/rethinkdb_data

This is running on the latest platform build and I am not experiencing any core dump. Would it be possible for you to try running again on the latest platform to see if you're still hitting the problem? If it still fails, can you provide any more details about how I could try to reproduce this?

Thanks, Jerry

jjelinek, Jul 18 '16 15:07

@davefinster Is there any indication that the behavior you observed is still present on current PIs?

pfmooney, Mar 15 '17 19:03