IncludeOS
Early boot memory corruption sometimes causes chain crashes
The best repro case was found with https://github.com/includeos/IncludeOS/pull/2251 and is preserved, until this is fixed, on the branch https://github.com/alfreb/IncludeOS/tree/memory-ghost-repro . On that branch, starting at commit e81fb7c7da96b8cae8b43d406b6d868b7d09b66e, reproduce with:
nix-shell --argstr unikernel ./test/net/integration/tcp/ --run "./test.py"
(Requires https://github.com/includeos/vmrunner )
The backtrace below was captured in gdb after rebuilding musl with debug symbols and reproducing the same issue:
#0 0x0000000000329bc2 in a_crash ()
#1 0x000000000032895e in enframe ()
#2 0x0000000000329840 in alloc_group ()
#3 0x0000000000328853 in alloc_slot ()
#4 0x00000000003297df in alloc_group ()
#5 0x0000000000328853 in alloc_slot ()
#6 0x00000000003297df in alloc_group ()
#7 0x0000000000328853 in alloc_slot ()
#8 0x00000000003285eb in __libc_malloc_impl ()
#9 0x00000000003267a5 in malloc ()
#10 0x000000000023f36b in strdup ()
#11 0x0000000000246f1d in x86::init_libc (magic=<optimized out>, addr=<optimized out>) at /build/source/src/platform/x86_pc/init_libc.cpp:107
#12 0x000000000024769a in long_mode ()
#13 0x0000000000000000 in ?? ()
The call to strdup in init_libc causes a crash in libc during malloc. Our heap should be ready at that time, since this is after init_heap.
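To make clear why strdup ends up in the allocator, here is a minimal sketch of the usual shape of a strdup implementation. The name sketch_strdup is hypothetical and the code is not copied from IncludeOS or musl; it only illustrates that strdup does little beyond measuring, allocating and copying, so a crash under it points at allocator state rather than at the string handling.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration of the common strdup shape.
 * Not the IncludeOS or musl implementation. */
char *sketch_strdup(const char *s)
{
    size_t len = strlen(s) + 1;   /* include the terminating NUL */
    char *copy = malloc(len);     /* the allocator is entered here */
    if (copy == NULL)
        return NULL;              /* allocation failed, nothing to copy */
    memcpy(copy, s, len);         /* copy the string including the NUL */
    return copy;
}
```

If the heap metadata is already damaged when malloc runs, the failure shows up inside the allocator frames, which matches frames #1 to #9 of the backtrace above.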
Possible culprit:
- enframe asserts: https://git.musl-libc.org/cgit/musl/tree/src/malloc/mallocng/meta.h?h=v1.2.4#n205
- assert calls abort ( https://git.musl-libc.org/cgit/musl/tree/src/exit/assert.c ), although only after fprintf. Since there's no output, this fprintf must have been lost in that case (possibly because a system call to validate file descriptors failed).
- abort calls a_crash ( https://git.musl-libc.org/cgit/musl/tree/src/exit/abort.c?h=v1.2.5#n27 ), after some system calls.
- alloc_group calls enframe: https://git.musl-libc.org/cgit/musl/tree/src/malloc/mallocng/malloc.c#n267
- alloc_group entry: https://git.musl-libc.org/cgit/musl/tree/src/malloc/mallocng/malloc.c#n174
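The sequence suspected above (message first, then abort, then crash) can be sketched as a toy model. Everything here is hypothetical: the toy_ functions and the trace buffer are illustration only and are not musl code. The sketch records the order of steps instead of really aborting, and shows that a lost message does not prevent the crash that follows.

```c
#include <stdio.h>
#include <string.h>

/* Records the order of steps so the flow can be inspected.
 * Large enough for a handful of calls in this toy example. */
static char trace[256];

/* Stand-in for the final crash step (hypothetical). */
static void toy_crash(void) { strcat(trace, "crash;"); }

/* Stand-in for abort: in this toy it always ends in the crash step. */
static void toy_abort(void) { strcat(trace, "abort;"); toy_crash(); }

/* Stand-in for a failed assertion: try to print, then abort regardless.
 * A NULL or failing stream models the message being lost. */
static void toy_assert_fail(const char *expr, FILE *out)
{
    if (out == NULL || fprintf(out, "Assertion failed: %s\n", expr) < 0)
        strcat(trace, "message-lost;");
    else
        strcat(trace, "message;");
    toy_abort();
}
```

Calling toy_assert_fail("example_condition", NULL) leaves "message-lost;abort;crash;" in trace: the crash still happens with no output, which is the behaviour hypothesised in the list above.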
I think this bug is also present on master, and it may be the main reason master is not booting at the moment.
Things I've tried
- Removed the calls to strdup. This causes another chain crash a bit later, this time without halting, so in that case it's not libc emitting the crash.
Some additional references:
- Call to strdup from init_libc: https://github.com/includeos/IncludeOS/blob/v0.16.0-release/src/platform/x86_pc/init_libc.cpp#L106
- strdup implementation: https://github.com/includeos/IncludeOS/blob/v0.16.0-release/src/crt/string.c#L23
This may be resolved with #2273