serenity
serenity copied to clipboard
StackInfo issues on Alpine (musl-based)
continuing from https://github.com/SerenityOS/ladybird/issues/153#issue-1509918847:
it looks like the stack size returned by
StackInfo#size_free()
is invalid, causing constant[InternalError] Call stack size limit exceeded
on even the smallest scripts. i tried to debug this, but sadly didn't get very far; also haven't tested if the same issue happens on gentoo or void with musl
To confirm, this is when running ladybird in a musl-libc based desktop environment?
@ADKaster Correct, I've tried running it on multiple machines and all of them had the same issue.
I tried running the tests myself on 379e4a2432e3db557c4ffb47a57a08df40f5662f (HEAD of master at the moment of posting) and they fail with the following:
serenity/Userland/Utilities/ntpquery.cpp:214:13: error: comparison of integers of different signs: 'unsigned long' and 'long' [-Werror,-Wsign-compare]
VERIFY(!CMSG_NXTHDR(&msg, cmsg));
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/sys/socket.h:358:44: note: expanded from macro 'CMSG_NXTHDR'
__CMSG_LEN(cmsg) + sizeof(struct cmsghdr) >= __MHDR_END(mhdr) - (unsigned char *)(cmsg) \
^
/usr/include/assert.h:8:28: note: expanded from macro 'assert'
#define assert(x) ((void)((x) || (__assert_fail(#x, __FILE__, __LINE__, __func__),0)))
^
1 error generated.
that error is due to some internal stuff in the (musl) headers- it'll always fail with -Wsign-compare for the cmsg socket stuff. you'll have to get rid of werror like always to run those, but the actual warning doesn't apply there
(for reference https://www.openwall.com/lists/musl/2016/10/11/1 . but you already knew to nuke Werror :) )
i remember trying to debug this some time ago (months), and now, and it's not merely a stack size issue- raising the stacksize via the PT_GNU_STACK elf header (which musl understands) or using the pthread_attr_setstack api does not resolve it. the issue is in the actual calculation done in stackinfo (and if you simply comment it all out and make it say there is always space, the javascript pages work (as a quick test only, of course).). i couldn't figure out /why/ it fails at calculating the remaining stack size on musl libc specifically, and sadly lost any notes i had at the time.
Likely relevant: https://github.com/rust-lang/rust/blob/2259028a70d6e1a44ad2cfd81955b577a43e8ef6/library/std/src/sys/pal/unix/stack_overflow.rs#L344-L349
The implementation for musl pthread_getattr_np
is here: https://github.com/kraj/musl/blob/2c124e13bd7941fe0b885eecdc5de6f09aacf06a/src/thread/pthread_getattr_np.c#L15-L21 (highlighted part relevant for the main thread)
glibc is vastly more complex and involves reading /proc/self/maps
: https://github.com/bminor/glibc/blob/d49cd6a1913da9744b9a0ffbefb3f7958322382e/nptl/pthread_getattr_np.c#L76-L165
That's quite unfortunate that we can't calculate "how much stack is left" on musl the same way we do pretty much every other C library (except Wasm32 on emscripten).
Looking at your rust link it looks like the answer is "🤷♀️ lmao hope we don't overflow"? Or is there something other language runtimes that we can ~~copy~~ draw inspiration from?
I also looked at Python which is known to be very portable and handles stack overflow - they elaborately hardcode a recursion limit :facepalm:
I'm unsure if Rust deals with this safely on musl in some other way.
(reason why I'm suddenly here: https://donotsta.re/notice/AhnJDfkoKr4E3Fg4oK)
Wait I've got it. On musl, we should change LibMain to just call serenity_main in a secondary thread and then block the main thread waiting for it to exit! Genius.
not sure if applicable here, but WebKit does some mildly cursed stuff for calculating the stack size on the main thread: https://github.com/WebKit/WebKit/blob/b64c3ed8059502609a5dba0c79c6d84773362c4a/Source/WTF/wtf/StackBounds.cpp#L139
Well... pulling from getrlimit(RLIMIT_STACK) and comparing that to some magic "origin" value doesn't seem like the worst idea. The commit message responsible for that part of wtf mentions that they do something similar on Darwin.
https://github.com/WebKit/WebKit/commit/062a7b1f299ece128242b9ce9e31024f9b104bdc
A patch to add a little extra spice to our ifdef soup in StackInfo.cpp along the lines of
#endif // big ifdef chain
#if defined(AK_OS_LINUX) && !defined(AK_OS_ANDROID) && !defined(AK_LIBC_GLIBC)
// Note: musl libc always gives the initial size of the main thread's stack
// < massage m_base and m_size based on RLIMIT_STACK if this thread == main thread>
#endif
m_top = m_base + m_size;
}
would work for me, if some musl users can confirm it works (and passes test-js, our LibWeb tests, etc).
I believe we have at least one test that relies on hitting the stack limits to work. The self-proxy one at least, that one broke when Andreas made ThrowCompletionOr small enough to get things optimized by gcc-13 :thousandyakstare:
I added https://github.com/SerenityOS/serenity/blob/master/Userland/Libraries/LibJS/Tests/runtime-error-call-stack-size.js in #3995 :)
if some musl users can confirm it works (and passes test-js, our LibWeb tests, etc).
might take a while, there's still some random build failures ( e.g. #21709 was closed with no successor :pensive: )
edit: ...and segfaults in RequestServer
; brb, rebuilding with proper debug symbols
re: request server, try this: 9d3edc3
(#24273)
funnily enough, i stashed the stack size fix and forgot to reapply it before building, but it.. looks like it works without it(?) i'm still getting some funny errors though, like:
212496.294 WebContent(21449): Unhandled JavaScript exception: [TypeError] undefined is not a function
212496.294 WebContent(21449): at <unknown>
at <unknown>
at parseQuery
at parseQuery
at <unknown>
at <unknown>
nevermind, it might simply be better than it was before, but searching anything in duckduckgo finally makes it trip with Call stack size limit exceeded
on just 60-ish calls, no matter the stack size
i tried doing something similar to what WebKit does, but it seems to now go the opposite way and assume the stack is bigger than it actually is:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 JS::Heap::gather_conservative_roots (this=0x7f8212665868, roots=...)
at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:352
352 auto data = *reinterpret_cast<FlatPtr*>(stack_address);
(gdb) bt
#0 JS::Heap::gather_conservative_roots (this=0x7f8212665868, roots=...)
at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:352
#1 0x00007f82356ea17d in JS::Heap::gather_roots (this=0x7f8212665868, roots=...)
at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:292
#2 0x00007f82356eaad6 in JS::Heap::collect_garbage
(this=0x7f8212665868, collection_type=collection_type@entry=JS::Heap::CollectionType::CollectGarbage, print_report=print_report@entry=false)
at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:282
#3 0x00007f8236c71848 in Web::Page::load_html (this=this@entry=0x7f8213a1e040, html=...)
at /home/patrycja/Downloads/serenity/Userland/Libraries/LibWeb/Page/Page.cpp:69
#4 0x000055c848516d67 in WebContent::ConnectionFromClient::load_html
(this=<optimized out>, page_id=<optimized out>, html=...)
at /home/patrycja/Downloads/serenity/Userland/Services/WebContent/ConnectionFromClient.cpp:156