PAL thread handling crashes on latest Arch Linux

Open rhuanjl opened this issue 4 years ago • 22 comments

On Arch Linux, CC crashes when starting a helper thread, with the stack trace below. The crash was confirmed with both 1.11 and master, and was likely introduced by changes in Arch Linux rather than changes in CC:

#0  0x00007ffff510aef5 in raise () at /usr/lib/libc.so.6
#1  0x00007ffff50f4862 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff514cf78 in __libc_message () at /usr/lib/libc.so.6
#3  0x00007ffff514cfaa in __libc_fatal () at /usr/lib/libc.so.6
#4  0x00007ffff4eca70c in __lll_lock_wait () at /usr/lib/libpthread.so.0
#5  0x00007ffff4ec35f0 in pthread_mutex_lock () at /usr/lib/libpthread.so.0
#6  0x00007ffff589847b in JsUtil::BackgroundJobProcessor::Run(JsUtil::ParallelThreadData*) () at /usr/lib/libChakraCore.so
#7  0x00007ffff589826a in JsUtil::BackgroundJobProcessor::StaticThreadProc(void*) () at /usr/lib/libChakraCore.so
#8  0x00007ffff56328ce in CorUnix::CPalThread::ThreadEntry(void*) () at /usr/lib/libChakraCore.so
#9  0x00007ffff4ec1299 in start_thread () at /usr/lib/libpthread.so.0
#10 0x00007ffff51cd153 in clone () at /usr/lib/libc.so.6

(Issue reported by a friend - I don't have a Linux setup to verify it myself)

rhuanjl avatar Feb 16 '21 20:02 rhuanjl

What JS code does this reproduce on? Even if it started happening after some changes in Arch, we might still be on the hook if something wrong is being passed down to the system library.

ppenzin avatar Feb 17 '21 06:02 ppenzin

> What JS code does this reproduce on? Even if it started happening after some changes in Arch, we might still be on the hook if something wrong is being passed down to the system library.

It's not specific to any particular JS - when running the test suite, it happened with over half of the dynapogo tests.

When running a simple console.log("hello") JS file, it happened intermittently.

rhuanjl avatar Feb 17 '21 07:02 rhuanjl

Arch seems to be available in WSL, I will give it a try.

ppenzin avatar Feb 17 '21 07:02 ppenzin

I'm the friend mentioned in the OP, the one who reported the issue to @rhuanjl. I ran into the issue in Arch on my desktop, my laptop, and a virtual machine. Just to be absolutely sure that I hadn't overlooked something, I also tried it in a fresh Arch WSL installation and got an almost identical stack trace. It even crashed when running this simple code, though not every single time.

for(let i = 0; i < 32; i++) {
	console.log(`#${i}`);
}

Eggbertx avatar Mar 06 '21 05:03 Eggbertx

The crash happens when it tries to launch a helper thread. I'm guessing, but the trigger is likely the first attempt to run the JIT or the garbage collector (both of which normally operate off thread).

rhuanjl avatar Mar 09 '21 21:03 rhuanjl

This is reproducible for me on Manjaro, and the stack trace looks the same. When building inside an Ubuntu 20.04 container, everything works fine. I'll post an update if I can reproduce this on a later version of Ubuntu.

nic11 avatar Jan 09 '22 18:01 nic11

@nic11 Use Windows

Pospelove avatar Jan 09 '22 18:01 Pospelove

It fails on Ubuntu 21.10 (with a different stack trace though). It's probably caused by libicu, as it appears to be the only required library that has a different major version in these three environments:

# Ubuntu 20.04
$ ldd vcpkg_installed/x64-linux/bin/libChakraCore.so 
        linux-vdso.so.1 (0x00007ffcce371000)
        libicuuc.so.66 => /lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f9a94108000)
        libicui18n.so.66 => /lib/x86_64-linux-gnu/libicui18n.so.66 (0x00007f9a93e09000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9a93de6000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9a93de0000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9a93c91000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f9a93c76000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9a93a82000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9a94fda000)
        libicudata.so.66 => /lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f9a91fc1000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f9a91ddf000)

# Ubuntu 21.10
$ ldd vcpkg_installed/x64-linux/bin/libChakraCore.so
        linux-vdso.so.1 (0x00007ffd525dd000)
        libicuuc.so.67 => /lib/x86_64-linux-gnu/libicuuc.so.67 (0x00007fa712a66000)
        libicui18n.so.67 => /lib/x86_64-linux-gnu/libicui18n.so.67 (0x00007fa71275f000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa71267b000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa712661000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa712439000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa713910000)
        libicudata.so.67 => /lib/x86_64-linux-gnu/libicudata.so.67 (0x00007fa710920000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa710705000)

# Manjaro
$ ldd vcpkg_installed/x64-linux/bin/libChakraCore.so
        linux-vdso.so.1 (0x00007ffd518fb000)
        libicuuc.so.70 => /usr/lib/libicuuc.so.70 (0x00007f5dadefa000)
        libicui18n.so.70 => /usr/lib/libicui18n.so.70 (0x00007f5dadbd4000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f5dadbb3000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f5dadbac000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f5dada68000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f5dada4d000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f5dad87f000)
        /usr/lib64/ld-linux-x86-64.so.2 (0x00007f5daee1e000)
        libicudata.so.70 => /usr/lib/libicudata.so.70 (0x00007f5dabc63000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f5daba4d000)

nic11 avatar Jan 14 '22 04:01 nic11

@nic11 can we fall back to the old ICU and check?

Pospelove avatar Jan 14 '22 12:01 Pospelove

I had a feeling this would eventually happen. To my knowledge, the Arch devs only make changes when absolutely necessary, so if the issue isn't resolved soon, it'll likely start breaking with other distros as well.

Eggbertx avatar Jan 15 '22 23:01 Eggbertx

I'm 99% confident that the failure on Arch had to do with PAL.

Relevant info

  • PAL is a component taken from dotNetCoreCLR and embedded in the ChakraCore source.
  • PAL is a shim of much of the Windows C runtime on top of the Linux or macOS C runtimes.
  • PAL within ChakraCore is modified quite a bit from the original, so updating to the latest PAL as a fix isn't an (easy) option.
  • As a long-term goal I'd like to remove PAL, but I honestly don't know if I'm ever going to have enough time to do that.
  • The Arch crash specifically appeared to be related to the thread manager that PAL provides (ChakraCore uses Windows-style thread handling, which PAL shims on top of the relevant Linux APIs).
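
To illustrate the kind of shim described in the list above, here is a minimal sketch of a Windows-style thread API layered over pthreads. All names (ShimThread, ShimCreateThread, ShimJoinThread) are hypothetical and are not PAL's actual declarations; the real PAL thread manager is considerably more involved.

```cpp
#include <cstdint>
#include <pthread.h>

// Hypothetical names for illustration only; these are not PAL's declarations.
// Windows-style thread procedures return an integer exit code.
using ShimThreadProc = std::uint32_t (*)(void* parameter);

struct ShimThread {
    pthread_t handle;
    ShimThreadProc proc;
    void* parameter;
    std::uint32_t exit_code;
};

// pthread entry points have the signature void* (*)(void*), so a small
// trampoline adapts the Windows-style procedure to it.
static void* ShimThreadEntry(void* raw) {
    auto* thread = static_cast<ShimThread*>(raw);
    thread->exit_code = thread->proc(thread->parameter);
    return nullptr;
}

// Starts proc(parameter) on a new thread; returns false if pthread_create fails.
// The ShimThread object must outlive the thread it describes.
inline bool ShimCreateThread(ShimThread* thread, ShimThreadProc proc, void* parameter) {
    thread->proc = proc;
    thread->parameter = parameter;
    thread->exit_code = 0;
    return pthread_create(&thread->handle, nullptr, ShimThreadEntry, thread) == 0;
}

// Blocks until the thread finishes, then reports the exit code it returned.
inline bool ShimJoinThread(ShimThread* thread, std::uint32_t* exit_code) {
    if (pthread_join(thread->handle, nullptr) != 0) {
        return false;
    }
    *exit_code = thread->exit_code;
    return true;
}
```

A caller would fill in a ShimThread with ShimCreateThread and later collect the result with ShimJoinThread. On toolchains where libpthread is still a separate library, this needs to be built with -pthread.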

ICU

An ICU-related failure is almost certainly a different issue - and hopefully a much easier fix, as we just link to ICU rather than distributing it with the CC source. I'm not aware of anything in the CC source that locks us to a specific ICU version other than a default link target, which we could update.

rhuanjl avatar Jan 22 '22 11:01 rhuanjl

Would it be possible at all to replace PAL with a more natural system without having to rewrite the majority of the codebase for CC as a whole?

Eggbertx avatar Jan 24 '22 06:01 Eggbertx

I've tried building with --embed-icu, but had no luck either, though it links with an older version. It's worth trying to specify the same one Ubuntu 20.04 ships (66). I think I'll try that sometime later.

nic11 avatar Jan 24 '22 09:01 nic11

> Would it be possible at all to replace PAL with a more natural system without having to rewrite the majority of the codebase for CC as a whole?

I'd like to investigate this. The big piece to look into is thread handling: exactly which calls are used to create and manage threads, and how spread out across the codebase they are. It shouldn't actually be that invasive a change, as there are only a few things that can spin up a thread.
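
As a rough idea of what a PAL-free helper thread could look like, here is a sketch built only on the C++ standard library. BackgroundWorker and Post are made-up names for illustration, not the existing JsUtil::BackgroundJobProcessor interface, and a real replacement would also need to cover things like priorities and multiple threads.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a helper-thread job queue; not ChakraCore's API.
class BackgroundWorker {
public:
    BackgroundWorker() : worker_([this] { Run(); }) {}

    // Asks the worker to stop once the queue is drained, then waits for it.
    ~BackgroundWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Queues a job to run on the helper thread.
    void Post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
            if (jobs_.empty()) {
                return;  // only reachable once stopping_ is set
            }
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop();
            lock.unlock();  // run the job without holding the queue lock
            job();
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before Run() starts
};
```

The point of the sketch is that std::thread, std::mutex and std::condition_variable already wrap the platform threading APIs, so code written this way would not need a Windows-style shim underneath it.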

rhuanjl avatar Jan 26 '22 14:01 rhuanjl

> I've tried building with --embed-icu, but had no luck either, though it links with an older version. It's worth trying to specify the same one Ubuntu 20.04 ships (66). I think I'll try that sometime later.

Could you use a debug or test build and post a stack trace with symbols, so we can see where it's going wrong?

rhuanjl avatar Jan 26 '22 14:01 rhuanjl

Actually on my main system it's just the same as in the issue description. Or what exactly do you mean?

Actually tbh I didn't do a standalone build, I build ChakraCore as a vcpkg dependency, but the stack trace matches the description in this issue.

And yeah, sorry, but it may take me a bit of time, I'm busy currently. I guess I'll try a libicu 66 build, maybe it'll help. Btw, if I create an Arch or Ubuntu 21.10 based Docker image or Dockerfile where it can be reproduced, would it be simpler for you to debug?

nic11 avatar Jan 27 '22 17:01 nic11

https://sourceware.org/git/?p=glibc.git;a=commit;h=2e39f65b5ef11647beb4980c4244bac8af192c14 This problem may be caused by this commit.

Wedmer avatar Mar 07 '22 18:03 Wedmer

Also I've found some more info https://www.mail-archive.com/[email protected]/msg2161469.html

Wedmer avatar Mar 07 '22 21:03 Wedmer

> Also I've found some more info https://www.mail-archive.com/[email protected]/msg2161469.html

Isn't it about SIGBUS? I thought it's a regular segfault here.

nic11 avatar Mar 08 '22 17:03 nic11

> Isn't it about SIGBUS? I thought it's a regular segfault here.

Just read the whole thread to get some info on why the futex error is thrown. That info is needed to understand why the glibc commit mentioned above can lead to these terminations.

Wedmer avatar Mar 09 '22 00:03 Wedmer

@Wedmer this looks like a potential culprit. PAL might be trying to lock on an unaligned address; it has its own approach to things at times.
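
For context, futex(2) requires the futex word to be aligned on a four-byte boundary and reports EINVAL otherwise, so a misaligned lock would be consistent with a futex error. A minimal sketch of a debug check, assuming the lock in question is a plain pthread_mutex_t (the helper names are hypothetical, not PAL functions):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <pthread.h>

// Returns true when `address` is a multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline bool is_aligned(const void* address, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(address) & (alignment - 1)) == 0;
}

// Hypothetical helper: checks a mutex against the alignment its type requires.
// A mutex placed in a packed struct or a hand-carved buffer could fail this.
inline bool mutex_is_aligned(const pthread_mutex_t* mutex) {
    return is_aligned(mutex, alignof(pthread_mutex_t));
}
```

Asserting mutex_is_aligned right before the pthread_mutex_lock call in a debug build would confirm or rule out the alignment theory.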

ppenzin avatar Jun 02 '22 04:06 ppenzin

I've seen that. Part of PAL is truly cross-platform and has implementations for different synchronization objects and APIs, but another part is heavily built around pthread.

Wedmer avatar Jun 02 '22 07:06 Wedmer