Segfault with >120 versions on Linux
Initially, I found a segfault after upgrading Kotlin, which generated slightly different Wasm. Kotlin issue. However, I would still expect an error message rather than a segfault, no matter how bad or wrong the Wasm is (the unoptimised bad.wasm actually works OK).
I checked a few versions, and the latest working one is 120. All later versions, including 124, crash.
Repro
- Download the bad.wasm.tar.gz
- Parameters I am using:
binaryen/install/bin/wasm-opt --enable-nontrapping-float-to-int --enable-gc --enable-reference-types --enable-exception-handling --enable-bulk-memory --inline-functions-with-loops --traps-never-happen --fast-math --closed-world -O3 --gufa -O3 --gufa -O3 --gufa bad.wasm -o out.wasm
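To make the difference between "wasm-opt reported an error" and "wasm-opt was killed by a signal" explicit when running the command above, a small wrapper can inspect the exit status. This is only a sketch: `classify_status`, `WASM_OPT` and `INPUT` are names made up for this example, and it relies on the usual shell convention that a status above 128 means the process died from signal `status - 128` (so 139 corresponds to SIGSEGV on Linux).

```shell
#!/bin/sh
# Hypothetical helper: turn a shell exit status into a short description.
# Statuses above 128 mean "terminated by signal (status - 128)".
classify_status() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "ok"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "error exit $code"
  fi
}

# Placeholder paths (assumptions); override via the environment if needed.
WASM_OPT=${WASM_OPT:-binaryen/install/bin/wasm-opt}
INPUT=${INPUT:-bad.wasm}

if [ -x "$WASM_OPT" ] && [ -f "$INPUT" ]; then
  status=0
  # Same flags as in the repro command above.
  "$WASM_OPT" --enable-nontrapping-float-to-int --enable-gc \
    --enable-reference-types --enable-exception-handling \
    --enable-bulk-memory --inline-functions-with-loops \
    --traps-never-happen --fast-math --closed-world \
    -O3 --gufa -O3 --gufa -O3 --gufa "$INPUT" -o out.wasm || status=$?
  classify_status "$status"
else
  echo "skipping: $WASM_OPT or $INPUT not found"
fi
```

With the crashing builds this would be expected to print `killed by signal 11`, while a proper validation failure would show up as an ordinary error exit.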
Env
- Archlinux, with LTS kernel:
6.12.47-1-lts #1 SMP PREEMPT_DYNAMIC
Extra info
I tried building with debug symbols and got a stack trace like this:
#0 0x0000000000959088 _ZN4wasm21AbstractChildIteratorINS_18ValueChildIteratorEEC2EPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0x559088)
#1 0x0000000000954be9 _ZN4wasm18ValueChildIteratorC2EPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0x554be9)
#2 0x00000000011dc5d4 _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc5d4)
#3 0x00000000011e0cb0 _ZZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE10visitBlockEPNS_5BlockEENKUlS4_jE_clES4_j (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xde0cb0)
#4 0x00000000011e100b _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE10visitBlockEPNS_5BlockE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xde100b)
#5 0x00000000011de5bf _ZN4wasm7VisitorINS_16BinaryenIRWriterINS_16StackIRGeneratorEEEvE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xdde5bf)
#6 0x00000000011dc6f7 _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc6f7)
#7 0x00000000011dc63e _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc63e)
#8 0x00000000011e0cb0 _ZZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE10visitBlockEPNS_5BlockEENKUlS4_jE_clES4_j (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xde0cb0)
#9 0x00000000011e0de3 _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE10visitBlockEPNS_5BlockE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xde0de3)
#10 0x00000000011de5bf _ZN4wasm7VisitorINS_16BinaryenIRWriterINS_16StackIRGeneratorEEEvE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xdde5bf)
#11 0x00000000011dc6f7 _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc6f7)
#12 0x00000000011dc63e _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc63e)
I can't reproduce this on my Linux machine, even under valgrind.
Is it perhaps a stack overflow? (edit: or OOM?)
If not, perhaps reducing it can show something interesting. Another thing to try is a gdb stacktrace in a debug build.
It would be crazy to get an OOM on my 64GB machine. Thankfully, gdb shows nicer stack traces (I am on the tagged version 124):
#0 0x0000000000959088 in wasm::AbstractChildIterator<wasm::ValueChildIterator>::AbstractChildIterator (this=0x0, parent=0x0) at /src/src/ir/iteration.h:88
#1 0x0000000000954be9 in wasm::ValueChildIterator::ValueChildIterator (this=0x7ffff7f6c350, parent=0x2bdf6342798) at /src/src/ir/iteration.h:138
#2 0x00000000011dc5d4 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2bdf6342798) at /src/src/wasm-stack.h:273
#3 0x00000000011e0cb0 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visitBlock(wasm::Block*)::{lambda(wasm::Block*, unsigned int)#1}::operator()(wasm::Block*, unsigned int) const (__closure=0x7ffff7f6c440, curr=0x2bdf823f4e0, from=0) at /src/src/wasm-stack.h:300
#4 0x00000000011e100b in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visitBlock (this=0x7ffff7f8bad0, curr=0x2bdf823f4e0) at /src/src/wasm-stack.h:375
#5 0x00000000011de5bf in wasm::Visitor<wasm::BinaryenIRWriter<wasm::StackIRGenerator>, void>::visit (this=0x7ffff7f8bad0, curr=0x2bdf823f4e0) at /src/src/wasm-delegations.def:18
#6 0x00000000011dc6f7 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2bdf823f4e0) at /src/src/wasm-stack.h:288
#7 0x00000000011dc63e in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2be00398098) at /src/src/wasm-stack.h:274
#8 0x00000000011e0cb0 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visitBlock(wasm::Block*)::{lambda(wasm::Block*, unsigned int)#1}::operator()(wasm::Block*, unsigned int) const (__closure=0x7ffff7f6c6c0, curr=0x2be003980b0, from=0) at /src/src/wasm-stack.h:300
#9 0x00000000011e0de3 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visitBlock (this=0x7ffff7f8bad0, curr=0x2be003980b0) at /src/src/wasm-stack.h:324
#10 0x00000000011de5bf in wasm::Visitor<wasm::BinaryenIRWriter<wasm::StackIRGenerator>, void>::visit (this=0x7ffff7f8bad0, curr=0x2be003980b0) at /src/src/wasm-delegations.def:18
#11 0x00000000011dc6f7 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2be003980b0) at /src/src/wasm-stack.h:288
#12 0x00000000011dc63e in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2bdf6342810) at /src/src/wasm-stack.h:274
Hmm,
#0 0x0000000000959088 in wasm::AbstractChildIterator<wasm::ValueChildIterator>::AbstractChildIterator (this=0x0, parent=0x0) at /src/src/ir/iteration.h:88
#1 0x0000000000954be9 in wasm::ValueChildIterator::ValueChildIterator (this=0x7ffff7f6c350, parent=0x2bdf6342798) at /src/src/ir/iteration.h:138
this (and parent) is null in frame 0. But frame 1 should be passing those values along...?
Makes no sense that I can see, which makes me suspect a compiler bug. Where did you get that build / how did you build it?
I built another one on the host and it works! Only the Alpine build fails.
this (and parent) is null in frame 0. But frame 1 should be passing those values along...?
My first thought was that it was a GDB problem, because GDB could not map the .h files correctly. That is why I built on the host, to get the correct paths, and that build worked.
Makes no sense that I can see, which makes me suspect a compiler bug. Where did you get that build / how did you build it?
I downloaded the builds from the releases here. For my own build, I followed the steps from the release workflow: https://github.com/WebAssembly/binaryen/blob/main/.github/workflows/create_release.yml#L147-L153
Thanks, I can confirm this with the alpine release builds.
One difference there is that they build with mimalloc. Building with that locally, I don't see the problem (same as you iiuc).
I also did a bunch of experiments with gcc/clang and with or without asan. Nothing led to the problem.
Overall this suggests the issue is specific to the alpine builds, perhaps the specific compiler version, or the specific libc that is statically linked in.
One frame above looks legit:
It is a mystery why it crashes here with this == NULL.
Another finding: if I run on a single core (no matter which one), it works, but with two or more cores it crashes.
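For anyone wanting to repeat the single-core comparison, here is a rough sketch. The comment above does not say how the run was restricted, so both methods shown are assumptions: `taskset -c 0` (from util-linux) pins the process to CPU 0, and `BINARYEN_CORES=1` uses the environment variable that, as far as I know, Binaryen's thread pool reads to choose its worker count. `run_case`, `WASM_OPT`, `INPUT` and `FLAGS` are names invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: run a command quietly and report its exit status.
run_case() {
  label=$1
  shift
  status=0
  "$@" >/dev/null 2>&1 || status=$?
  echo "$label: exit status $status"
}

# Placeholder paths (assumptions); override via the environment if needed.
WASM_OPT=${WASM_OPT:-binaryen/install/bin/wasm-opt}
INPUT=${INPUT:-bad.wasm}
# Same flags as in the original repro; left unquoted below on purpose
# so the shell splits them into separate arguments.
FLAGS="--enable-nontrapping-float-to-int --enable-gc --enable-reference-types --enable-exception-handling --enable-bulk-memory --inline-functions-with-loops --traps-never-happen --fast-math --closed-world -O3 --gufa -O3 --gufa -O3 --gufa"

if [ -x "$WASM_OPT" ] && [ -f "$INPUT" ]; then
  run_case "all cores" "$WASM_OPT" $FLAGS "$INPUT" -o out.wasm
  run_case "pinned to CPU 0" taskset -c 0 "$WASM_OPT" $FLAGS "$INPUT" -o out.wasm
  run_case "BINARYEN_CORES=1" env BINARYEN_CORES=1 "$WASM_OPT" $FLAGS "$INPUT" -o out.wasm
else
  echo "skipping: $WASM_OPT or $INPUT not found"
fi
```

An exit status of 139 in the first case and 0 in the restricted cases would match the behaviour described above and would point at code running on worker threads rather than on the main thread.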
I tried building version 120 in the same Alpine container and, voilà, it crashes the same way. So the code is not to blame; it is something new in Alpine or the compiler.
Also, it always crashes on DropId.
Update: removing -DBUILD_MIMALLOC=ON does not help for either version 124 or 120.