binaryen

Segfault with >120 versions on Linux

Open neworld opened this issue 3 months ago • 8 comments

Initially, I found a segfault after upgrading Kotlin, which generated slightly different Wasm. Kotlin issue. However, I would still expect an error message rather than a segfault, no matter how bad or wrong the Wasm is (in fact, the unoptimised bad.wasm works OK).

I checked a few versions, and the latest working one is 120. All later versions, including 124, crash.

Repro

  1. Download the bad.wasm.tar.gz
  2. Parameters I am using:
binaryen/install/bin/wasm-opt --enable-nontrapping-float-to-int --enable-gc --enable-reference-types --enable-exception-handling --enable-bulk-memory --inline-functions-with-loops --traps-never-happen --fast-math --closed-world -O3 --gufa -O3 --gufa -O3 --gufa bad.wasm -o out.wasm

Env

  • Archlinux, with LTS kernel: 6.12.47-1-lts #1 SMP PREEMPT_DYNAMIC

Extra info

I tried building with debug symbols and got a stack trace like this:

                #0  0x0000000000959088 _ZN4wasm21AbstractChildIteratorINS_18ValueChildIteratorEEC2EPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0x559088)
                #1  0x0000000000954be9 _ZN4wasm18ValueChildIteratorC2EPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0x554be9)
                #2  0x00000000011dc5d4 _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc5d4)
                #3  0x00000000011e0cb0 _ZZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE10visitBlockEPNS_5BlockEENKUlS4_jE_clES4_j (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xde0cb0)
                #4  0x00000000011e100b _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE10visitBlockEPNS_5BlockE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xde100b)
                #5  0x00000000011de5bf _ZN4wasm7VisitorINS_16BinaryenIRWriterINS_16StackIRGeneratorEEEvE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xdde5bf)
                #6  0x00000000011dc6f7 _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc6f7)
                #7  0x00000000011dc63e _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc63e)
                #8  0x00000000011e0cb0 _ZZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE10visitBlockEPNS_5BlockEENKUlS4_jE_clES4_j (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xde0cb0)
                #9  0x00000000011e0de3 _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE10visitBlockEPNS_5BlockE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xde0de3)
                #10 0x00000000011de5bf _ZN4wasm7VisitorINS_16BinaryenIRWriterINS_16StackIRGeneratorEEEvE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xdde5bf)
                #11 0x00000000011dc6f7 _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc6f7)
                #12 0x00000000011dc63e _ZN4wasm16BinaryenIRWriterINS_16StackIRGeneratorEE5visitEPNS_10ExpressionE (/home/neworld/tmp/kt-bug-report/binaryen/install/bin/wasm-opt + 0xddc63e)

neworld avatar Sep 22 '25 18:09 neworld

I can't reproduce this on my linux machine, even in valgrind.

Is it perhaps a stack overflow? (edit: or OOM?)

If not, perhaps reducing it can show something interesting. Another thing to try is a gdb stacktrace in a debug build.

kripken avatar Sep 22 '25 19:09 kripken

It would be crazy to get an OOM on my 64GB machine. Thankfully, gdb shows nicer stack traces (I am on the version tagged 124):

#0  0x0000000000959088 in wasm::AbstractChildIterator<wasm::ValueChildIterator>::AbstractChildIterator (this=0x0, parent=0x0) at /src/src/ir/iteration.h:88
#1  0x0000000000954be9 in wasm::ValueChildIterator::ValueChildIterator (this=0x7ffff7f6c350, parent=0x2bdf6342798) at /src/src/ir/iteration.h:138
#2  0x00000000011dc5d4 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2bdf6342798) at /src/src/wasm-stack.h:273
#3  0x00000000011e0cb0 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visitBlock(wasm::Block*)::{lambda(wasm::Block*, unsigned int)#1}::operator()(wasm::Block*, unsigned int) const (__closure=0x7ffff7f6c440, curr=0x2bdf823f4e0, from=0) at /src/src/wasm-stack.h:300
#4  0x00000000011e100b in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visitBlock (this=0x7ffff7f8bad0, curr=0x2bdf823f4e0) at /src/src/wasm-stack.h:375
#5  0x00000000011de5bf in wasm::Visitor<wasm::BinaryenIRWriter<wasm::StackIRGenerator>, void>::visit (this=0x7ffff7f8bad0, curr=0x2bdf823f4e0) at /src/src/wasm-delegations.def:18
#6  0x00000000011dc6f7 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2bdf823f4e0) at /src/src/wasm-stack.h:288
#7  0x00000000011dc63e in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2be00398098) at /src/src/wasm-stack.h:274
#8  0x00000000011e0cb0 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visitBlock(wasm::Block*)::{lambda(wasm::Block*, unsigned int)#1}::operator()(wasm::Block*, unsigned int) const (__closure=0x7ffff7f6c6c0, curr=0x2be003980b0, from=0) at /src/src/wasm-stack.h:300
#9  0x00000000011e0de3 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visitBlock (this=0x7ffff7f8bad0, curr=0x2be003980b0) at /src/src/wasm-stack.h:324
#10 0x00000000011de5bf in wasm::Visitor<wasm::BinaryenIRWriter<wasm::StackIRGenerator>, void>::visit (this=0x7ffff7f8bad0, curr=0x2be003980b0) at /src/src/wasm-delegations.def:18
#11 0x00000000011dc6f7 in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2be003980b0) at /src/src/wasm-stack.h:288
#12 0x00000000011dc63e in wasm::BinaryenIRWriter<wasm::StackIRGenerator>::visit (this=0x7ffff7f8bad0, curr=0x2bdf6342810) at /src/src/wasm-stack.h:274

neworld avatar Sep 22 '25 20:09 neworld

Hmm,

#0  0x0000000000959088 in wasm::AbstractChildIterator<wasm::ValueChildIterator>::AbstractChildIterator (this=0x0, parent=0x0) at /src/src/ir/iteration.h:88
#1  0x0000000000954be9 in wasm::ValueChildIterator::ValueChildIterator (this=0x7ffff7f6c350, parent=0x2bdf6342798) at /src/src/ir/iteration.h:138

this (and parent) is null in frame 0. But frame 1 should be passing those values along...?

Makes no sense that I can see, which makes me suspect a compiler bug. Where did you get that build / how did you build it?

kripken avatar Sep 22 '25 20:09 kripken

I built another one on the host and it works! Only the Alpine build fails.

neworld avatar Sep 22 '25 20:09 neworld

this (and parent) is null in frame 0. But frame 1 should be passing those values along...?

My first thought was that it was a GDB problem, because it could not map the .h files correctly. That is why I built on the host, to get the correct paths, and that build worked.

Makes no sense that I can see, which makes me suspect a compiler bug. Where did you get that build / how did you build it?

I downloaded the builds from the releases here. For my own build, I followed the steps from the release workflow: https://github.com/WebAssembly/binaryen/blob/main/.github/workflows/create_release.yml#L147-L153

neworld avatar Sep 22 '25 20:09 neworld

Thanks, I can confirm this with the alpine release builds.

One difference there is that they build with mimalloc. Building with that locally, I don't see the problem (same as you iiuc).

I also did a bunch of experiments with gcc/clang and with or without asan. Nothing led to the problem.

Overall this suggests the issue is specific to the alpine builds, perhaps the specific compiler version, or the specific libc that is statically linked in.

kripken avatar Sep 22 '25 23:09 kripken

One frame above looks legit:

Image

It is a mystery why it crashes here with this == NULL.

Another finding: if I run on a single core (no matter which one), it works, but if I use at least two cores, it crashes.

neworld avatar Sep 23 '25 17:09 neworld
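The stack-overflow question raised earlier and the single-core observation could be checked together by comparing how much stack a worker thread gets with how much the main thread gets, once in the Alpine container and once on the host. The following is only a diagnostic sketch, not something taken from binaryen: the function names are hypothetical, and the idea that the two environments differ here is an assumption (musl is generally reported to default to a far smaller thread stack than glibc, but nothing in this thread confirms that this is the cause).

```cpp
#include <pthread.h>
#include <sys/resource.h>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: stack size (in bytes) that a newly created thread
// receives when the caller does not request a size explicitly.
// Returns 0 if the query fails.
std::size_t defaultThreadStackSize() {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) {
    return 0;
  }
  std::size_t size = 0;
  if (pthread_attr_getstacksize(&attr, &size) != 0) {
    size = 0;
  }
  pthread_attr_destroy(&attr);
  return size;
}

// Hypothetical helper: soft RLIMIT_STACK of the process, which bounds how far
// the main thread's stack may grow. May be RLIM_INFINITY. Returns 0 on failure.
unsigned long long mainThreadStackLimit() {
  rlimit lim{};
  if (getrlimit(RLIMIT_STACK, &lim) != 0) {
    return 0;
  }
  return static_cast<unsigned long long>(lim.rlim_cur);
}

// Prints both values so the output can be compared across environments.
void printStackSizes() {
  std::printf("default worker thread stack: %zu bytes\n",
              defaultThreadStackSize());
  std::printf("main thread stack limit:     %llu bytes\n",
              mainThreadStackLimit());
}
```

Calling `printStackSizes()` from `main` in a binary built the same way as the release (same container, same static libc) and again in a host build would show whether worker threads have much less room for the deep `visit` / `visitBlock` recursion seen in the traces. A much smaller worker value in the Alpine build would be consistent with the crash appearing only when more than one core is used, though it would not prove it.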

I tried building version 120 using the same Alpine container, and voilà: it crashes the same way. So the code is not to blame; something new in Alpine or in the compiler is.

Also, it always crashes on DropId.

Update: removing -DBUILD_MIMALLOC=ON does not help with either version 124 or version 120.

neworld avatar Sep 23 '25 17:09 neworld