wabt
wabt copied to clipboard
wat2wasm segfaults on .wat file with many nested if statements
Running wat2wasm
on a rather big file with many nested if-statements segfaults.
Version: wabt 1.0.32-1 (debian bookworm) System: Debian (with enough RAM + swap)
Steps to reproduce:
wasm2wat CertiCoq.Benchmarks.tests.color.wasm > color.wat // color.wat is ~20GB
wat2wasm color.wat -o color.wam // Segmention fault
wasm-tools parse color.wat -o color.wasm // works
segfault:
(gdb) run
Starting program: /home/wolfgang/ProjectsUni/Compilerstep/wabt/out/clang/Debug/wat2wasm color.wat -o color.wasm
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00005555555f3db3 in wabt::WastParser::GetToken (this=0x7fffffffd800)
at /home/wolfgang/ProjectsUni/Compilerstep/wabt/src/wast-parser.cc:585
585 if (tokens_.empty()) {
(gdb)
It depends... on the size of the .wat, do you get a more concise output from binaryen's wasm-dis
or wasm-tools print
? If so... we can dive down to compare.
In terms of memory consumption of wat2wasm, wasm-tools parse
is probably going to be better (and also better than binaryen's wasm-as
) since I think it tries not to assemble an in-memory IR of the whole module.
Thanks for the prompt response, wasm-tools parse
worked (with more swap, takes quite some time).
But wat2wasm
still segfaults when run on the 20GB .wat file, I edited the issue accordingly.
FYI: both wasm-tools parse
and wat2wasm
require around 20GB of RAM.
Cool, what's the segfault? (E.g. if you make a debug build and run it in gdb, where does it segfault?)
What is the reason for wanting to support such large files? Done't Wasm runtimes have internal limits that make running such files impossible? Perhaps we should instead just impose a limit on the size of any file we process? (see https://webassembly.github.io/spec/core/appendix/implementation.html#binary-format).
Here is the segfault, hope it helps:
(gdb) run
Starting program: /home/wolfgang/ProjectsUni/Compilerstep/wabt/out/clang/Debug/wat2wasm color.wat -o color.wasm
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00005555555f3db3 in wabt::WastParser::GetToken (this=0x7fffffffd800)
at /home/wolfgang/ProjectsUni/Compilerstep/wabt/src/wast-parser.cc:585
585 if (tokens_.empty()) {
(gdb)
@sbc100 We extended the CertiCoq compiler with a Wasm backend. One of the benchmarks of CertiCoq runs a graph coloring algorithm, the generated binary includes the representation of a graph in a functional language, compiled to Wasm. Didn't have issues running the other (a little smaller) benchmarks, haven't tried this one yet.
And in case you're wandering, the roundtrip with wasm2wat
and wat2wasm
is not strictly necessary, we'll get rid of that soon.
What wasm engine are are you able to run 20GB program on (or even 1GB program for that matter)? I would have assumed that such a program would be exceed the internal limits of any engine.
It's just the color.wat
file that's so big (for some reason), the color.wasm
file is just ~2MB, I ran the color.wasm
file in both Node.js and wasmtime, both went fine.
(Node.js takes > 1 min to instantiate the module, but no crash :) )
I think I understand now, why the color.wat
file was so big, it has a quite large chain of nested if-statements, like so:
...
some_check
if
return
else
...
some_check
if
return
else
... ~20,000x this nesting
end
end
Presumably, this adds quite a lot of whitespace. (gzip reduces the 16GB wat file to ~25MB)
we now generate the semantically equivalent programs, where the else branch is moved to after the if-statement, like so:
some_check
if
return
end
...
some_check
if
return
end
...
This change reduced the color.wasm
from 1.8 MB to 1.7 MB, but the color.wat
from 16GB to 14MB
Also, we generate code in SSA form, so one function has ~20k locals before we run wasm-opt --coalesce-locals
:)
some_check
if
return
end
You should be able to shorten that if return end
even more to just br_if i
, where i
is the label of the function body – a br out of a function body is equivalent to return
.