binaryen Run (basic) StackIR optimizations in all binary writes?

At the moment we enable StackIR optimizations in -O3 and -Os and above. This made sense because their benefit is usually fairly small, around 1-2%, and we didn't want to slow down builds just for that. However, maybe it is worth changing that, for a few reasons:

1. Our past worry about slowing down builds is perhaps not relevant today: Most and perhaps all toolchains using Binaryen are not running it in debug builds. Emscripten for example stopped running wasm-opt in debug builds and even in -O1 (it only runs in -O2+), and other toolchains likewise have fast iteration/debug builds that just skip wasm-opt entirely. If wasm-opt is only run when it is meant to optimize, then there is little harm in running StackIR opts.
2. StackIR opts improve roundtripping in some cases, which is actually the immediate reason that made me think about this. Things like multivalue end up adding more locals and sets/gets in some cases, and StackIR opts can get rid of a bunch of those. Ideally we'd get to a point where roundtripping a file only shrinks it or keeps it the same size (which may require more than StackIR, but StackIR would be an important part of it). That is, better roundtripping is an additional goal here, beyond the 1-2% that StackIR normally helps.
3. It would be simpler to just always run StackIR (at least the basic, non-costly parts), rather than the current system where we have passes to generate and optimize it, and there are various corner cases like what happens if you generate it but then modify BinaryenIR, etc. We can avoid that complexity by always running StackIR in binary writing.

StackIR does have some slower optimizations, which could be enabled only when the user requests a higher optimization level, which the binary writer would check.
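A minimal sketch of how the binary writer could make that check. Everything here is hypothetical: `WriteOptions`, `selectStackIRSteps`, and the step names are illustrative and are not Binaryen's actual types or passes. The `-O3`/`-Os` threshold is an assumption that simply mirrors the current gating described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical option struct; not Binaryen's actual PassOptions.
struct WriteOptions {
  int optimizeLevel = 0; // e.g. 3 for -O3
  int shrinkLevel = 0;   // e.g. 1 for -Os
};

// Returns the names of the (hypothetical) StackIR steps that the binary
// writer would run for the given options.
std::vector<std::string> selectStackIRSteps(const WriteOptions& options) {
  assert(options.optimizeLevel >= 0 && options.shrinkLevel >= 0);
  // Cheap steps run on every binary write.
  std::vector<std::string> steps = {"basic-cleanup"};
  // Costlier steps run only when the user asked for heavier optimization
  // (assumed threshold: -O3 or -Os and above).
  if (options.optimizeLevel >= 3 || options.shrinkLevel >= 1) {
    steps.push_back("costly-analysis");
  }
  return steps;
}
```

With this shape, a default write would run only the cheap step, while a write at the assumed threshold would run both, and there would be no separate generate/optimize passes for the user to order correctly.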

(Context: https://github.com/WebAssembly/binaryen/pull/6390. Both that PR and another approach I am trying for the same problem may end up adding more roundtrip artifacts in rare cases, so I was wondering about ways to mitigate that.)

Apr 17 '24 22:04 kripken

This makes sense to me. Unconditionally running StackIR could also motivate us to pay more attention to it.

Apr 18 '24 00:04 tlively

One issue I noticed now is that the StackIR binary writing path does not support DWARF and source maps. Making it support DWARF looks trivial since that information is only used as a single bool, but I don't know how much work source maps would be.

edit: this has been fixed in #6564

Apr 29 '24 22:04 kripken

If we want to do this, we'd need to first improve the roundtripping of stacky code, which StackIR opts emit. As noted in https://github.com/emscripten-core/emscripten/pull/22218 it turned out that running StackIR opts early in Emscripten added a few bytes to code size, since we didn't undo the extra code due to stackiness - subsequent loads of the wasm added locals etc., but we weren't doing enough optimizations to remove those.

The particular situation there was that we added several extra locals. If we didn't coalesce them then we'd keep the multiple locals around forever. --coalesce-locals is enough to fix that, but perhaps we can find a way to avoid adding so many locals in the first place (e.g. the binary reading code could reuse temp locals in some cases).
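One way to picture the "reuse temp locals" idea is a small pool that hands back a released temp local of the same type before allocating a new one. This is only a sketch under stated assumptions: `TempLocalPool` and `ValType` are hypothetical names, not part of Binaryen's binary reader, and the sketch assumes the reader can tell when a temp local is no longer live.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical value types; a real reader would use the IR's own type class.
enum class ValType { I32, I64, F32, F64 };

// Hypothetical pool of temporary locals, keyed by type.
class TempLocalPool {
public:
  // firstFreeIndex is the first local index not used by params or declared locals.
  explicit TempLocalPool(unsigned firstFreeIndex) : nextIndex(firstFreeIndex) {}

  // Get a temp local of the given type, reusing a released one if possible.
  unsigned acquire(ValType type) {
    auto& freeList = freeByType[type];
    if (!freeList.empty()) {
      unsigned index = freeList.back();
      freeList.pop_back();
      return index;
    }
    return nextIndex++;
  }

  // Mark a temp local as no longer live so that later code can reuse it.
  void release(ValType type, unsigned index) {
    assert(index < nextIndex);
    freeByType[type].push_back(index);
  }

  // Total number of locals, including the pre-existing ones.
  unsigned numLocals() const { return nextIndex; }

private:
  unsigned nextIndex;
  std::map<ValType, std::vector<unsigned>> freeByType;
};
```

If two stacky sequences each need an i32 temp and the first is released before the second is parsed, both would share one local instead of adding two, which is the effect --coalesce-locals otherwise has to recover afterwards.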

Jul 12 '24 23:07 kripken

#6614 will help a bit here; the IRBuilder generates better IR from stacky code than the current binary parser.

Jul 13 '24 23:07 tlively
