binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

Use fallthrough in optimizeInstructions to further optimize (unsigned)x >= 0 ==> i32(0)

Open xuruiyang2002 opened this issue 8 months ago • 8 comments

Currently, wasm-opt cannot optimize the code as expected: (unsigned)x >= 0 ==> i32(0).

   (i32.ge_u
    (i32.load
     (i32.const 0)
    )
    (block (result i32)
     (i32.store
      (i32.const 0)
      (i32.const 0)
     )
     (i32.const 0)
    )
   )

It should have been optimized after merge-blocks, as the store could have been hoisted. However, the load and store cannot be reordered, preventing wasm-opt from doing so.

This can be addressed in optimizeInstructions by using fallthrough to hoist the constant zero, enabling further optimizations.

Fixes: #7556

xuruiyang2002 avatar Apr 26 '25 13:04 xuruiyang2002

The issue involved (unsigned(x) >= 0 => i32(1)) in this PR is a mirror of the previously fixed issue #7455 (for unsigned(x) < 0 => i32(0)), so I think it makes sense to fix this for a better optimization.

Could you please review if you have a moment? Thank you. @kripken

xuruiyang2002 avatar May 27 '25 09:05 xuruiyang2002

Though, separately, in discussion with @tlively and @osa1 the idea came up to perhaps handle this family of problems in a general way. We do have the Flatten pass for this, which has 2 current downsides:

  1. Flatten does not support recent features like exceptions and GC.
  2. Flatten adds a lot of overhead since it forces many things to locals that do not need to be, at least not for the reason this PR is fixing. That is, this PR has
   (i32.ge_u
    (i32.load
     (i32.const 0)
    )
    (block (result i32)
     (i32.store
      (i32.const 0)
      (i32.const 0)
     )
     (i32.const 0)
    )
   )

merge-blocks can't move the store out, since it can't move it past the load due to effects. That forces us to look at the fallthrough. But, perhaps a "flatten-lite" could do something more careful, and apply ChildLocalizer. ChildLocalizer will only move children to locals when they actually interact, so it would help in this case.

That is, I am suggesting that we add a new pass which basically runs ChildLocalizer on all expressions (with more than one child). That would "partially flatten" out the code, just enough for OptimizeInstructions to handle the case in this PR, and related cases. And we would do this ChildLocalizer work fairly early in the pipeline so that normal optimizations remove the extra locals later, after we try to benefit from them.

Thoughts?

kripken avatar Jun 30 '25 17:06 kripken

Does it cost too much to add a new pass which basically runs ChildLocalizer on all expressions?

For my experience on this family of bugs, wasm-opt can handle most cases with effects in expression well. Now what I think the most problematic is the optimization for the if's condition (optimizeBoolean). Because: (1) It indeed cannot deal with that family well (2) RemoveUnusedBrs can transform block-br structure to if structure. (See issue #7661)

So maybe we can just request for the help of ChildLocalizer in some key points? Like in the optimization for the if's condition.

xuruiyang2002 avatar Jul 01 '25 06:07 xuruiyang2002

Does it cost too much to add a new pass which basically runs ChildLocalizer on all expressions?

Perhaps, we would need to measure it. But adding another function pass that works in linear time is generally low in cost (all function passes run on a function in sequence, keeping the function in cache, which makes it very fast, unlike global passes that scan the entire wasm - adding a single global pass is usually costly).

(Also, I wonder if adding such a pass could allow us to remove ssa-nomerge that we run in -O3 atm, as there is some overlap there... but that's just a vague thought.)

So maybe we can just request for the help of ChildLocalizer in some key points?

I think there may be many such key points, though. This PR is an example, and many other similar rules could use this, in order to not need the manual fallthrough fix.

kripken avatar Jul 01 '25 17:07 kripken

I'll find time next week (holiday weekend starts soon) to experiment with that ChildLocalizer pass. If you're interested to do it though @xuruiyang2002 , feel free to (maybe just comment here if so, so we don't do duplicate work).

kripken avatar Jul 02 '25 18:07 kripken

Okay, looking forward to your experiments.

xuruiyang2002 avatar Jul 03 '25 14:07 xuruiyang2002

Ok, here is a pass that runs ChildLocalizer: https://github.com/WebAssembly/binaryen/pull/7708

It does fix this testcase here, and seems like it might be worth landing, but otoh there is more investigation needed there.

kripken avatar Jul 10 '25 18:07 kripken

Yes, I've seen that it also fixes #7659 --- the family of issues could be fixed. In doing this, we don't bother to add patch one by one which is like a "cat-and-mouse" game.

xuruiyang2002 avatar Jul 14 '25 07:07 xuruiyang2002