binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

Avoid hosting allocations (especially array allocations) out of conditional paths

Open mkustermann opened this issue 10 months ago • 2 comments

We have for example this Dart code

const _codeUnitsCacheSize = 512;
final _codeUnitsCache = WasmArray<WasmI16>(_codeUnitsCacheSize);

JSStringImpl _jsStringFromAsciiBytes(U8List source, int start, int end) {
  final length = end - start;
  final array =
      length <= _codeUnitsCacheSize
          ? _codeUnitsCache
          : WasmArray<WasmI16>(length);
  for (int j = start, i = 0; i < length; ++i, ++j) {
    array.write(i, source.getUnchecked(j));
  }
  return JSStringImpl(
    jsStringFromCharCodeArray(array, 0.toWasmI32(), length.toWasmI32()),
  );
}

Notice that we have a fast path: For small lengths we want to avoid allocating the wasm array and use a cache instead.

Compiled with dart2wasm and optimized with binaryen we get this:

  (func $_jsStringFromAsciiBytes (;452;) (param $var0 (ref $U8List)) (param $var1 i64) (param $var2 i64) (result (ref $JSValue_66))
    (local $var3 i64)
    (local $var4 (ref $Array<i16>))
    global.get $_codeUnitsCache
    local.get $var2
    local.get $var1
    i64.sub
    local.tee $var3
    i32.wrap_i64
    array.new_default $Array<i16>
    local.get $var3
    i64.const 512
    i64.le_s
    select (ref $Array<i16>)
    local.set $var4
    i64.const 0
    local.set $var2
    loop $label0
      local.get $var2
      local.get $var3
      i64.lt_s
      if
        local.get $var4
        local.get $var2
        i32.wrap_i64
        local.get $var0
        struct.get $U8List $field3
        local.get $var1
        i32.wrap_i64
        array.get_u $Array<WasmI8>
        array.set $Array<i16>
        local.get $var2
        i64.const 1
        i64.add
        local.set $var2
        local.get $var1
        i64.const 1
        i64.add
        local.set $var1
        br $label0
      end
    end $label0
    local.get $var4
    i32.const 0
    local.get $var3
    i32.wrap_i64
    call $wasm:js-string.fromCharCodeArray (import)
    call $JSStringImpl
  )

Notice that it hoisted the array allocation (the slow path) out of the conditional and unconditionally allocates and then uses a select (ref $Array<i16>) to select between the two.

This is very problematic

mkustermann avatar Feb 20 '25 10:02 mkustermann

See repro:

binaryen_arrayalloc_issue.tar.gz

We compile via

% tar xvzf binaryen_arrayalloc_issue.tar.gz 
output.wasm
output.wat
output.unopt.wasm
output.unopt.wat

% wasm-opt --all-features --closed-world --traps-never-happen --type-unfinalizing -Os --type-ssa --gufa -Os \
           --type-merging -Os --type-finalizing --minimize-rec-groups -g  -o output.wasm  output.unopt.wasm 

mkustermann avatar Feb 20 '25 10:02 mkustermann

Is there a way to run the code? We can adjust the cost of allocations to make the optimizer avoid them more often, but I'd prefer to do that with measurements.

kripken avatar Feb 20 '25 17:02 kripken