Avoid hosting allocations (especially array allocations) out of conditional paths
We have for example this Dart code
const _codeUnitsCacheSize = 512;
final _codeUnitsCache = WasmArray<WasmI16>(_codeUnitsCacheSize);
JSStringImpl _jsStringFromAsciiBytes(U8List source, int start, int end) {
final length = end - start;
final array =
length <= _codeUnitsCacheSize
? _codeUnitsCache
: WasmArray<WasmI16>(length);
for (int j = start, i = 0; i < length; ++i, ++j) {
array.write(i, source.getUnchecked(j));
}
return JSStringImpl(
jsStringFromCharCodeArray(array, 0.toWasmI32(), length.toWasmI32()),
);
}
Notice that we have a fast path: For small lengths we want to avoid allocating the wasm array and use a cache instead.
Compiled with dart2wasm and optimized with binaryen we get this:
(func $_jsStringFromAsciiBytes (;452;) (param $var0 (ref $U8List)) (param $var1 i64) (param $var2 i64) (result (ref $JSValue_66))
(local $var3 i64)
(local $var4 (ref $Array<i16>))
global.get $_codeUnitsCache
local.get $var2
local.get $var1
i64.sub
local.tee $var3
i32.wrap_i64
array.new_default $Array<i16>
local.get $var3
i64.const 512
i64.le_s
select (ref $Array<i16>)
local.set $var4
i64.const 0
local.set $var2
loop $label0
local.get $var2
local.get $var3
i64.lt_s
if
local.get $var4
local.get $var2
i32.wrap_i64
local.get $var0
struct.get $U8List $field3
local.get $var1
i32.wrap_i64
array.get_u $Array<WasmI8>
array.set $Array<i16>
local.get $var2
i64.const 1
i64.add
local.set $var2
local.get $var1
i64.const 1
i64.add
local.set $var1
br $label0
end
end $label0
local.get $var4
i32.const 0
local.get $var3
i32.wrap_i64
call $wasm:js-string.fromCharCodeArray (import)
call $JSStringImpl
)
Notice that it hoisted the array allocation (the slow path) out of the conditional and unconditionally allocates and then uses a select (ref $Array<i16>) to select between the two.
This is very problematic
See repro:
binaryen_arrayalloc_issue.tar.gz
We compile via
% tar xvzf binaryen_arrayalloc_issue.tar.gz
output.wasm
output.wat
output.unopt.wasm
output.unopt.wat
% wasm-opt --all-features --closed-world --traps-never-happen --type-unfinalizing -Os --type-ssa --gufa -Os \
--type-merging -Os --type-finalizing --minimize-rec-groups -g -o output.wasm output.unopt.wasm
Is there a way to run the code? We can adjust the cost of allocations to make the optimizer avoid them more often, but I'd prefer to do that with measurements.