
Cranelift: ISLE mid-end performance regression (-9.20%)

Open bongjunj opened this issue 2 months ago • 0 comments

Hi,

This is a follow-up to https://github.com/bytecodealliance/wasmtime/issues/12106.

Although we've removed two kinds of performance-regressing mid-end ISLE rules, a significant performance regression still remains, along with other suspected cases. (There is, of course, a bright side: many cases show significant performance improvements!)

Performance Regression:

  • shootout-switch
  • pulldown-cmark

First, here is the data backing the performance regression:

| Benchmark | No Opt | Main | Main Speedup |
|---|---:|---:|---:|
| blake3-scalar | 317,727 | 317,719 | 0.00% |
| blake3-simd | 313,115 | 306,232 | 2.25% |
| bz2 | 87,201,400 | 86,337,330 | 1.00% |
| pulldown-cmark | 6,580,174 | 6,905,992 | -4.72% |
| regex | 209,743,816 | 210,183,175 | -0.21% |
| shootout-ackermann | 8,498,140 | 7,764,439 | 9.45% |
| shootout-base64 | 381,721,177 | 352,724,661 | 8.22% |
| shootout-ctype | 830,813,398 | 796,486,698 | 4.31% |
| shootout-ed25519 | 9,583,747,723 | 9,395,321,203 | 2.01% |
| shootout-fib2 | 3,009,269,670 | 3,010,509,565 | -0.04% |
| shootout-gimli | 5,338,258 | 5,401,697 | -1.17% |
| shootout-heapsort | 2,382,073,831 | 2,375,914,107 | 0.26% |
| shootout-keccak | 25,168,386 | 21,112,482 | 19.21% |
| shootout-matrix | 538,696,036 | 544,739,691 | -1.11% |
| shootout-memmove | 36,156,621 | 36,115,998 | 0.11% |
| shootout-minicsv | 1,481,713,625 | 1,291,534,227 | 14.73% |
| shootout-nestedloop | 449 | 442 | 1.43% |
| shootout-random | 630,328,205 | 439,691,474 | 43.36% |
| shootout-ratelimit | 39,148,817 | 39,956,714 | -2.02% |
| shootout-seqhash | 8,869,585,125 | 8,639,110,150 | 2.67% |
| shootout-sieve | 905,404,028 | 840,777,681 | 7.69% |
| shootout-switch | 139,525,474 | 153,663,682 | -9.20% |
| shootout-xblabla20 | 2,891,404 | 2,907,369 | -0.55% |
| shootout-xchacha20 | 4,384,703 | 4,395,319 | -0.24% |
| spidermonkey | 636,104,785 | 631,998,404 | 0.65% |
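
For readers who want to check the last column: the speedup figures appear to be computed as `no_opt / main - 1`. That formula is inferred from the numbers, not stated in the issue. Below is a minimal Python sketch under that assumption; the helper name `speedup_percent` is made up for illustration.

```python
def speedup_percent(no_opt: int, main: int) -> float:
    """Relative speedup of `main` over the no-opt baseline, in percent.

    Assumption: speedup = no_opt / main - 1 (inferred from the table).
    A negative result means `main` is slower than the no-opt build.
    """
    return (no_opt / main - 1.0) * 100.0


# A few (No Opt, Main) pairs copied from the table.
rows = {
    "pulldown-cmark": (6_580_174, 6_905_992),
    "shootout-switch": (139_525_474, 153_663_682),
    "shootout-random": (630_328_205, 439_691_474),
}

for name, (no_opt, main) in rows.items():
    print(f"{name}: {speedup_percent(no_opt, main):+.2f}%")
```

Rounded to two decimals, these three rows match the values reported in the table.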

Unlike the previous cases, the cause is not obvious.

19245 clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif
19241 clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif

The number of instructions does not change significantly from no-opt to main. However, the applied optimizations make the program use a long-lived value:

--- /data/bongjun/clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif	2025-12-08 12:43:58.406738645 +0000
+++ /data/bongjun/clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif	2025-12-08 12:49:01.961085326 +0000

-                                    v8572 = iconst.i32 1066
-                                    v8573 = iconst.i32 0
-@d20b                               v4324 = call fn1(v0, v0, v8572, v8573)  ; v8572 = 1066, v8573 = 0
-                                    v8574 = iadd.i64 v105, v106  ; v106 = 3584
-@d219                               v4333 = load.i32 little heap v8574
-                                    v8575 = iconst.i32 6
-                                    v8576 = icmp uge v4333, v8575  ; v8575 = 6
+                                    v8603 = iconst.i32 1066
+                                    v8604 = iconst.i32 0
+@d20b                               v4324 = call fn1(v0, v0, v8603, v8604)  ; v8603 = 1066, v8604 = 0
+                                    v8605 = iadd.i64 v11, v106  ; v106 = 3584
+@d219                               v4333 = load.i32 little heap v8605
+                                    v8606 = iconst.i32 6
+                                    v8607 = icmp uge v4333, v8606  ; v8606 = 6

Compare v8574 and v8605, which use v105 and v11, respectively. v11 is defined at the beginning of the function, whereas v105 is defined later:

                                block0(v0: i64, v1: i64):
@01f0                               v5 = load.i32 notrap aligned table v0+256
@01f6                               v6 = iconst.i32 16
@01f8                               v7 = isub v5, v6  ; v6 = 16
@01fb                               store notrap aligned table v7, v0+256
@0203                               v9 = iconst.i32 0x2710
@0207                               v11 = load.i64 notrap aligned readonly can_move checked v0+56

...

@02d6                               v105 = iadd.i64 v11, v4439

This might increase register pressure, causing more spills, which can degrade performance.
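
To make the register-pressure hypothesis concrete, here is a toy Python sketch. It is not Cranelift or regalloc2 code: it models only straight-line code with no control flow, and the value names (`base`, `off`, `derived`, `a`, `b`, `c`) are hypothetical stand-ins, not values from the CLIF above. It shows one way that rewriting a use from a derived value to an earlier-defined one can raise the peak number of simultaneously live values, provided the derived value stays live for other uses.

```python
from typing import Dict, List, Optional, Tuple

# One instruction: (value it defines or None, values it uses).
Inst = Tuple[Optional[str], List[str]]


def live_ranges(insts: List[Inst]) -> Dict[str, Tuple[int, int]]:
    """Map each defined value to (def index, last-use index)."""
    ranges: Dict[str, List[int]] = {}
    for i, (defined, uses) in enumerate(insts):
        for u in uses:
            if u in ranges:  # uses of values never defined here are ignored
                ranges[u][1] = i
        if defined is not None:
            ranges.setdefault(defined, [i, i])
    return {v: (s, e) for v, (s, e) in ranges.items()}


def max_pressure(insts: List[Inst]) -> int:
    """Peak count of values live just after some instruction."""
    ranges = live_ranges(insts)
    return max(
        sum(1 for s, e in ranges.values() if s <= i < e)
        for i in range(len(insts))
    )


# Hypothetical "before": every later use goes through `derived`,
# so `base` dies right after instruction 2.
before: List[Inst] = [
    ("base", []),
    ("off", []),
    ("derived", ["base", "off"]),
    ("a", ["derived"]),
    ("b", ["derived"]),
    ("c", ["derived", "a"]),
]

# Hypothetical "after": one use is rewritten to `base`, which now
# stays live alongside `derived` and `a`.
after = list(before)
after[4] = ("b", ["base"])

print(max_pressure(before), max_pressure(after))
```

In this toy model the peak rises from 2 to 3 after the rewrite. Whether the same thing happens in shootout-switch depends on the remaining uses of v105 and on the actual register allocator, so comparing spill counts in the generated code would be the real check.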

bongjunj avatar Dec 08 '25 13:12 bongjunj