wasmi
Optimize shadow stack instruction sequences
Closes https://github.com/paritytech/wasmi/issues/920.
TODOs
- [x] Fuse `i32.add_imm` with `global.set 0`
- [x] Fuse `global.get 0` with `i32.add_imm`
- [x] Fuse fused `global.get 0` + `i32.add_imm` with `i32.local_tee` and `global.set 0`
- [x] Support fusion of `i32.add` with a function local constant `rhs` register.
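
The fusion steps listed above can be illustrated with a minimal peephole sketch. It is not Wasmi's implementation: the `Instr` enum, its stack-machine semantics, and the names `push_fused` and `translate` are hypothetical and chosen only for illustration. It also assumes straight-line code, so it ignores the fact that fusion must not cross branch targets. Each newly translated instruction is fused with the previously emitted one, which is why intermediate fused forms such as `GlobalGet0AddImm` are needed before the fully fused instruction can be reached.

```rust
/// Hypothetical, simplified stack-machine instructions (not Wasmi's real bytecode).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Instr {
    LocalGet(u32),
    GlobalGet0,
    I32AddImm(i32),
    GlobalSet0,
    /// Intermediate fusion: push `global0 + imm`.
    GlobalGet0AddImm(i32),
    /// Intermediate fusion: pop `v`, then set `global0 = v + imm`.
    AddImmGlobalSet0(i32),
    /// Final fusion: `global0 += imm` without any stack traffic.
    Global0AddAssign(i32),
}

/// Appends `next`, fusing it with the previously emitted instruction if a rule applies.
/// Assumption: `code` is straight-line code without branch targets in between.
fn push_fused(code: &mut Vec<Instr>, next: Instr) {
    use Instr::*;
    let fused = match (code.last().copied(), next) {
        (Some(GlobalGet0), I32AddImm(imm)) => Some(GlobalGet0AddImm(imm)),
        (Some(I32AddImm(imm)), GlobalSet0) => Some(AddImmGlobalSet0(imm)),
        (Some(GlobalGet0AddImm(imm)), GlobalSet0) => Some(Global0AddAssign(imm)),
        _ => None,
    };
    match fused {
        // Overwrite the last instruction with its fused replacement.
        Some(instr) => *code.last_mut().expect("a last instruction was matched") = instr,
        // No rule applies: emit the instruction unchanged.
        None => code.push(next),
    }
}

/// Translates a whole instruction sequence, applying the fusion rules on the fly.
fn translate(input: &[Instr]) -> Vec<Instr> {
    let mut code = Vec::new();
    for &instr in input {
        push_fused(&mut code, instr);
    }
    code
}
```

With these rules, the sequence `GlobalGet0`, `I32AddImm(-16)`, `GlobalSet0` collapses into the single instruction `Global0AddAssign(-16)`, while a sequence that does not start with `GlobalGet0` stops at the intermediate `AddImmGlobalSet0` form.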
BENCHMARKS
| BENCHMARK | NATIVE MASTER | NATIVE PR | NATIVE DIFF | WASMTIME MASTER | WASMTIME PR | WASMTIME DIFF | WASMTIME OVERHEAD |
|---|---|---|---|---|---|---|---|
| execute/ br_table | 1.60ms | 1.57ms | :green_circle: -1.72% | 1.39ms | 1.37ms | :green_circle: -1.39% | :green_circle: -13% |
| execute/ call/host/1 | 52.17µs | 51.76µs | :white_circle: -0.79% | 63.41µs | 58.18µs | :green_circle: -8.25% | :green_circle: 12% |
| execute/ call/rec | 172.77µs | 172.93µs | :white_circle: 0.09% | 290.60µs | 290.50µs | :white_circle: -0.03% | :yellow_circle: 68% |
| execute/ count_until | 5.80ms | 5.38ms | :green_circle: -7.26% | 7.23ms | 7.26ms | :white_circle: 0.37% | :green_circle: 35% |
| execute/ divrem | 6.26ms | 6.30ms | :white_circle: 0.79% | 6.32ms | 6.27ms | :white_circle: -0.77% | :green_circle: -0% |
| execute/ factorial/iter | 259.72µs | 264.76µs | :red_circle: 1.94% | 285.90µs | 276.14µs | :green_circle: -3.42% | :green_circle: 4% |
| execute/ factorial/rec | 667.24µs | 673.43µs | :white_circle: 0.93% | 1.10ms | 1.06ms | :green_circle: -4.05% | :yellow_circle: 57% |
| execute/ fibonacci/iter | 1.33ms | 1.32ms | :white_circle: -0.93% | 1.13ms | 1.15ms | :red_circle: 1.66% | :green_circle: -13% |
| execute/ fibonacci/rec | 5.85ms | 5.82ms | :white_circle: -0.58% | 10.71ms | 11.05ms | :red_circle: 3.20% | :yellow_circle: 90% |
| execute/ fibonacci/tail | 1.26ms | 1.31ms | :red_circle: 4.28% | 3.68ms | 3.74ms | :red_circle: 1.46% | :red_circle: 185% |
| execute/ fuse | 7.06ms | 7.06ms | :white_circle: -0.11% | 11.16ms | 11.14ms | :white_circle: -0.18% | :yellow_circle: 58% |
| execute/ global/bump | 1.32ms | 1.16ms | :green_circle: -12.25% | 1.45ms | 1.34ms | :green_circle: -8.17% | :green_circle: 15% |
| execute/ global/get_const | 479.54µs | 485.95µs | :red_circle: 1.34% | 727.49µs | 728.30µs | :white_circle: 0.11% | :green_circle: 50% |
| execute/ is_even/rec | 1.09ms | 1.10ms | :white_circle: 0.44% | 1.82ms | 1.81ms | :white_circle: -0.48% | :yellow_circle: 65% |
| execute/ memory/fill_bytes | 1.10ms | 1.02ms | :green_circle: -7.33% | 1.21ms | 1.17ms | :green_circle: -2.70% | :green_circle: 15% |
| execute/ memory/sum_bytes | 1.03ms | 1.08ms | :red_circle: 4.59% | 1.38ms | 1.23ms | :green_circle: -11.30% | :green_circle: 14% |
| execute/ memory/vec_add | 3.03ms | 2.97ms | :white_circle: -2.15% | 3.22ms | 3.20ms | :white_circle: -0.65% | :green_circle: 8% |
| execute/ recursive_scan | 187.00µs | 193.54µs | :red_circle: 3.50% | 309.36µs | 307.88µs | :white_circle: -0.48% | :yellow_circle: 59% |
| execute/ recursive_trap | 16.49µs | 15.71µs | :green_circle: -4.69% | 28.36µs | 27.33µs | :green_circle: -3.61% | :yellow_circle: 74% |
| execute/ regex_redux | 586.72µs | 583.60µs | :white_circle: -0.53% | 1.03ms | 1.02ms | :white_circle: -1.08% | :yellow_circle: 74% |
| execute/ rev_complement | 438.39µs | 434.49µs | :white_circle: -0.89% | 624.06µs | 614.45µs | :green_circle: -1.54% | :green_circle: 41% |
| execute/ tiny_keccak | 350.00µs | 349.88µs | :white_circle: -0.03% | 318.43µs | 345.90µs | :red_circle: 8.63% | :green_circle: -1% |
| execute/ trunc_f2i | 630.23µs | 629.80µs | :white_circle: -0.07% | 931.89µs | 920.08µs | :green_circle: -1.27% | :green_circle: 46% |
| instantiate/ wasm_kernel | 55.57µs | 53.33µs | :green_circle: -4.04% | 56.38µs | 57.55µs | :white_circle: 2.08% | :green_circle: 8% |
| overhead/ call/typed/0 | 1.42ms | 1.25ms | :green_circle: -11.88% | 851.60µs | 847.19µs | :white_circle: -0.52% | :green_circle: -32% |
| overhead/ call/typed/16 | 1.75ms | 1.64ms | :green_circle: -6.34% | 2.08ms | 2.22ms | :red_circle: 7.06% | :green_circle: 36% |
| overhead/ call/untyped/0 | 1.99ms | 1.58ms | :green_circle: -20.54% | 1.10ms | 1.18ms | :red_circle: 7.17% | :green_circle: -25% |
| overhead/ call/untyped/16 | 2.79ms | 2.48ms | :green_circle: -11.35% | 3.82ms | 4.08ms | :red_circle: 6.70% | :yellow_circle: 65% |
| translate/ bz2/checked/eager/default | 1.37ms | 1.36ms | :white_circle: -0.81% | 2.49ms | 2.44ms | :green_circle: -2.22% | :yellow_circle: 80% |
| translate/ bz2/checked/eager/fuel | 1.47ms | 1.46ms | :white_circle: -0.75% | 2.71ms | 2.66ms | :green_circle: -1.68% | :yellow_circle: 82% |
| translate/ bz2/checked/lazy-translation/default | 552.05µs | 540.99µs | :green_circle: -2.00% | 972.03µs | 947.35µs | :green_circle: -2.54% | :yellow_circle: 75% |
| translate/ bz2/checked/lazy/default | 37.46µs | 36.62µs | :green_circle: -2.25% | 45.99µs | 44.87µs | :green_circle: -2.44% | :green_circle: 23% |
| translate/ bz2/unchecked/eager/default | 1.11ms | 1.09ms | :green_circle: -1.80% | 2.02ms | 1.87ms | :green_circle: -7.57% | :yellow_circle: 71% |
| translate/ erc1155/checked/eager/default | 284.46µs | 279.60µs | :green_circle: -1.71% | 488.00µs | 475.75µs | :green_circle: -2.51% | :yellow_circle: 70% |
| translate/ erc1155/checked/eager/fuel | 305.93µs | 301.06µs | :green_circle: -1.59% | 520.78µs | 508.22µs | :green_circle: -2.41% | :yellow_circle: 69% |
| translate/ erc1155/checked/lazy-translation/default | 128.09µs | 127.60µs | :white_circle: -0.38% | 214.14µs | 209.47µs | :green_circle: -2.18% | :yellow_circle: 64% |
| translate/ erc1155/checked/lazy/default | 26.11µs | 26.13µs | :white_circle: 0.07% | 32.86µs | 32.05µs | :green_circle: -2.46% | :green_circle: 23% |
| translate/ erc1155/unchecked/eager/default | 234.03µs | 229.30µs | :green_circle: -2.02% | 394.64µs | 370.04µs | :green_circle: -6.23% | :yellow_circle: 61% |
| translate/ erc20/checked/eager/default | 137.57µs | 136.17µs | :white_circle: -1.01% | 232.18µs | 227.70µs | :green_circle: -1.93% | :yellow_circle: 67% |
| translate/ erc20/checked/eager/fuel | 146.05µs | 144.20µs | :green_circle: -1.26% | 246.06µs | 241.30µs | :green_circle: -1.93% | :yellow_circle: 67% |
| translate/ erc20/checked/lazy-translation/default | 65.68µs | 65.44µs | :white_circle: -0.38% | 107.46µs | 106.58µs | :white_circle: -0.82% | :yellow_circle: 63% |
| translate/ erc20/checked/lazy/default | 18.97µs | 19.20µs | :white_circle: 1.22% | 24.63µs | 24.50µs | :white_circle: -0.53% | :green_circle: 28% |
| translate/ erc20/unchecked/eager/default | 113.86µs | 110.95µs | :green_circle: -2.56% | 186.33µs | 176.79µs | :green_circle: -5.12% | :yellow_circle: 59% |
| translate/ erc721/checked/eager/default | 196.09µs | 194.31µs | :white_circle: -0.90% | 338.14µs | 330.47µs | :green_circle: -2.27% | :yellow_circle: 70% |
| translate/ erc721/checked/eager/fuel | 206.39µs | 203.84µs | :green_circle: -1.23% | 356.45µs | 347.77µs | :green_circle: -2.44% | :yellow_circle: 71% |
| translate/ erc721/checked/lazy-translation/default | 91.80µs | 91.97µs | :white_circle: 0.19% | 153.70µs | 150.67µs | :green_circle: -1.97% | :yellow_circle: 64% |
| translate/ erc721/checked/lazy/default | 23.51µs | 23.20µs | :white_circle: -1.29% | 28.76µs | 28.92µs | :white_circle: 0.55% | :green_circle: 25% |
| translate/ erc721/unchecked/eager/default | 159.05µs | 156.15µs | :green_circle: -1.83% | 265.73µs | 251.35µs | :green_circle: -5.41% | :yellow_circle: 61% |
| translate/ pulldown_cmark/checked/eager/default | 3.67ms | 3.64ms | :white_circle: -0.85% | 6.46ms | 6.33ms | :green_circle: -2.00% | :yellow_circle: 74% |
| translate/ pulldown_cmark/checked/eager/fuel | 3.94ms | 3.92ms | :white_circle: -0.58% | 6.92ms | 6.78ms | :green_circle: -2.09% | :yellow_circle: 73% |
| translate/ pulldown_cmark/checked/lazy-translation/default | 1.56ms | 1.54ms | :white_circle: -1.28% | 2.59ms | 2.53ms | :green_circle: -2.48% | :yellow_circle: 64% |
| translate/ pulldown_cmark/checked/lazy/default | 248.38µs | 245.99µs | :white_circle: -0.96% | 250.74µs | 251.09µs | :white_circle: 0.14% | :green_circle: 2% |
| translate/ pulldown_cmark/unchecked/eager/default | 3.08ms | 3.04ms | :white_circle: -1.01% | 5.23ms | 4.90ms | :green_circle: -6.21% | :yellow_circle: 61% |
| translate/ spidermonkey/checked/eager/default | 78.05ms | 78.17ms | :white_circle: 0.16% | 137.69ms | 135.22ms | :green_circle: -1.80% | :yellow_circle: 73% |
| translate/ spidermonkey/checked/eager/fuel | 84.42ms | 84.65ms | :white_circle: 0.28% | 148.75ms | 145.93ms | :green_circle: -1.90% | :yellow_circle: 72% |
| translate/ spidermonkey/checked/lazy-translation/default | 32.91ms | 33.03ms | :white_circle: 0.37% | 56.88ms | 55.81ms | :green_circle: -1.88% | :yellow_circle: 69% |
| translate/ spidermonkey/checked/lazy/default | 3.24ms | 3.28ms | :white_circle: 1.31% | 4.19ms | 4.15ms | :white_circle: -0.86% | :green_circle: 26% |
| translate/ spidermonkey/unchecked/eager/default | 64.64ms | 64.05ms | :white_circle: -0.91% | 112.38ms | 104.96ms | :green_circle: -6.61% | :yellow_circle: 64% |
| translate/ wasm_kernel/checked/eager/default | 5.11ms | 5.19ms | :red_circle: 1.63% | 9.01ms | 8.84ms | :green_circle: -1.93% | :yellow_circle: 70% |
| translate/ wasm_kernel/checked/eager/fuel | 5.29ms | 5.34ms | :white_circle: 0.98% | 9.56ms | 9.25ms | :green_circle: -3.22% | :yellow_circle: 73% |
| translate/ wasm_kernel/checked/lazy-translation/default | 2.44ms | 2.43ms | :white_circle: -0.31% | 4.01ms | 3.95ms | :green_circle: -1.47% | :yellow_circle: 63% |
| translate/ wasm_kernel/checked/lazy/default | 424.77µs | 426.09µs | :white_circle: 0.31% | 480.08µs | 483.37µs | :white_circle: 0.68% | :green_circle: 13% |
| translate/ wasm_kernel/unchecked/eager/default | 4.17ms | 4.18ms | :white_circle: 0.38% | 7.13ms | 6.75ms | :green_circle: -5.27% | :yellow_circle: 61% |
Codecov Report
Attention: Patch coverage is 77.72727% with 49 lines in your changes missing coverage. Please review.
Project coverage is 80.49%. Comparing base (978a58f) to head (27264e3). Report is 102 commits behind head on main.
Additional details and impacted files

| Coverage Diff | main | #928 | +/- |
|---|---|---|---|
| Coverage | 80.48% | 80.49% | +0.01% |
| Files | 270 | 270 | |
| Lines | 25079 | 25273 | +194 |
| Hits | 20184 | 20343 | +159 |
| Misses | 4895 | 4930 | +35 |
Comment on why this has not yet been merged despite working and passing all CI tests: it introduces a lot of complexity into the translation pipeline compared to the gains we can see and verify. The runtime gains are mostly in the single digits and restricted to the benchmarks that actually make use of the shadow stack. The global_bump benchmark is heavily affected, with a roughly 30% performance improvement, but it is also very artificial. Memory consumption improves as well, but again only in the single digits for practical Wasm binaries.
So all in all the question is whether the gains are worth the added complexity.
The miri CI job fails because of this issue: https://github.com/rust-lang/miri/issues/3404
We might want to block this until the Wasmi bytecode translator has a multiple look-back translation feature. That would allow us to get rid of the intermediate optimized instructions.
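
As a rough illustration of why looking back over several instructions removes the need for intermediate fused instructions, here is a minimal sketch. It reflects one possible reading of "multiple look-back", not Wasmi's actual design: the `Instr` enum, its stack-machine semantics, and the function `push_with_lookback` are hypothetical, and the sketch assumes straight-line code without branch targets. The translator inspects a window of the last three emitted instructions and rewrites the whole window at once, so only the final fused instruction has to exist in the instruction set.

```rust
/// Hypothetical, simplified stack-machine instructions (not Wasmi's real bytecode).
/// Note that there are no intermediate fused variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Instr {
    LocalGet(u32),
    GlobalGet0,
    I32AddImm(i32),
    GlobalSet0,
    /// Fused form of `global.get 0`, `i32.add_imm`, `global.set 0`: `global0 += imm`.
    Global0AddAssign(i32),
}

/// Emits `next`, then looks back at the last three instructions and
/// replaces them with one fused instruction if they form the full pattern.
/// Assumption: `code` is straight-line code without branch targets in between.
fn push_with_lookback(code: &mut Vec<Instr>, next: Instr) {
    use Instr::*;
    code.push(next);
    if let [.., GlobalGet0, I32AddImm(imm), GlobalSet0] = code.as_slice() {
        let imm = *imm; // copy the immediate so the borrow of `code` ends here
        let len = code.len();
        code.truncate(len - 3);
        code.push(Global0AddAssign(imm));
    }
}
```

The trade-off in this sketch is that every emitted instruction triggers a window match, and a partial pattern such as `GlobalGet0` followed by `I32AddImm` simply stays unfused.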
I am closing this now because so much has changed in the code base that a full rewrite would make more sense. Beyond that, I am very uncertain whether this optimization is a clear improvement to Wasmi as a whole.