
Optimize shadow stack instruction sequences

Robbepop opened this issue 1 year ago • 5 comments

Closes https://github.com/paritytech/wasmi/issues/920.

TODOs

  • [x] Fuse i32.add_imm with global.set 0
  • [x] Fuse global.get 0 with i32.add_imm
  • [x] Fuse the already fused global.get 0 + i32.add_imm with i32.local_tee and global.set 0
  • [x] Support fusion of i32.add with a function-local constant rhs register.
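The first three TODO items can be modeled as a peephole pass that merges a newly emitted instruction into the previously emitted one. This is a minimal sketch under stated assumptions, not Wasmi's encoder: the `Instr` enum, its register fields, the fused variant names, `Encoder::push`, and `exec` are hypothetical, and `local.tee` handling plus the fourth item are omitted.

```rust
/// Hypothetical, simplified register-based instructions (not Wasmi's real enum).
/// Global 0 is assumed to hold the shadow stack pointer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Instr {
    GlobalGet { result: u16, global: u32 },
    I32AddImm { result: u16, lhs: u16, rhs: i16 },
    GlobalSet { global: u32, input: u16 },
    /// Fused: result = global0 + rhs
    GlobalGet0AddImm { result: u16, rhs: i16 },
    /// Fused: result = lhs + rhs; global0 = result
    I32AddImmGlobalSet0 { result: u16, lhs: u16, rhs: i16 },
    /// Fused: result = global0 + rhs; global0 = result
    GlobalBump0 { result: u16, rhs: i16 },
}

/// Hypothetical encoder that only ever looks back one instruction.
#[derive(Default)]
struct Encoder {
    instrs: Vec<Instr>,
}

impl Encoder {
    /// Emits `instr`, fusing it with the previous instruction when a pattern matches.
    fn push(&mut self, instr: Instr) {
        let fused = match (self.instrs.last().copied(), instr) {
            // global.get 0 + i32.add_imm on its result.
            // Assumes the global.get result is a temporary read only by the add.
            (Some(Instr::GlobalGet { result: g, global: 0 }), Instr::I32AddImm { result, lhs, rhs })
                if lhs == g => Some(Instr::GlobalGet0AddImm { result, rhs }),
            // i32.add_imm + global.set 0 of the sum.
            (Some(Instr::I32AddImm { result, lhs, rhs }), Instr::GlobalSet { global: 0, input })
                if input == result => Some(Instr::I32AddImmGlobalSet0 { result, lhs, rhs }),
            // Already fused global.get 0 + i32.add_imm, followed by global.set 0.
            (Some(Instr::GlobalGet0AddImm { result, rhs }), Instr::GlobalSet { global: 0, input })
                if input == result => Some(Instr::GlobalBump0 { result, rhs }),
            _ => None,
        };
        match fused {
            Some(f) => {
                // Replace the previous instruction with the fused one.
                self.instrs.pop();
                self.instrs.push(f);
            }
            None => self.instrs.push(instr),
        }
    }
}

/// Executes instructions on a register file; only global 0 is modeled.
fn exec(instrs: &[Instr], regs: &mut [i32], global0: &mut i32) {
    for instr in instrs {
        match *instr {
            Instr::GlobalGet { result, global } => {
                assert_eq!(global, 0, "only global 0 is modeled");
                regs[result as usize] = *global0;
            }
            Instr::I32AddImm { result, lhs, rhs } => {
                regs[result as usize] = regs[lhs as usize].wrapping_add(i32::from(rhs));
            }
            Instr::GlobalSet { global, input } => {
                assert_eq!(global, 0, "only global 0 is modeled");
                *global0 = regs[input as usize];
            }
            Instr::GlobalGet0AddImm { result, rhs } => {
                regs[result as usize] = global0.wrapping_add(i32::from(rhs));
            }
            Instr::I32AddImmGlobalSet0 { result, lhs, rhs } => {
                let sum = regs[lhs as usize].wrapping_add(i32::from(rhs));
                regs[result as usize] = sum;
                *global0 = sum;
            }
            Instr::GlobalBump0 { result, rhs } => {
                let sum = global0.wrapping_add(i32::from(rhs));
                regs[result as usize] = sum;
                *global0 = sum;
            }
        }
    }
}
```

Every fused form still writes `result`, so the sum stays available in a register for any later use (such as a `local.tee`), while the temporary produced by `global.get 0` is assumed dead after the add.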

Robbepop, Feb 06 '24 18:02

BENCHMARKS

| Benchmark | Native master | Native PR | Native diff | Wasmtime master | Wasmtime PR | Wasmtime diff | Wasmtime overhead |
|---|---|---|---|---|---|---|---|
| execute/br_table | 1.60ms | 1.57ms | :green_circle: -1.72% | 1.39ms | 1.37ms | :green_circle: -1.39% | :green_circle: -13% |
| execute/call/host/1 | 52.17µs | 51.76µs | :white_circle: -0.79% | 63.41µs | 58.18µs | :green_circle: -8.25% | :green_circle: 12% |
| execute/call/rec | 172.77µs | 172.93µs | :white_circle: 0.09% | 290.60µs | 290.50µs | :white_circle: -0.03% | :yellow_circle: 68% |
| execute/count_until | 5.80ms | 5.38ms | :green_circle: -7.26% | 7.23ms | 7.26ms | :white_circle: 0.37% | :green_circle: 35% |
| execute/divrem | 6.26ms | 6.30ms | :white_circle: 0.79% | 6.32ms | 6.27ms | :white_circle: -0.77% | :green_circle: -0% |
| execute/factorial/iter | 259.72µs | 264.76µs | :red_circle: 1.94% | 285.90µs | 276.14µs | :green_circle: -3.42% | :green_circle: 4% |
| execute/factorial/rec | 667.24µs | 673.43µs | :white_circle: 0.93% | 1.10ms | 1.06ms | :green_circle: -4.05% | :yellow_circle: 57% |
| execute/fibonacci/iter | 1.33ms | 1.32ms | :white_circle: -0.93% | 1.13ms | 1.15ms | :red_circle: 1.66% | :green_circle: -13% |
| execute/fibonacci/rec | 5.85ms | 5.82ms | :white_circle: -0.58% | 10.71ms | 11.05ms | :red_circle: 3.20% | :yellow_circle: 90% |
| execute/fibonacci/tail | 1.26ms | 1.31ms | :red_circle: 4.28% | 3.68ms | 3.74ms | :red_circle: 1.46% | :red_circle: 185% |
| execute/fuse | 7.06ms | 7.06ms | :white_circle: -0.11% | 11.16ms | 11.14ms | :white_circle: -0.18% | :yellow_circle: 58% |
| execute/global/bump | 1.32ms | 1.16ms | :green_circle: -12.25% | 1.45ms | 1.34ms | :green_circle: -8.17% | :green_circle: 15% |
| execute/global/get_const | 479.54µs | 485.95µs | :red_circle: 1.34% | 727.49µs | 728.30µs | :white_circle: 0.11% | :green_circle: 50% |
| execute/is_even/rec | 1.09ms | 1.10ms | :white_circle: 0.44% | 1.82ms | 1.81ms | :white_circle: -0.48% | :yellow_circle: 65% |
| execute/memory/fill_bytes | 1.10ms | 1.02ms | :green_circle: -7.33% | 1.21ms | 1.17ms | :green_circle: -2.70% | :green_circle: 15% |
| execute/memory/sum_bytes | 1.03ms | 1.08ms | :red_circle: 4.59% | 1.38ms | 1.23ms | :green_circle: -11.30% | :green_circle: 14% |
| execute/memory/vec_add | 3.03ms | 2.97ms | :white_circle: -2.15% | 3.22ms | 3.20ms | :white_circle: -0.65% | :green_circle: 8% |
| execute/recursive_scan | 187.00µs | 193.54µs | :red_circle: 3.50% | 309.36µs | 307.88µs | :white_circle: -0.48% | :yellow_circle: 59% |
| execute/recursive_trap | 16.49µs | 15.71µs | :green_circle: -4.69% | 28.36µs | 27.33µs | :green_circle: -3.61% | :yellow_circle: 74% |
| execute/regex_redux | 586.72µs | 583.60µs | :white_circle: -0.53% | 1.03ms | 1.02ms | :white_circle: -1.08% | :yellow_circle: 74% |
| execute/rev_complement | 438.39µs | 434.49µs | :white_circle: -0.89% | 624.06µs | 614.45µs | :green_circle: -1.54% | :green_circle: 41% |
| execute/tiny_keccak | 350.00µs | 349.88µs | :white_circle: -0.03% | 318.43µs | 345.90µs | :red_circle: 8.63% | :green_circle: -1% |
| execute/trunc_f2i | 630.23µs | 629.80µs | :white_circle: -0.07% | 931.89µs | 920.08µs | :green_circle: -1.27% | :green_circle: 46% |
| instantiate/wasm_kernel | 55.57µs | 53.33µs | :green_circle: -4.04% | 56.38µs | 57.55µs | :white_circle: 2.08% | :green_circle: 8% |
| overhead/call/typed/0 | 1.42ms | 1.25ms | :green_circle: -11.88% | 851.60µs | 847.19µs | :white_circle: -0.52% | :green_circle: -32% |
| overhead/call/typed/16 | 1.75ms | 1.64ms | :green_circle: -6.34% | 2.08ms | 2.22ms | :red_circle: 7.06% | :green_circle: 36% |
| overhead/call/untyped/0 | 1.99ms | 1.58ms | :green_circle: -20.54% | 1.10ms | 1.18ms | :red_circle: 7.17% | :green_circle: -25% |
| overhead/call/untyped/16 | 2.79ms | 2.48ms | :green_circle: -11.35% | 3.82ms | 4.08ms | :red_circle: 6.70% | :yellow_circle: 65% |
| translate/bz2/checked/eager/default | 1.37ms | 1.36ms | :white_circle: -0.81% | 2.49ms | 2.44ms | :green_circle: -2.22% | :yellow_circle: 80% |
| translate/bz2/checked/eager/fuel | 1.47ms | 1.46ms | :white_circle: -0.75% | 2.71ms | 2.66ms | :green_circle: -1.68% | :yellow_circle: 82% |
| translate/bz2/checked/lazy-translation/default | 552.05µs | 540.99µs | :green_circle: -2.00% | 972.03µs | 947.35µs | :green_circle: -2.54% | :yellow_circle: 75% |
| translate/bz2/checked/lazy/default | 37.46µs | 36.62µs | :green_circle: -2.25% | 45.99µs | 44.87µs | :green_circle: -2.44% | :green_circle: 23% |
| translate/bz2/unchecked/eager/default | 1.11ms | 1.09ms | :green_circle: -1.80% | 2.02ms | 1.87ms | :green_circle: -7.57% | :yellow_circle: 71% |
| translate/erc1155/checked/eager/default | 284.46µs | 279.60µs | :green_circle: -1.71% | 488.00µs | 475.75µs | :green_circle: -2.51% | :yellow_circle: 70% |
| translate/erc1155/checked/eager/fuel | 305.93µs | 301.06µs | :green_circle: -1.59% | 520.78µs | 508.22µs | :green_circle: -2.41% | :yellow_circle: 69% |
| translate/erc1155/checked/lazy-translation/default | 128.09µs | 127.60µs | :white_circle: -0.38% | 214.14µs | 209.47µs | :green_circle: -2.18% | :yellow_circle: 64% |
| translate/erc1155/checked/lazy/default | 26.11µs | 26.13µs | :white_circle: 0.07% | 32.86µs | 32.05µs | :green_circle: -2.46% | :green_circle: 23% |
| translate/erc1155/unchecked/eager/default | 234.03µs | 229.30µs | :green_circle: -2.02% | 394.64µs | 370.04µs | :green_circle: -6.23% | :yellow_circle: 61% |
| translate/erc20/checked/eager/default | 137.57µs | 136.17µs | :white_circle: -1.01% | 232.18µs | 227.70µs | :green_circle: -1.93% | :yellow_circle: 67% |
| translate/erc20/checked/eager/fuel | 146.05µs | 144.20µs | :green_circle: -1.26% | 246.06µs | 241.30µs | :green_circle: -1.93% | :yellow_circle: 67% |
| translate/erc20/checked/lazy-translation/default | 65.68µs | 65.44µs | :white_circle: -0.38% | 107.46µs | 106.58µs | :white_circle: -0.82% | :yellow_circle: 63% |
| translate/erc20/checked/lazy/default | 18.97µs | 19.20µs | :white_circle: 1.22% | 24.63µs | 24.50µs | :white_circle: -0.53% | :green_circle: 28% |
| translate/erc20/unchecked/eager/default | 113.86µs | 110.95µs | :green_circle: -2.56% | 186.33µs | 176.79µs | :green_circle: -5.12% | :yellow_circle: 59% |
| translate/erc721/checked/eager/default | 196.09µs | 194.31µs | :white_circle: -0.90% | 338.14µs | 330.47µs | :green_circle: -2.27% | :yellow_circle: 70% |
| translate/erc721/checked/eager/fuel | 206.39µs | 203.84µs | :green_circle: -1.23% | 356.45µs | 347.77µs | :green_circle: -2.44% | :yellow_circle: 71% |
| translate/erc721/checked/lazy-translation/default | 91.80µs | 91.97µs | :white_circle: 0.19% | 153.70µs | 150.67µs | :green_circle: -1.97% | :yellow_circle: 64% |
| translate/erc721/checked/lazy/default | 23.51µs | 23.20µs | :white_circle: -1.29% | 28.76µs | 28.92µs | :white_circle: 0.55% | :green_circle: 25% |
| translate/erc721/unchecked/eager/default | 159.05µs | 156.15µs | :green_circle: -1.83% | 265.73µs | 251.35µs | :green_circle: -5.41% | :yellow_circle: 61% |
| translate/pulldown_cmark/checked/eager/default | 3.67ms | 3.64ms | :white_circle: -0.85% | 6.46ms | 6.33ms | :green_circle: -2.00% | :yellow_circle: 74% |
| translate/pulldown_cmark/checked/eager/fuel | 3.94ms | 3.92ms | :white_circle: -0.58% | 6.92ms | 6.78ms | :green_circle: -2.09% | :yellow_circle: 73% |
| translate/pulldown_cmark/checked/lazy-translation/default | 1.56ms | 1.54ms | :white_circle: -1.28% | 2.59ms | 2.53ms | :green_circle: -2.48% | :yellow_circle: 64% |
| translate/pulldown_cmark/checked/lazy/default | 248.38µs | 245.99µs | :white_circle: -0.96% | 250.74µs | 251.09µs | :white_circle: 0.14% | :green_circle: 2% |
| translate/pulldown_cmark/unchecked/eager/default | 3.08ms | 3.04ms | :white_circle: -1.01% | 5.23ms | 4.90ms | :green_circle: -6.21% | :yellow_circle: 61% |
| translate/spidermonkey/checked/eager/default | 78.05ms | 78.17ms | :white_circle: 0.16% | 137.69ms | 135.22ms | :green_circle: -1.80% | :yellow_circle: 73% |
| translate/spidermonkey/checked/eager/fuel | 84.42ms | 84.65ms | :white_circle: 0.28% | 148.75ms | 145.93ms | :green_circle: -1.90% | :yellow_circle: 72% |
| translate/spidermonkey/checked/lazy-translation/default | 32.91ms | 33.03ms | :white_circle: 0.37% | 56.88ms | 55.81ms | :green_circle: -1.88% | :yellow_circle: 69% |
| translate/spidermonkey/checked/lazy/default | 3.24ms | 3.28ms | :white_circle: 1.31% | 4.19ms | 4.15ms | :white_circle: -0.86% | :green_circle: 26% |
| translate/spidermonkey/unchecked/eager/default | 64.64ms | 64.05ms | :white_circle: -0.91% | 112.38ms | 104.96ms | :green_circle: -6.61% | :yellow_circle: 64% |
| translate/wasm_kernel/checked/eager/default | 5.11ms | 5.19ms | :red_circle: 1.63% | 9.01ms | 8.84ms | :green_circle: -1.93% | :yellow_circle: 70% |
| translate/wasm_kernel/checked/eager/fuel | 5.29ms | 5.34ms | :white_circle: 0.98% | 9.56ms | 9.25ms | :green_circle: -3.22% | :yellow_circle: 73% |
| translate/wasm_kernel/checked/lazy-translation/default | 2.44ms | 2.43ms | :white_circle: -0.31% | 4.01ms | 3.95ms | :green_circle: -1.47% | :yellow_circle: 63% |
| translate/wasm_kernel/checked/lazy/default | 424.77µs | 426.09µs | :white_circle: 0.31% | 480.08µs | 483.37µs | :white_circle: 0.68% | :green_circle: 13% |
| translate/wasm_kernel/unchecked/eager/default | 4.17ms | 4.18ms | :white_circle: 0.38% | 7.13ms | 6.75ms | :green_circle: -5.27% | :yellow_circle: 61% |
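The percentages in the diff and overhead columns can be recomputed from the timings. In the reading sketched here, which matches the rows checked in the usage below, each diff is (PR - master) / master, and the Wasmtime overhead compares the PR build under Wasmtime against the PR build run natively. The helper names are illustrative, not part of the benchmark tooling.

```rust
/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Overhead of the PR build under Wasmtime relative to the native PR build,
/// in percent (assumed meaning of the last column).
fn wasmtime_overhead(native_pr: f64, wasmtime_pr: f64) -> f64 {
    percent_change(native_pr, wasmtime_pr)
}
```

For execute/call/host/1, `percent_change(63.41, 58.18)` gives about -8.25% and `wasmtime_overhead(51.76, 58.18)` about 12%, matching the table. Recomputing from the displayed, rounded timings is only approximate: execute/br_table native gives about -1.9% from 1.60ms and 1.57ms versus the reported -1.72%, presumably because the report uses unrounded measurements.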


paritytech-cicd-pr, Feb 06 '24 19:02

Codecov Report

Attention: Patch coverage is 77.72727% with 49 lines in your changes missing coverage. Please review.

Project coverage is 80.49%. Comparing base (978a58f) to head (27264e3). Report is 102 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rates/wasmi/src/engine/translator/instr_encoder.rs | 72.41% | 16 Missing :warning: |
| crates/wasmi/src/engine/executor/instrs/global.rs | 27.77% | 13 Missing :warning: |
| crates/wasmi/src/engine/translator/visit.rs | 55.00% | 9 Missing :warning: |
| crates/wasmi/src/engine/executor/instrs.rs | 33.33% | 4 Missing :warning: |
| ...ates/wasmi/src/engine/translator/visit_register.rs | 0.00% | 3 Missing :warning: |
| crates/wasmi/src/engine/translator/mod.rs | 90.00% | 2 Missing :warning: |
| ...rates/wasmi/src/engine/translator/relink_result.rs | 33.33% | 2 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #928      +/-   ##
==========================================
+ Coverage   80.48%   80.49%   +0.01%     
==========================================
  Files         270      270              
  Lines       25079    25273     +194     
==========================================
+ Hits        20184    20343     +159     
- Misses       4895     4930      +35     
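The coverage figures above are internally consistent, which a few lines of arithmetic can confirm. This sketch assumes patch coverage is covered / (covered + missing) and project coverage is hits / lines; the helper names are hypothetical.

```rust
/// Covered fraction in percent (hits / lines).
fn coverage_percent(hits: u32, lines: u32) -> f64 {
    f64::from(hits) / f64::from(lines) * 100.0
}

/// Infers the number of changed lines from the patch coverage and the number
/// of lines missing coverage, assuming coverage = covered / (covered + missing).
fn infer_patch_lines(patch_coverage_percent: f64, missing: u32) -> u32 {
    (f64::from(missing) / (1.0 - patch_coverage_percent / 100.0)).round() as u32
}
```

The per-file counts (16 + 13 + 9 + 4 + 3 + 2 + 2) sum to the reported 49 missing lines; 77.72727% with 49 missing implies 220 changed lines, 171 of them covered; and 20343 / 25273 rounds to the reported 80.49%, up from 20184 / 25079 ≈ 80.48%.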


codecov-commenter, Feb 06 '24 20:02

Comment on why this has not yet been merged despite passing all CI tests and working: it introduces a lot of complexity into the translation pipeline compared to the gains we see and can verify. The runtime gains are mostly in the single-digit percent range and restricted to benchmarks that actually make use of the shadow stack. The global_bump benchmark is heavily affected, with a performance improvement of roughly 30%, but it is also very artificial. Memory consumption improves as well, but again only by single-digit percentages for practical Wasm binaries.

So, all in all, the question is whether the gains are worth the added complexity.
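For context on what "making use of the shadow stack" means: toolchains targeting wasm32 (LLVM-based ones such as clang and rustc) keep a stack pointer in a mutable global, the one this PR addresses as global 0, and allocate frames in linear memory. A typical prologue is `global.get 0; i32.const N; i32.sub; local.tee $fp; global.set 0` (presumably the subtraction by a constant is what ends up as `i32.add_imm` with a negated immediate), and the epilogue adds `N` back. The model below is a hypothetical illustration: `ShadowStack`, `enter`, `leave`, and the byte-vector memory are assumptions, not Wasmi code.

```rust
/// Hypothetical model of a wasm32 shadow stack: `sp` plays the role of
/// global 0, and frames live in linear memory, growing downward.
struct ShadowStack {
    memory: Vec<u8>,
    sp: u32, // the value held in global 0
}

impl ShadowStack {
    fn new(size: u32) -> Self {
        Self { memory: vec![0; size as usize], sp: size }
    }

    /// Prologue: global.get 0; i32.const frame; i32.sub; local.tee $fp; global.set 0
    fn enter(&mut self, frame: u32) -> u32 {
        let fp = self.sp.checked_sub(frame).expect("shadow stack overflow");
        self.sp = fp;
        fp // kept in a local (the `local.tee`) to address the frame
    }

    /// Epilogue: local.get $fp; i32.const frame; i32.add; global.set 0
    fn leave(&mut self, fp: u32, frame: u32) {
        self.sp = fp + frame;
    }

    /// Little-endian 32-bit store into the frame, like `i32.store`.
    fn store_u32(&mut self, addr: u32, value: u32) {
        let a = addr as usize;
        self.memory[a..a + 4].copy_from_slice(&value.to_le_bytes());
    }

    /// Little-endian 32-bit load from the frame, like `i32.load`.
    fn load_u32(&self, addr: u32) -> u32 {
        let a = addr as usize;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.memory[a..a + 4]);
        u32::from_le_bytes(buf)
    }
}
```

Every call that needs a frame pays for the prologue and epilogue, which is why fusing these few instructions only matters for code that uses the shadow stack heavily.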

Robbepop, Mar 14 '24 14:03

The miri CI job fails because of this issue: https://github.com/rust-lang/miri/issues/3404

Robbepop, Mar 28 '24 16:03

We might want to block this until the Wasmi bytecode translator supports looking back over multiple instructions during translation. That would allow us to get rid of the intermediate optimized instructions.
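With multi-instruction look-back, the translator can recognize the whole `global.get 0` / add / `global.set 0` pattern at the moment `global.set 0` is translated, rather than building it up step by step. The sketch below is hypothetical (the `Instr` variants, `Translator`, and `push_global_set` are illustrative names, not Wasmi API) and uses a Rust slice pattern to inspect the last two emitted instructions.

```rust
/// Hypothetical, simplified register-based instructions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Instr {
    GlobalGet { result: u16, global: u32 },
    I32AddImm { result: u16, lhs: u16, rhs: i16 },
    GlobalSet { global: u32, input: u16 },
    /// Fused: result = global0 + rhs; global0 = result
    GlobalBump0 { result: u16, rhs: i16 },
}

#[derive(Default)]
struct Translator {
    instrs: Vec<Instr>,
}

impl Translator {
    fn push(&mut self, instr: Instr) {
        self.instrs.push(instr);
    }

    /// Translates `global.set global` of register `input`, looking back two instructions.
    fn push_global_set(&mut self, global: u32, input: u16) {
        let fused = match self.instrs.as_slice() {
            // Assumes the global.get result `g` is a temporary read only by the add.
            [.., Instr::GlobalGet { result: g, global: 0 }, Instr::I32AddImm { result, lhs, rhs }]
                if global == 0 && lhs == g && *result == input =>
            {
                Some(Instr::GlobalBump0 { result: *result, rhs: *rhs })
            }
            _ => None,
        };
        match fused {
            Some(f) => {
                // Replace both looked-back instructions with the single fused one.
                self.instrs.truncate(self.instrs.len() - 2);
                self.instrs.push(f);
            }
            None => self.instrs.push(Instr::GlobalSet { global, input }),
        }
    }
}
```

The design point is that no intermediate fused instruction has to exist merely to enable the next fusion step; sequences that do not match are emitted unchanged.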

Robbepop, Jun 25 '24 12:06

I am closing this now since so much has changed in the code base that a full rewrite would make more sense. Also, I am very uncertain whether this optimization is a clear improvement for Wasmi as a whole.

Robbepop, Oct 04 '24 20:10