wabt

wasm-stats: support markov chain style listings

Open SoniEx2 opened this issue 2 years ago • 7 comments

The wasm-stats tool outputs opcode counts for a wasm file. It would be really helpful if it could also build a Markov chain from the opcodes in a wasm file, that is, the probability that a given opcode is followed by each other opcode. As we now know, inlining isn't actually helpful with wasm, and such Markov chains would help with identifying areas of excessive inlining in wasm modules.
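A minimal sketch of the proposed counting, assuming the opcodes of each function are already available as a list of strings (wasm-stats does not expose such a list today; the input format and the names `build_transitions` and `transition_probabilities` are hypothetical). Each consecutive pair within a function is one observed transition, and dividing each row by its total gives the estimated probability that one opcode is followed by another:

```python
from collections import Counter, defaultdict


def build_transitions(functions: list[list[str]]) -> dict[str, Counter]:
    """Count opcode -> next-opcode transitions.

    `functions` is a list of per-function opcode sequences (hypothetical
    input format). Pairs are only counted within a single function, so the
    last opcode of one function is never treated as followed by the first
    opcode of the next.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for ops in functions:
        for current, following in zip(ops, ops[1:]):
            counts[current][following] += 1
    return counts


def transition_probabilities(counts: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """Normalize each row of transition counts into probabilities."""
    probs = {}
    for op, successors in counts.items():
        total = sum(successors.values())  # always > 0: rows exist only once incremented
        probs[op] = {nxt: n / total for nxt, n in successors.items()}
    return probs
```

The keys could just as well include immediates (for example "local.get 1") to get per-operand rows such as the (local.get 1) row in the listing suggested in the next comment.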

Might implement this ourselves... at some point. Been busy. If anyone else wants to work on it, do let us know so we can avoid duplicate work.

SoniEx2 • Oct 08 '23 13:10

Probably the best way to display these would be to list the opcodes and, for each of those, what opcodes they're followed by. Something along the lines of:

(local.get 1) 150
- (global.set 1) 50%
- (global.set 2) 30%
- ...

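It's probably better to avoid showing the matrix form of the Markov chain. A dense matrix needs a row and a column for every distinct opcode (or opcode-operand pair, as above), so it can get really big...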

SoniEx2 • Oct 08 '23 22:10

Out of curiosity, where does your claim that "inlining isn't actually helpful with wasm" come from?

As far as I know, on the producer side both LLVM and Binaryen do a bunch of inlining and/or provide ways to increase or decrease the amount of inlining, and this has historically been valuable.

sbc100 • Oct 09 '23 21:10

we believe this is the paper that argues inlining is harmful ("When Function Inlining Meets WebAssembly: Counterintuitive Impacts on Runtime Performance"): https://alan-romano.github.io/When_Function_Inlining_Meets_WebAssembly__Counterintuitive_Impacts_on_Runtime_Performance.pdf

we find inlining particularly harmful when targeting the JVM, because it has such tight restrictions on function size (a method's bytecode must be under 64 KB) and whatnot. but that's (mostly) unrelated to performance. (except that code size can prevent JIT compilation: in particular, HotSpot's JIT by default never compiles methods larger than 8,000 bytes of bytecode.)
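As a rough way to spot functions at risk of hitting such a limit, here is a minimal sketch (hypothetical helpers, not part of wasm-stats) that walks a wasm binary's code section (section id 10) and reports each function body's size in bytes. The wasm body size is only a proxy: the JVM bytecode that a translator such as wasm2kotlin emits for the same function will be a different size, so the 8,000-byte default threshold is just a starting point.

```python
def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def code_body_sizes(wasm: bytes) -> list[int]:
    """Return the byte size of every function body in the code section."""
    if wasm[:4] != b"\x00asm" or wasm[4:8] != b"\x01\x00\x00\x00":
        raise ValueError("not a version-1 wasm binary module")
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = read_uleb128(wasm, pos + 1)
        if section_id == 10:  # code section: vec(size + body)
            count, p = read_uleb128(wasm, pos)
            sizes = []
            for _ in range(count):
                body_size, p = read_uleb128(wasm, p)
                sizes.append(body_size)
                p += body_size  # skip locals and instructions
            return sizes
        pos += size  # skip any other section
    return []


def oversized_functions(wasm: bytes, limit: int = 8000) -> list[tuple[int, int]]:
    """(index, size) pairs for bodies larger than `limit` bytes.

    Indices count defined functions only; imported functions come first in
    the wasm function index space, so add the import count to get real indices.
    """
    return [(i, s) for i, s in enumerate(code_body_sizes(wasm)) if s > limit]
```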

SoniEx2 • Oct 09 '23 21:10

I see. I hadn't seen that paper, thanks for the link.

The notable thing about targeting the JVM is that the JVM itself does a lot of inlining, which I'm not sure web engines do for wasm (or at least they may not have historically).

sbc100 • Oct 09 '23 21:10

It is true that the lack of on-stack replacement (OSR) is a serious known issue (see the last paragraph in this section of the V8 docs): once a hot callee is inlined into a long-running caller, the running code can no longer pick up the optimized version on its next call. So it is definitely possible to find benchmarks that regress a lot without OSR, as the paper found. But often you find the opposite effect, because inlining is just so useful.

For comparison, the Emscripten benchmark suite has cases that become many times slower without inlining, so we really cannot disable inlining in the toolchain.

Note that cases where OSR is needed are rare on the web, since you need to break up execution into short events anyhow (to avoid blocking the browser's responsiveness), and each new event gives the engine a chance to replace lower-tier code with optimized code. OSR may matter more off the web: if you run code in one long execution, you do need to be aware of it. You can work around it, e.g. by running node --no-liftoff, which disables tiering and compiles straight into the fastest tier; other VMs have similar flags.

kripken • Oct 09 '23 23:10

that makes sense, but:

we mean, the whole motivation for adding things to wasm-stats (formerly opcodecnt) in the first place is that we wanna make wasm2kotlin work better by introducing something like a wasm-deopt tool. maybe this is ultimately useless for everyone else, but we feel like deoptimization passes such as "demacro" (aka turning #defines back into functions) could be useful outside of just wasm2kotlin...
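A possible first step toward such a "demacro" pass, sketched as a hypothetical helper (`repeated_sequences` is not an existing wabt API): count every opcode sequence of a fixed length across functions and report the ones that recur, since repeated runs are what inlined macros tend to leave behind. These are only candidates; an actual pass would still need to check that each run has a consistent stack effect, uses locals compatibly, and does not straddle block boundaries before outlining it into a function.

```python
from collections import Counter


def repeated_sequences(
    functions: list[list[str]], length: int, min_count: int = 2
) -> list[tuple[tuple[str, ...], int]]:
    """Find opcode runs of `length` that occur at least `min_count` times.

    Runs are counted within each function only, and overlapping occurrences
    are all counted. Results are ordered from most to least frequent.
    """
    seen: Counter = Counter()
    for ops in functions:
        for i in range(len(ops) - length + 1):
            seen[tuple(ops[i:i + length])] += 1
    return [(seq, n) for seq, n in seen.most_common() if n >= min_count]
```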

we also get to learn a bit about optimization passes (despite this being a deoptimizer) so that's cool.

SoniEx2 • Oct 10 '23 00:10

Also, let's not forget that web engines are rather special beasts, and the drawbacks of multiple tiers usually do not apply to non-web use cases.

rossberg • Oct 10 '23 06:10