binaryen Use SmallSet in EffectsAnalyzer

Hi @kripken & @sbc100 !

As promised in #4884, this is split off into it's own PR. For it to have any effect, though, it still depends on avoiding the default constructor allocation, implemented in that other PR.

Benchmarks for the SmallSet template argument following in short.

Best, Jonathan

Aug 09 '22 14:08 Squareys

Here's the benchmarks (edited for readability), script, input wasm and the executables can be found here

$ ./run-wasm-opt.sh
wasm-opt-baseline:
real    0m23.973s

wasm-opt-no-set-allocation:
real    0m23.209s

wasm-opt-smallset-10:
real    0m18.224s

wasm-opt-smallset-4:
real    0m17.072s

wasm-opt-smallset-2:
real    0m17.087s

wasm-opt-smallset-1:
real    0m17.584s

wasm-opt-smallset-0:
real    0m19.192s
---------------

Here are a couple of culprits:

Where with EffectsAnalyzer, avoiding the default constructor allocation in std::set has a great effect, it seems to cause a 10% pessimization in the rest of the code. This could be because EffectsAnalyzer has a lot of empty sets and usually few elements (which is possibly affirmed by the smaller SmallSet sizes being faster), in which case the std::optional overhead would also barely show an effect.

The rest of the code may be way less effected by that one allocation, if they drown out the improvement there by allocating many times later by exceeding the small size, and also the optional overhead might weigh in way more here.

Please also note that this benchmark is heavily biased to the way Wonderland Engine writes code (which is a non-standard flavor over heavily optimized data-oriented C++) and probably not representative of "the average C++ wasm". You may want to run the benchmarks with other .wasm input to test.

All in all, @kripken, I'm considering wheter it wouldn't make sense to revert the changes to SmallSet and just use std::optional here in EffectsAnalyzer?

Aug 09 '22 15:08 Squareys

Thanks!

Ok, I did some benchmarking on a UE4 game, a Dart benchmark, and Poppler, all on Linux. Raw times are noisy but the instruction counts from perf stat are pretty robust. Numbers are in K of instructions:

	baseline	SmallSet<10> in effects.h	SmallSet<1>	SmallSet<0>	just SmallSet, no effects.h changes
Dart	27,193	26,793	26,878	27,232	27,088
Poppler	16,262	15,836	15,812	16,246	16,275
UE4	495,164	479,595	480,944	496,245	495,671

(runtimes are generally similar to these numbers).

Looks like SmallSet<0> and just changing SmallSet (no changes to effects.h) make almost no difference compared to baseline. 1 and 10 do make a difference. I'd say 1 is the safer option since it is the smallest memory size increase.

I also lean towards changing SmallSet. While the change by itself doesn't change much, as we use SmallSet more it will help more, and will avoid us needing to add std::optional in more places. And I do think it makes sense to do, the more I think on it: SmallSet assumes we will rarely need the full set, and using std::optional there makes us just have a pointer for it, rather than its full contents (which will be at least a pointer + a size).

So I suggest we land #4884 and then land this but with 1 instead of 10. What do you think?

Aug 09 '22 17:08 kripken

@kripken Sounds great! I updated this PR to use 1 instead of 10!

Aug 09 '22 17:08 Squareys

Rebased onto the changes to #4884 !

Sep 28 '22 13:09 Squareys

Rebased onto the changes to https://github.com/WebAssembly/binaryen/pull/4884 once again, which should also fix the test case that was failing over there.

Also noticed I had apparently missed two sets that were still on 10 instead of 1 element. Hope that wasn't just my faulty memory and it was intended to keep those different for some reason.

Nov 23 '22 11:11 Squareys

Rebased onto the changes to https://github.com/WebAssembly/binaryen/pull/4884 !

Mar 29 '23 21:03 Squareys