Use SmallSet in EffectsAnalyzer
Hi @kripken & @sbc100 !
As promised in #4884, this is split off into it's own PR. For it to have any effect, though, it still depends on avoiding the default constructor allocation, implemented in that other PR.
Benchmarks for the SmallSet template argument following in short.
Best, Jonathan
Here's the benchmarks (edited for readability), script, input wasm and the executables can be found here
$ ./run-wasm-opt.sh
wasm-opt-baseline:
real 0m23.973s
wasm-opt-no-set-allocation:
real 0m23.209s
wasm-opt-smallset-10:
real 0m18.224s
wasm-opt-smallset-4:
real 0m17.072s
wasm-opt-smallset-2:
real 0m17.087s
wasm-opt-smallset-1:
real 0m17.584s
wasm-opt-smallset-0:
real 0m19.192s
---------------
Here are a couple of culprits:
Where with EffectsAnalyzer, avoiding the default constructor allocation in std::set has a great effect, it seems to cause a 10% pessimization in the rest of the code. This could be because EffectsAnalyzer has a lot of empty sets and usually few elements (which is possibly affirmed by the smaller SmallSet sizes being faster), in which case the std::optional overhead would also barely show an effect.
The rest of the code may be way less effected by that one allocation, if they drown out the improvement there by allocating many times later by exceeding the small size, and also the optional overhead might weigh in way more here.
Please also note that this benchmark is heavily biased to the way Wonderland Engine writes code (which is a non-standard flavor over heavily optimized data-oriented C++) and probably not representative of "the average C++ wasm". You may want to run the benchmarks with other .wasm input to test.
All in all, @kripken, I'm considering wheter it wouldn't make sense to revert the changes to SmallSet and just use std::optional here in EffectsAnalyzer?
Thanks!
Ok, I did some benchmarking on a UE4 game, a Dart benchmark, and Poppler, all on Linux. Raw times are noisy but the instruction counts from perf stat are pretty robust. Numbers are in K of instructions:
| baseline | SmallSet<10> in effects.h | SmallSet<1> | SmallSet<0> | just SmallSet, no effects.h changes | |
|---|---|---|---|---|---|
| Dart | 27,193 | 26,793 | 26,878 | 27,232 | 27,088 |
| Poppler | 16,262 | 15,836 | 15,812 | 16,246 | 16,275 |
| UE4 | 495,164 | 479,595 | 480,944 | 496,245 | 495,671 |
(runtimes are generally similar to these numbers).
Looks like SmallSet<0> and just changing SmallSet (no changes to effects.h) make almost no difference compared to baseline. 1 and 10 do make a difference. I'd say 1 is the safer option since it is the smallest memory size increase.
I also lean towards changing SmallSet. While the change by itself doesn't change much, as we use SmallSet more it will help more, and will avoid us needing to add std::optional in more places. And I do think it makes sense to do, the more I think on it: SmallSet assumes we will rarely need the full set, and using std::optional there makes us just have a pointer for it, rather than its full contents (which will be at least a pointer + a size).
So I suggest we land #4884 and then land this but with 1 instead of 10. What do you think?
@kripken Sounds great! I updated this PR to use 1 instead of 10!
Rebased onto the changes to #4884 !
Rebased onto the changes to https://github.com/WebAssembly/binaryen/pull/4884 once again, which should also fix the test case that was failing over there.
Also noticed I had apparently missed two sets that were still on 10 instead of 1 element. Hope that wasn't just my faulty memory and it was intended to keep those different for some reason.
Rebased onto the changes to https://github.com/WebAssembly/binaryen/pull/4884 !