astutils.h: use pre-sized SmallVector in visitAstNodes()
This is the first of two PR introducing the usage of SmallVector in our code.
I adjusted the value locally based on profiling it with callgrind.
When using Clang and Boost the indicator that we need to increase the size of the vector are boost::container::vec_iterator<> boost::container::vector<>::priv_forward_range_insert_no_capacity<>(...) and operator delete(void*) calls in findAstNode(...).
When using no Boost and reserve() the indicator is amount of additional operator new(...) and operator delete(...) compared to the actual calls of findAstNode(...).
Update: With findAstNode() now being inlined it is now harder to trace this. You now need to follow the operator new() or operator delete() but you won't have a function or proper call count to compare against. It seem like memcpy@GLIBC_2.2.5 is now the indicator that the vector is being expanded and needs to reserve more space.
I still need to run a bigger test to see if we might need to increase the count even more.
Here's some numbers for a -O2 build using callgrind with --enable=all --inconclusive on the mame_regtest project, DISABLE_VALUEFLOW=1.
Clang 13
2,263,092,948 - std::vector
2,242,113,164 - std::vector (reserve(8))
2,263,074,159 - SmallVector (no Boost / size 8)
2,242,113,077 - SmallVector (no Boost / size 8 / reserve(8))
2,162,604,179 - SmallVector (Boost 1.74 / size 8)
2,167,203,248 - SmallVector (Boost 1.74 / size 8 / reserve(8))
GCC values coming soon. ~I will file a ticket about the GCC difference soon.~
The Visual Studio compilation failure was caused by the incorrect C++ standard being using by the *-PCRE configurations in the cli.vcxproj.
I opened #3937 for the compilation/selfcheck fixes.
@Ken-Patrick @pfultz2 @danmar
There's another intermediate step it seems.
Using reserve() on the underlying container helps to reduce unnecessary allocations if we are not using Boost.
std::vector<T *> tokensContainer;
tokensContainer.reserve(8);
std::stack<T *, std::vector<T *>> tokens(std::move(tokensContainer));
Unfortunately that causes a performance regression when using Boost so that code requires to be conditional which is messy.
I wonder if what the reserve() call does could be achieved through the TaggedAllocator when using SmallVector without Boost as we are currently not using the given size. Unfortunately I have zero experience with custom std::allocator.
I provided values for the various cases in the description.
As std::vector::reserve() already provides a sizable improvement without need for any external dependencies I prepared #4158 which implements until SmallVector provides the same advantage in the non-Boost implementation.
With GCC I still see (all?) calls to void boost::container::vector<>::priv_push_back<>(Token const* const&) during profiling. It seems boost::container::small_vector is not working as expected with GCC.
This is almost done. 🙂 I still need to test it with the latest boost, add a build step to the CI and integrate it with the Visual Studio build (since it doesn't add any dependencies the official binaries should utilize it).
Here's a small preview of the performance win:
Clang 14 51,333,013,384 -> 37,000,473,137
GCC 12 55,161,312,808 -> 37,878,403,473
I enabled the Boost usage for the sanitizer builds. That tests the build and runs the tests as well as (hopefully) providing a slight performance improvement for the sanitized runs.
The last remaining thing is to build the Windows version with Boost.
This is finally ready for review. 🥳 I think the very slight Clang regression without Boost can be dismissed.
current -> SmallVector -> SmallVector + Boost 1.74
Performing a --enable=all --inconclusive scan on mame_regtest.
With DISABLE_VALUEFLOW=1:
Clang 14 2,026,151,486 -> 2,026,159,272 -> 1,854,855,377
GCC 12 1,994,489,219 -> 1,943,410,321 -> 1,842,068,039
With Valueflow enabled:
Clang 14 51,130,767,151 -> 51,131,952,779 -> 32,275,080,333
GCC 12 54,989,579,042 -> 49,360,382,494 -> 37,944,747,061
And scanning cli with --enable=all --inconclusive -Ilib -D __GNUC__ and DISABLE_VALUEFLOW=1
Clang 14 2,935,467,131 -> 2,935,473,066 -> 2,800,613,440
GCC 12 3,009,733,699 -> 2,968,684,311 -> 2,841,416,862