cpp-sort
cpp-sort copied to clipboard
The compile times are super slow
This isn't new, but the huge template mess everywhere in the library means that the compile times are slow, and they are currently even more terrible in the mutable sorters branch (which doesn't matter for now because it is blocked by internal compiler errors in clang++, but still...). However reducing those compile times without breaking the current API and reducing the power of the wrappers seems tricky. To be honest, I don't even know where to start to reduce the evergrowing template instantiations that occur everywhere. I tried a few things knowing that it wouldn't change much:
- Replace most
std::decay_tbyremove_cvref_t - Invert the
choicemechanism inhybrid_adapterto reduce its recursive instantiations from 127 to <30 most of the time - Move a few elements out of templates when they don't depend on every template parameter
- Don't use
sorter_facadeto implement sorter adapters when they can already handle all theoperator()overloads by themselves
It decreased the debug executable size a bit, but that's pretty much the only noticeable achievement.
But in the grand scheme of things it doesn't change much. I am looking for ideas to reduce a bit the compile times without breaking the library API, but I've only got a few tiny ideas:
- Maybe I could move even more things out of some templates. (I'm out of ideas for where that would be relevant)
- Turning trait-like class templates into templated
usingwhen possible generally ends up in the compiler instantiating fewer templates, which may help reduce both binary size and compile times. (I did that in a few places, now I don't know where it might still be relevant) - We could introduce more specialized
sorter_facadederivatives for specific sorter implementations since the ones in the library mostly have the same functions, but it doesn't look easy to make it work generically for user-defined sorters without introducing more tags. std::conjunctionand friends are apparently slow and probably unneeded in most places; I should analyze the library to reduce their uses if possible.- Some libraries apparently improved compile time by replacing calls to
std::moveandstd::forwardby a bunch ofstatic_cast, even though it doesn't look like the cleanest thing to do. (I tried that forstd::forwardand didn't notice a difference) - On some platforms, SFINAE conditions are 10x faster to compile when they are in the return type (as opposed to the template parameter list).
- Qualifying more calls in specific places may avoid to gratuitously trigger ADL so that the compiler won't needlessly look up for a function in the argument's associated namespace.
- Replacing calls to
std::distanceby subtractions when functions only handle random-access iterators may decrease the amount of tag dispatch happening (same for other iterators operations). It might also make it obvious when code is designed to work with random-access iterators only. - Some compilers provide intrinsics like
_EnableIfthat don't trigger template instantiations, which might be worth considering if they improve the state of things. On the other hand we should check whether it makes error message worse or not, because #28 is a thing too.
Those are only small changes, and I doubt that it will make a difference. If anyone has better ideas to reduce the compile times without breaking the API, I welcome such ideas :p
List of articles about speeding up compilation of templates and related stuff (suggest articles if needed):
-
https://odinthenerd.blogspot.fr/2017/03/start-simple-with-conditional-why.html
-
https://boost-experimental.github.io/di/cppnow-2016/#/10/45
-
https://mpark.github.io/programming/2017/08/08/using-killed-the-FUN-on-GCC-5/
-
https://baptiste-wicht.com/posts/2017/09/how-i-made-deep-learning-library-38-faster-to-compile-optimization-and-cpp17-if-constexpr.html
I compiled the test suite with Clang 9's new -time-trace option and visualized the results with speedscope. Here are the main takeaways so far:
- Most of the time is spent in front-end: sometimes in the parsing phase, sometimes in the template instantiation phase.
- Despite already optimizing it in the past,
<catch2/catch.hpp>is still a heavy hitter during the parsing phase. Fortunately, Catch2 v3.0 should be released in the near future and reduce the compile times. - When used,
cppsort::sortandcppsort::default_sorterare expensive. The usefulness ofcppsort::sortis debatable and I might want to nuke it for cpp-sort 2.0; evencppsort::stable_sortis probably not that good an idea since people should learn aboutstable_adapterinstead, which by the way is easier to use in C++17 thanks to CTAD.cppsort::default_sorteris arguably a bad idea too and I could see it disappearing in cpp-sort 2.0: people come to this library to test miscellaneous sorting algorithms, when all they want is a fast default,std::sortalready does a fine enough job. I will likely open issues for those. std::result_ofand everything using it takes a lot of instantiation time, which makessorter_traits.han expensive header to use - yet it's everywhere in the library. Or maybe it seems to be the case becauseresult_ofis often the first thing to instantiate the sorter templates.- The previous point on the other hand also revealed that
std::conjunction& friends skipping template instantiations might actually help.
Those are some preliminary results, more to come later.
The branch catch2-v3 contains an experimental switch to Catch2 3.0.0 using the top of the development branch. So far every file in the test suite is faster to parse thanks to the light granular includes, but I don't know whether a full build is faster because it has to download & compile Catch2 before compiling cpp-sort's test suite. However I do expected incremental builds to greatly benefit from the change.
I plan to merge the branch once Catch2 3.0.1 is officially released.