Compilation time trace
As promised the other day, I did a quick time trace of compiling PIConGPU (SPEC example):
Of the 55s total compile time, less than 5s is spent parsing code and on the first round of template instantiations, about 10s on instantiating pending templates, and 35s on optimization (which is mostly inlining the many template instantiations).
If we look closer into the template instantiations:
This is the instantiation hierarchy for `picongpu::PluginController`. The selected (light green) block is `boost::mpl::copy_if<...>`, which then causes an insane amount of further template instantiations.
You can try this yourself by compiling with clang and adding `-ftime-trace` to the `CMAKE_CXX_FLAGS`.
@bernhardmgruber Cool! Do you have any suggestions for speeding up compilation based on that data?
With my limited understanding of PIConGPU, these are the suggestions I can make:
- Reduce the number of template instantiations. A big portion here is MPL, because it uses pre-C++11 TMP techniques. It should be replaced by a modern TMP library, as you already suggested in #1997. Whether that also reduces the number of generated functions the optimizer has to inline is a different question: if a TMP library only computes types, the cost stays in the frontend, which I imagine is most of what MPL does now.
- Reduce the number of generated functions. This might be a spin-off of the MPL work, but I don't know. It could be improved by refactoring the code base into less deeply nested functions. E.g. if your class has a function `m1` which always calls `m2`, and `m1` is called a lot, the compiler always has to inline both of them instead of just one function, whereas `m1` and `m2` could perhaps be merged somehow (e.g. with a default parameter). It could also be that there is some compile-time iteration over integers that could be replaced by a `#pragma unroll`. It's really hard to say.
- Parallelize compilation without repeating compilation of the same parts. That is trickier than it sounds. You should aim to split the code base into more translation units, but pay attention that two TUs don't e.g. re-instantiate the same chain of templates. This can easily happen when both TUs share some common helper functions. More importantly, two TUs should not generate the same function instantiations again, because then you pay in both TUs and again for the linker to deduplicate them :)
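To illustrate the first suggestion: a C++17 pack-expansion filter (a hypothetical stand-in, not PIConGPU code) can replace the recursive `boost::mpl::copy_if` pattern with roughly one instantiation per list instead of a deep chain, and it only computes types, so the cost stays in the frontend:

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a boost::mpl::copy_if use: a C++17 filter that
// expands the whole pack in one go instead of recursing element by element.
template<template<class> class Pred, class... Ts>
using filter_t = decltype(std::tuple_cat(
    std::declval<std::conditional_t<Pred<Ts>::value,
                                    std::tuple<Ts>,
                                    std::tuple<>>>()...));

// The whole computation happens on types, so no functions are generated
// that the optimizer would later have to inline.
static_assert(std::is_same_v<
    filter_t<std::is_integral, int, float, long, double>,
    std::tuple<int, long>>);
```

Modern TMP libraries like Boost.Mp11 offer the same operations (`mp_copy_if` etc.) with similarly flat instantiation depth.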
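A minimal sketch of the `m1`/`m2` merging idea from the second suggestion (hypothetical names, not from the PIConGPU code base):

```cpp
// Hypothetical illustration of the m1/m2 point (not real PIConGPU code).
struct Before
{
    // every hot call to m1 forces the inliner to process m2 as well
    int m1(int x) { return m2(x, 1); }
    int m2(int x, int scale) { return x * scale; }
};

struct After
{
    // merged via a default parameter: one function per call site to inline
    int m1(int x, int scale = 1) { return x * scale; }
};
```

Whether such a merge is worthwhile in practice depends on how hot the call chain is; the time trace can point at the most-inlined candidates.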
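For the last point, one standard tool to keep two TUs from generating the same function instantiations is C++11's `extern template`. A sketch with a hypothetical helper (everything is shown in one file so the example is self-contained; in a real project the `extern` declaration goes into the shared header and the explicit instantiation into exactly one .cpp file):

```cpp
// Hypothetical helper (not PIConGPU code). In a real project the extern
// declaration lives in the shared header and the explicit instantiation in
// exactly one .cpp file; both appear here so the example is self-contained.
template<typename T>
T square(T x) { return x * x; }

// header side: every including TU is told NOT to instantiate square<float>
extern template float square<float>(float);

// one chosen .cpp file: the single instantiation the linker will see
template float square<float>(float);
```

Each TU that includes the header can still call `square<float>`, but only the one TU with the explicit instantiation pays for compiling it, and the linker no longer has to throw away duplicates.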