Disentangle kernel-manager include tree
Summary of changes
This PR fixes the horrible kernel-manager singleton design and cleans up all circular include chains which were just hidden behind the kernel-manager and plenty "*_impl.h" files.
Even though this PR touches 250+ files, the only functional changes were done in the kernel_manager.h/.cpp and the other changes are just follow-up refactorings enabled by the changes. I also got rid of most of the impl-files, which are not required anymore and just complicated things.
Why this was necessary
Anyone who ever worked on the kernel will immediately know about the pain the kernel-manager caused. As the kernel-manager included ALL other manager in the kernel_manager.h file, any other class using the kernel-manager therefore also included ALL other managers. This resulted in either of two things:
- Any class using the kernel manager could not be used by ANY other manager anymore, as this would lead to a circular include chain.
- Any class used by other managers was not allowed to include the kernel-manager in its .h file. This meant: 2.1. The kernel-manager could only be included in the .cpp file, which meant that the function using the kernel-manager (and thus any of the other managers) could not be inlined, which in some cases is detrimental for performance. 2.2. We had to create those impl.h files and use the kernel-manager in there. However, impl.h files and correctly including them is a nightmare and leads to various linker issues which are extremely difficult to debug.
The only time when one should use impl.h files is for some template specializations.
Important to note: I kept the internal API identical. All I've done essentially was to use references to the managers instead which allowed to forward-declare the managers in the kernel_manager.h file. The reason there are that many changes is that including the kernel_manager.h no longer means that one also includes ALL other managers. So each file using a specific manager has to include that manager explicitly. This is much cleaner and will also speed up compilation a bit.
I have no idea what's supposed to be wrong with the 2 copyright headers. @terhorstd Could you take a look please?
@JanVogelsang This looks very interesting! Just to make sure I understand your solution correctly: Instead of having the individual managers as actual data members of the KernelManager instance, the KernelManager now holds references to the different managers, so that kernel_manager.h only needs forward declarations of the various *Manager classes, right? That then allows removing all the inclusions and the impl-files.
This is certainly a great code cleanup. I am just a tiny bit concerned about performance, since we now have essentially pointers to the individual managers. Have you run any benchmarks yet?
I have no idea what's supposed to be wrong with the 2 copyright headers. @terhorstd Could you take a look please?
Very strange. Have you tried to run the copyright check locally on your machine?
@heplesser Yes, this is correct!
I benchmarked the performance and the PR's solution is indeed around 5% slower (locally and on Juwels). I have to admit that this surprises me a lot, as while this introduces another pointer, I would have assumed that all these pointers (or mostly just the simulation_manager and event_delivery_manager pointers during simulation) would be cached anyway. This seems to not be possible and I am not sure why. Then again, the cause of the performance degradation might also be something entirely different, for example growing code size which harms instruction cache efficiency due to more function inlining.
I implemented two other solutions now, which both require the internal API to change, though:
kernel().simulation_manager ->kernel::manager<SimulationManager>()
- A template function with a function-local static manager instance.
namespace kernel {
template<typename ManagerT, std::enable_if_t<std::is_base_of_v<ManagerInterface, ManagerT>>* = nullptr>
ManagerT& manager()
{
static ManagerT static_manager;
return static_manager;
}
}
- An inline (global) variable for each manager.
namespace kernel {
template <class T>
inline T g_manager_instance;
template <class T>
T& manager() noexcept {
return g_manager_instance<T>;
}
}
I'm currently benchmarking both solutions, but local benchmarks suggest that the second approach performs best.
I have no idea what's supposed to be wrong with the 2 copyright headers. @terhorstd Could you take a look please?
Very strange. Have you tried to run the copyright check locally on your machine?
I managed to solve this, it was just two empty files which I deleted but somehow they were added back in again, just empty.
@JanVogelsang Thanks for checking and the new solutions! I slightly prefer the second option, but that is mainly based on the fact that I still haven't entirely wrapped my mind around the std::enable_if_t construct.
Is it necessary to place this into a separate namespace kernel? Why not have it in the nest namespace and then just use it as manager<SimulationManager>() or maybe even get<SimulationManager>() or (since we have get in many places) use<SimulationManager>() or something like that?
@JanVogelsang, this is big work! Thanks a lot for this big cleaning effort! Currently there still seem to be some build errors, but most of the changes seem fine to me. I commented a few places already, and will do a proper review once the CI passes.
Could you also check the relation to #3465 ?
@JanVogelsang Thanks for checking and the new solutions! I slightly prefer the second option, but that is mainly based on the fact that I still haven't entirely wrapped my mind around the
std::enable_if_tconstruct.Is it necessary to place this into a separate namespace
kernel? Why not have it in thenestnamespace and then just use it asmanager<SimulationManager>()or maybe evenget<SimulationManager>()or (since we havegetin many places)use<SimulationManager>()or something like that?
That's totally fine as well. The additional/nested namespace (nest::kernel::) was just intended to make the transition for other devs easier, as kernel().X_manager isn't that far away from kernel::manager<XManager>(). But of course, any other flavor of that is great as well. I think I like the use<>() flavor most, but that's just personal taste.
While there is quite some fluctuation, it seems like the inline-approach (number 2 above) performs best and on par with master.
@JanVogelsang, this is big work! Thanks a lot for this big cleaning effort! Currently there still seem to be some build errors, but most of the changes seem fine to me. I commented a few places already, and will do a proper review once the CI passes.
Could you also check the relation to #3465 ?
It's still compatible with that PR, the role of the kernel manager stays the same.
During an in-person meeting @diesmann @heplesser @terhorstd and I decided on approach number 2 (see above) for our manager objects: Inline global variables. They have the following characteristics:
- They have external linkage, so there is only one instance across all translation units (compared to static global variables which are duplicated per TU).
- They are either initialized at the start of the program or when first using them (implementation-defined). But as far as I could tell, there are no additional checks whether the object is already initialized or not (which on the other hand are automatically added by the compiler for statics used inside functions). Their initialization order is implementation-defined (which doesn't matter for us as we explicitly initialize all managers).
We also decided to now use link-time optimizations (LTO) which I also enabled in this PR as well (in CMake). This enables us to move all non-template function definitions to .cpp files without having to worry about performance lost by not inlining, as LTO automatically handles inlining of these functions for us.
For template function definitions, we will either put them into the header file if there are only few functions or create _impl.h files which contain all template-function implementations.
When using Clang, I get the following error:
/usr/bin/ld: error: Failed to link module ../nestkernel/libnestkernel.a.llvm.1568556.node_collection.cpp: Expected at most one ThinLTO module per bitcode file
Even after lots of researching and debugging, I couldn't find a decent fix for the issue. Only disabling thin LTO (i.e., forcing full LTO) resolved the issue, but this might not be a good solution:
# set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -flto=full")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -flto=full")
@JanVogelsang Under macOS 26, I could build your master out of the box with Apple Clang 17. The entire testsuite passes. I checked that -lto=thin was included in the compiler invocations.
Some net searching indicated that multiple object files with the same name can lead to the error you got (https://github.com/llvm/llvm-project/issues/42296) and indeed running in the build directory
ls */CMakeFiles/*.dir/*.o | cut -d'/' -f4 | sort | uniq -d
provide a hint and we find that we have
./nest/CMakeFiles/nest.dir/neststartup.cpp.o
./nest/CMakeFiles/nest_lib.dir/neststartup.cpp.o
Maybe renaming one of the two neststartup.cpp files we have can solve the problem?
Another comment on this PR: I'd much prefer to get pynest-ng done first ...
Oh great find! Yes, this is most likely the issue.
This PR isn't super urgent, so we can surely wait for PyNEST-ng. The only reason why we might want to get this rolled out sooner than later would be to minimize the amount of merge conflicts for new branches that still use the current kernel manager implementation while they could already use this refactored version. Further, any kernel developers would of course benefit from this PR and make their life easier.
Thanks to @FinnBurkhardt for moving all function definitions to .cpp/_impl.h files!
When merging this commit, make sure to squash it, as there are quite numerous commits here now.
Took me a couple of hours, but I fixed all the remaining compiler and linker issues now. Unfortunately, 3 tests are now failing and I am unsure why:
unittests.test_gamma_sup_generator.sli unittests.test_ppd_sup_generator.sli unittests.test_spike_poisson_ps_base2.sli
They fail with:
event.cpp:269: size_t nest::Event::get_sender_node_id() const: Assertion sender_node_id_ > 0' failed.`
As this PR doesn't introduce any changes to functionality, this is rather surprising. Either something went wrong while merging master or moving around function definitions broke something (not sure how this could be possible though).
No idea how to interpret those Pytest aborts (see failing testsuite).
No idea how to interpret those Pytest aborts (see failing testsuite).
Strange indeed. Does the testsuite pass locally on your computer? The worst problem seems to be the complete failure to run any of the plain pytests.