arbor
Combining mechanisms into a single mechanism to improve performance of the GPU backend.
Describe the feature you need
The ability to merge mechanisms into a single kernel. In a test of a dendrite segment, combining 4 mechanisms into a single mechanism through the mechanism ABI, targeting a GPU back end, halved the cumulative time spent in kernels compared to the current case. This feature would specifically target the GPU back end, where most of the speedup is expected.
Explain what it is supposed to enable
Faster simulations in general, simply because there is less overhead from kernel/function calls.
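To make the overhead argument concrete, here is a minimal back-of-the-envelope sketch. It assumes a simple cost model in which every kernel launch pays a fixed overhead and the compute time of the mechanisms just adds up. The function name `total_kernel_time` and all numbers are hypothetical and chosen for illustration only; they are not measurements from Arbor.

```python
def total_kernel_time(compute_times_us, launch_overhead_us, fused):
    """Estimate the kernel time for one time step, in microseconds.

    Unfused: every mechanism pays its own launch overhead.
    Fused:   a single launch covers all mechanisms.
    """
    launches = 1 if fused else len(compute_times_us)
    return launches * launch_overhead_us + sum(compute_times_us)


# Hypothetical values, not measured: 4 mechanisms at 2 us of compute each,
# and 5 us of fixed overhead per kernel launch.
compute = [2.0, 2.0, 2.0, 2.0]
overhead = 5.0

separate = total_kernel_time(compute, overhead, fused=False)
merged = total_kernel_time(compute, overhead, fused=True)
print(f"separate: {separate} us, merged: {merged} us")
```

Under this model the gain depends entirely on how large the launch overhead is relative to the compute per mechanism. The model ignores other effects of merging, such as shared memory loads or changed register pressure, so it is only a rough guide.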
Additional context
A potential solution would be the ability to create an object in CMake/Python that selects multiple individually described mechanisms and merges them into a single mechanism.
Another option is a standalone Python script that takes some NMODL files as arguments and outputs a single NMODL file.
Hi,
thanks for the suggestion and we agree on the idea (it has come up internally multiple times). Do you have an estimate of the expected gain?
As for our stance on it, we have to say that it is a lot of work and to our knowledge infeasible without having runtime compilation at cell instantiation time. This would require us to have the relevant toolchains available at that time. However, this is likely to cause issues when running on large clusters with separate front-/backends. Even if we could guarantee presence of the tools, the machinery needed to robustly invoke them and handle errors seems quite cumbersome.
On the AOT suggestion: Since Arbor assigns mechanisms from regions to discrete CVs at build time, regions might overlap in CVs depending on the concrete morphology. That would require us to generate all 2^N combinations of mechanisms beforehand and -- given about 100 mechanisms in our catalogues -- this is infeasible.
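The size of that combinatorial space can be checked with a short sketch. It assumes the worst case in which any subset of the N mechanisms may end up on the same CV; the helper names and the three mechanism names in the small example are illustrative only.

```python
from itertools import combinations


def count_mechanism_subsets(n_mechanisms: int) -> int:
    """Number of distinct subsets (including the empty one) of n mechanisms."""
    return 2 ** n_mechanisms


def enumerate_subsets(names):
    """Yield every non-empty subset of names; only practical for small inputs."""
    for k in range(1, len(names) + 1):
        yield from combinations(names, k)


# Small case: 3 mechanisms give 2^3 - 1 = 7 non-empty combinations.
small = ["hh", "pas", "kd"]  # illustrative names
print(len(list(enumerate_subsets(small))))

# About 100 mechanisms, as quoted above: roughly 1.27e30 subsets.
print(f"{count_mechanism_subsets(100):.2e}")
```

Even if only a small fraction of these subsets occurs in practice, which ones are needed is not known until the morphology has been discretised, which is why generating them ahead of time does not scale.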
To summarise: We are aware of the idea and like it in principle, but our estimate is that it will require lots of work and will likely result in less than stable outcomes. Therefore, unless there is a major use case for this or someone willing to build it, we would like to add it to the backlog.
If you have NML2 mechanisms (and simulations) as input, nmlcc (https://github.com/thorstenhater/nmlcc) will build bespoke specialised and combined mechanisms for the regions mentioned in the simulation. I have seen 20-30% speed-up comparing HH as 3 channels vs one combined.
So, given that you can use this in NML2 and that we observe <20% speed-up in the best case, I'll consider this closeable.