Improve presynaptic updates using square blocks and shared memory

Open tnowotny opened this issue 3 years ago • 1 comments

As detailed in https://arxiv.org/pdf/2107.04092.pdf, one can cut the rows of the connectivity matrix into vertical slices, where each slice addresses a fixed range of post-synaptic neurons. Pre-loading these neurons' relevant variables into shared memory and using shared-memory atomics for inSyn updates reduces global memory traffic, leading to measurable performance gains.
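To make the idea concrete, here is a minimal CUDA sketch, not GeNN's generated code and not the paper's implementation. It assumes a hypothetical storage layout in which every slice has its own CSR-style row pointer array (`sliceRowPtr`) and column indices are stored relative to the start of the slice (`localPostInd`). The kernel name, `SLICE_WIDTH`, and the host helper `runSlicedUpdate` are all made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr unsigned int SLICE_WIDTH = 128;  // postsynaptic neurons per slice (assumed value)

// One thread block per slice. sliceRowPtr holds (numPre + 1) offsets per slice into
// localPostInd/weight; localPostInd is relative to the slice's first neuron.
__global__ void presynapticUpdateSliced(const unsigned int *spikes, unsigned int numSpikes,
                                        const unsigned int *sliceRowPtr,
                                        const unsigned int *localPostInd, const float *weight,
                                        float *inSyn, unsigned int numPre, unsigned int numPost)
{
    __shared__ float shInSyn[SLICE_WIDTH];
    const unsigned int firstPost = blockIdx.x * SLICE_WIDTH;

    // Pre-load this slice's inSyn values into shared memory
    for (unsigned int i = threadIdx.x; i < SLICE_WIDTH; i += blockDim.x) {
        const unsigned int post = firstPost + i;
        shInSyn[i] = (post < numPost) ? inSyn[post] : 0.0f;
    }
    __syncthreads();

    // Every spike touches only the part of its row that falls inside this slice
    const unsigned int *rowPtr = sliceRowPtr + blockIdx.x * (numPre + 1);
    for (unsigned int s = 0; s < numSpikes; s++) {
        const unsigned int pre = spikes[s];
        const unsigned int rowEnd = rowPtr[pre + 1];
        for (unsigned int j = rowPtr[pre] + threadIdx.x; j < rowEnd; j += blockDim.x) {
            atomicAdd(&shInSyn[localPostInd[j]], weight[j]);  // shared-memory atomic
        }
    }
    __syncthreads();

    // Single coalesced write-back to global memory per slice
    for (unsigned int i = threadIdx.x; i < SLICE_WIDTH; i += blockDim.x) {
        const unsigned int post = firstPost + i;
        if (post < numPost) {
            inSyn[post] = shInSyn[i];
        }
    }
}

// Hypothetical helper: copy a host vector to the device (error checking omitted)
template <typename T>
static T *toDevice(const std::vector<T> &host)
{
    T *dev = nullptr;
    cudaMalloc(&dev, host.size() * sizeof(T));
    cudaMemcpy(dev, host.data(), host.size() * sizeof(T), cudaMemcpyHostToDevice);
    return dev;
}

// Hypothetical host wrapper: runs one update starting from inSyn = 0 and returns inSyn
std::vector<float> runSlicedUpdate(const std::vector<unsigned int> &spikes,
                                   const std::vector<unsigned int> &sliceRowPtr,
                                   const std::vector<unsigned int> &localPostInd,
                                   const std::vector<float> &weight,
                                   unsigned int numPre, unsigned int numPost)
{
    unsigned int *dSpikes = toDevice(spikes);
    unsigned int *dRowPtr = toDevice(sliceRowPtr);
    unsigned int *dPostInd = toDevice(localPostInd);
    float *dWeight = toDevice(weight);
    float *dInSyn = toDevice(std::vector<float>(numPost, 0.0f));

    const unsigned int numSlices = (numPost + SLICE_WIDTH - 1) / SLICE_WIDTH;
    presynapticUpdateSliced<<<numSlices, SLICE_WIDTH>>>(
        dSpikes, static_cast<unsigned int>(spikes.size()), dRowPtr, dPostInd, dWeight,
        dInSyn, numPre, numPost);

    std::vector<float> result(numPost);
    cudaMemcpy(result.data(), dInSyn, numPost * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dSpikes); cudaFree(dRowPtr); cudaFree(dPostInd); cudaFree(dWeight); cudaFree(dInSyn);
    return result;
}
```

In this sketch global memory sees one read and one write of inSyn per slice, while all the scattered per-synapse updates stay in shared memory. Storing slice-relative indices is a design choice of the sketch; it would also allow narrower index types.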

tnowotny · Jul 15 '21 14:07

Square blocks might also let us improve the performance of spike processing in all postsynaptically-parallelised strategies by using threads in y to process multiple spikes in parallel. For sparse matrices, these threads can atomically update inSyn in global/shared memory. For dense matrices, they can accumulate into registers and then either atomically add the partial sums or sum them with warp shuffles before adding.
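A rough CUDA sketch of the dense-matrix variant with register partial sums follows. It is an illustration under assumptions, not GeNN's kernel: the weight matrix is taken to be row-major `numPre x numPost`, the block is 32 x 32, and the names `denseSquareUpdate`, `BLOCK_DIM` and `runDenseSquareUpdate` are invented here. Threads in x map to postsynaptic neurons and threads in y stride over the spike list. The partial sums are combined with shared-memory atomics; the warp-shuffle alternative is not shown, since it needs a thread layout in which the spike index varies within a warp.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr unsigned int BLOCK_DIM = 32;  // square block edge (assumed value)

// x indexes postsynaptic neurons, y strides over spikes.
// weight is assumed dense and row-major: weight[pre * numPost + post].
__global__ void denseSquareUpdate(const unsigned int *spikes, unsigned int numSpikes,
                                  const float *weight, float *inSyn, unsigned int numPost)
{
    __shared__ float shSum[BLOCK_DIM];
    const unsigned int post = blockIdx.x * BLOCK_DIM + threadIdx.x;

    if (threadIdx.y == 0) {
        shSum[threadIdx.x] = 0.0f;
    }
    __syncthreads();

    // Each thread accumulates its share of the spikes in a register
    float partial = 0.0f;
    if (post < numPost) {
        for (unsigned int s = threadIdx.y; s < numSpikes; s += blockDim.y) {
            partial += weight[static_cast<size_t>(spikes[s]) * numPost + post];
        }
    }

    // Combine the blockDim.y partial sums for each postsynaptic neuron
    atomicAdd(&shSum[threadIdx.x], partial);
    __syncthreads();

    // One thread per neuron adds the total to global memory; no global atomic is
    // needed because each neuron belongs to exactly one block
    if (threadIdx.y == 0 && post < numPost) {
        inSyn[post] += shSum[threadIdx.x];
    }
}

// Hypothetical host wrapper: runs one update starting from inSyn = 0 and returns inSyn
// (error checking omitted for brevity)
std::vector<float> runDenseSquareUpdate(const std::vector<unsigned int> &spikes,
                                        const std::vector<float> &weight,
                                        unsigned int numPost)
{
    unsigned int *dSpikes = nullptr;
    float *dWeight = nullptr;
    float *dInSyn = nullptr;
    cudaMalloc(&dSpikes, spikes.size() * sizeof(unsigned int));
    cudaMalloc(&dWeight, weight.size() * sizeof(float));
    cudaMalloc(&dInSyn, numPost * sizeof(float));
    cudaMemcpy(dSpikes, spikes.data(), spikes.size() * sizeof(unsigned int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dWeight, weight.data(), weight.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemset(dInSyn, 0, numPost * sizeof(float));

    const dim3 block(BLOCK_DIM, BLOCK_DIM);
    const unsigned int numBlocks = (numPost + BLOCK_DIM - 1) / BLOCK_DIM;
    denseSquareUpdate<<<numBlocks, block>>>(dSpikes, static_cast<unsigned int>(spikes.size()),
                                            dWeight, dInSyn, numPost);

    std::vector<float> result(numPost);
    cudaMemcpy(result.data(), dInSyn, numPost * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dSpikes); cudaFree(dWeight); cudaFree(dInSyn);
    return result;
}
```

Because threads with consecutive x read consecutive columns of the same weight row, the global loads stay coalesced while up to 32 spikes are processed concurrently per block.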

neworderofjamie · Jul 15 '21 15:07