genn Improve presynaptic updates using square blocks and shared memory


Open • tnowotny opened this issue 3 years ago • 1 comment

As detailed in https://arxiv.org/pdf/2107.04092.pdf, each row of the connectivity matrix can be split into vertical slices, where each slice addresses a fixed range of postsynaptic neurons. Pre-loading these neurons' relevant variables into shared memory and using shared-memory atomics for the inSyn updates reduces global memory traffic, which leads to measurable performance gains.
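
Here is a minimal CPU-side sketch of the sliced scheme. It assumes dense, row-major weights and models each CUDA thread block as one iteration of a loop over slices of postsynaptic neurons. `kSliceWidth`, `UpdateStats` and `slicedPresynapticUpdate` are hypothetical names, not part of GeNN. The local array stands in for `__shared__` memory. Only inSyn accumulation is shown; pre-loading other postsynaptic variables is not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical slice width: one "block" owns this many postsynaptic neurons.
constexpr std::size_t kSliceWidth = 32;

struct UpdateStats {
    std::size_t globalInSynWrites = 0;  // writes to the global inSyn array
};

// weights: dense, row-major numPre x numPost
// spikes:  indices of presynaptic neurons that spiked this timestep
// inSyn:   global postsynaptic input accumulators (numPost entries)
UpdateStats slicedPresynapticUpdate(const std::vector<float>& weights,
                                    std::size_t numPost,
                                    const std::vector<std::size_t>& spikes,
                                    std::vector<float>& inSyn) {
    assert(inSyn.size() == numPost);
    UpdateStats stats;
    // Each iteration of this loop models one thread block (blockIdx.x).
    for (std::size_t sliceStart = 0; sliceStart < numPost; sliceStart += kSliceWidth) {
        const std::size_t sliceEnd = std::min(sliceStart + kSliceWidth, numPost);
        // Stand-in for: __shared__ float shInSyn[kSliceWidth];
        float shInSyn[kSliceWidth] = {};
        // Each spike only touches this slice of its row, so accumulation
        // stays in the per-block buffer. On a GPU this would be a
        // shared-memory atomicAdd wherever several threads can hit the
        // same entry (e.g. sparse rows or several spikes at once).
        for (std::size_t pre : spikes) {
            for (std::size_t post = sliceStart; post < sliceEnd; ++post) {
                shInSyn[post - sliceStart] += weights[pre * numPost + post];
            }
        }
        // Flush: one global write per postsynaptic neuron per block.
        for (std::size_t post = sliceStart; post < sliceEnd; ++post) {
            inSyn[post] += shInSyn[post - sliceStart];
            ++stats.globalInSynWrites;
        }
    }
    return stats;
}
```

However many spikes arrive in a timestep, each block writes its neurons' global inSyn entries exactly once. All per-spike accumulation stays in the per-block buffer.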

Jul 15 '21 14:07 tnowotny

Square blocks might also let us improve the performance of spike processing in all postsynaptically-parallelised strategies by using the threads along the block's y dimension to process multiple spikes in parallel. For sparse matrices, these threads can atomically update inSyn in global or shared memory. For dense matrices, they can accumulate partial sums in registers and then either atomically add them or sum them with warp shuffles before adding.
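
The dense case can be sketched the same way. This is again a CPU model, not GeNN code. `threadIdx.x` selects the postsynaptic neuron, and each y-thread sums its strided share of the spikes into a register. The partial sums are then reduced. `kBlockDimY` and `denseMultiSpikeUpdate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical number of y-threads per postsynaptic neuron.
constexpr std::size_t kBlockDimY = 4;

// weights: dense, row-major numPre x numPost; spikes: presynaptic indices.
void denseMultiSpikeUpdate(const std::vector<float>& weights, std::size_t numPost,
                           const std::vector<std::size_t>& spikes,
                           std::vector<float>& inSyn) {
    assert(inSyn.size() == numPost);
    for (std::size_t post = 0; post < numPost; ++post) {   // models threadIdx.x (+ block offset)
        float partial[kBlockDimY] = {};                    // one register per y-thread
        for (std::size_t ty = 0; ty < kBlockDimY; ++ty) {  // y-threads run concurrently on a GPU
            // Strided loop: y-thread ty handles spikes ty, ty + kBlockDimY, ...
            for (std::size_t s = ty; s < spikes.size(); s += kBlockDimY) {
                partial[ty] += weights[spikes[s] * numPost + post];
            }
        }
        // Reduction across y: on a GPU, either atomicAdd each partial sum
        // or do a warp-shuffle reduction and add once.
        float sum = 0.0f;
        for (std::size_t ty = 0; ty < kBlockDimY; ++ty) {
            sum += partial[ty];
        }
        inSyn[post] += sum;
    }
}
```

A GPU version would choose between atomic adds and warp shuffles in the final reduction loop. A shuffle reduction only works if the y-threads serving one neuron sit in the same warp, which depends on how the block is shaped.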

Jul 15 '21 15:07 neworderofjamie
