FLAMEGPU2
Inverted Device Scatter
When scattering we currently read coalesced and write scattered.
Performance of the scatter may be improved if we instead read scattered and write coalesced, although the logic for doing that is likely more complicated and may cost more than the benefit gained.
Specifically, this applies to pbm_reorder_generic.
(In reply to Peter Heywood's comment of Mon, 14 Jun 2021, quoted above; full thread at https://github.com/FLAMEGPU/FLAMEGPU2/issues/558.)
This would probably be beneficial to any scatter operation (if feasible). Uncoalesced writes are more expensive than uncoalesced reads, so in a 1:1 situation coalescing the writes is preferable, as long as doing so does not require too much overhead.
I.e. how easy/cheap is it to find the source index from a destination index (the thread id)? Usually (currently?) the destination index is easy to find given a source index (thread id). Going the other way might require an additional sorting operation over the keys and/or some new global memory.
Yes, but the only one we know for sure is affected is that one. There are some weird scatter variations inside CUDAScatter, iirc; e.g. one is actually a broadcast.