FLAMEGPU2
Inverted Device Scatter
When scattering we currently read coalesced and write scattered.
Performance of the scatter may be improved if we instead read scattered and write coalesced, although the logic for doing that is likely more complicated and may cost more than the benefit gained.
Specifically, this applies to pbm_reorder_generic.
(In reply to Peter Heywood's comment of Mon, 14 Jun 2021, quoted above; full thread at https://github.com/FLAMEGPU/FLAMEGPU2/issues/558.)
This would probably be beneficial to any scatter operation (if feasible). Uncoalesced writes are more expensive than uncoalesced reads, so in a 1:1 situation coalescing the writes is preferable, as long as doing so does not require too much overhead.
I.e. how easy/cheap is it to find the source index from a destination index (the thread id)? Usually (currently?) the destination index is easy to find given a source index (thread id). Going the other way might require an additional sorting operation over the keys and/or some new global memory.
Yes, but the only one we know for sure is affected is that one. There are some weird scatter variations inside CUDAScatter, iirc; e.g. one is actually a broadcast.