VkRadixSort
VkRadixSort copied to clipboard
Algorithm correctness
Hi!
I was analyzing how the Multi Radix Sort works and I don't fully understand how you make sure it works correctly in all circumstances.
The part that concerns me is the scattering/reordering part. Here, if I get it right you use bit masks to count the number of occurrences of a particular binID
within the workgroup, next you account this difference on the global level with atomicAdd. So if workgroups were scheduled one after another, "sequentially", your approach would be 100% correct, however the workgroup order is generally undefined and workgroups can be executed in parallel, so I wonder how your algorithm deals with the case when two workgroups are scheduled in parallel or out of order. For the sake of simplicity let's see the parallel case:
WG1 parallel to WG2
binOffset = global_offsets[binID] = x ||||||||||||||||||| binOffset = global_offsets[binID] = x
prefix = p1 ||||||||||||||||||| prefix = p2
count = c1 ||||||||||||||||||| count = c2
output[binOffset + p1] = input ||||||||||||||||||| output[binOffset + p2] = input
atomicAdd(global_offsets[binID], c1) ||||||||||||||||||| atomicAdd(global_offsets[binID], c2)
So it looks to me that p1
and p2
are local to the respective WG, so in the case of parallel execution of WG1 and WG2, [output[binOffset + p{1,2}] = input]
can step on each other toes and write to the same output indices, just because the global level atomicAdd(global_offsets[binID], c{1,2}
hasn't been reached yet.
Or do I miss something obvious?