Nabla icon indicating copy to clipboard operation
Nabla copied to clipboard

Single Dispatch Radix Sort

Open devshgraphicsprogramming opened this issue 4 years ago • 3 comments

Description

Port ext::RadixSort to the New API and promote it to core at the same time.

Description of the related problem

Take example from the video::CScanner and implement a video::CRadixSorter<key_uint_t,reference_uint_t> which performs the all radix sort steps in a single dispatch.

Solution proposal

Override the default scanning shader and add more work to the first and last workgroup.

First prototype can do 1 dispatch per radix.

Additional context

Atomic spinlock synchronisation of the scan works like this...

Previous workgroups atomically increment by 1 a unique per-level counter at offset localWorkgroupIndexInThisLevel/WORKGROUP_SIZE when they finish computation (every such offset is assigned to a higher level workgroup).

The next level workgroup spins on the counter assigned to it, until it reaches a value equal to the total number of preceeding workgroups feeding data to this workgroup.

Extending this to arbitrary number of scans with work on the original elements inbetween, just requires careful allocation of synchronisation counters.

To tackle in 2022, we don't need this yet.

Cannot be done without UB, only the Upsweep Reduction Phase can be single dispatch, downsweep needs multiple