Prefix sum implementation WIP

Open raphlinus opened this issue 4 years ago • 0 comments

This is a draft; I'm still working on it. I'll likely create a separate subdirectory for tests and move this there, since overwriting the main hello example is not good form, but for now I'm doing it for expedience.

The version at tip of tree as I write this (87e5b20) works well on an AMD 5700 XT. In fact, it works very well: I'm seeing 36.4 billion elements/s, which is excellent. That is within a sliver of the throughput of a compute shader that just copies input to output, and the GPU counters suggest that memory bandwidth is pretty well saturated.
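To put the throughput figure in bandwidth terms, here is a rough back-of-the-envelope sketch. It assumes 4-byte elements and that each element is read once and written once, as in a plain copy; neither assumption is stated above, and the function name is only illustrative.

```rust
/// Effective memory traffic in GB/s for a pass that reads and writes
/// every element exactly once (an assumption, matching a plain copy).
fn effective_bandwidth_gb_s(elements_per_s: f64, bytes_per_element: f64) -> f64 {
    // One read plus one write per element.
    let bytes_per_s = elements_per_s * bytes_per_element * 2.0;
    bytes_per_s / 1e9
}
```

Under those assumptions, 36.4 billion elements/s corresponds to about 291 GB/s of combined read and write traffic.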

This version also makes some progress on each spin, so it does not depend on strong forward progress guarantees from the GPU.
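As an illustration of what "progress on each spin" can mean, here is a CPU sketch of a single-pass scan with look-back, using threads in place of workgroups. It is not the shader in this repository: the flag encoding, the function names, and the choice to fold one predecessor element per spin iteration are all assumptions made for the example. It also assumes all sums fit in 30 bits and that `part_size` is nonzero.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Hypothetical state encoding: 2-bit flag in the top bits, 30-bit value below,
// so flag and value are published together in one atomic word.
const FLAG_NOT_READY: u32 = 0;
const FLAG_AGGREGATE: u32 = 1; // value = sum of this partition only
const FLAG_PREFIX: u32 = 2; // value = sum of this and all earlier partitions

fn pack(flag: u32, value: u32) -> u32 {
    (flag << 30) | (value & 0x3fff_ffff)
}

fn unpack(word: u32) -> (u32, u32) {
    (word >> 30, word & 0x3fff_ffff)
}

/// Sum of all partitions before `part`. While a predecessor is not ready,
/// each spin iteration folds one of its input elements, so the loop finishes
/// even if that predecessor never gets scheduled.
fn look_back(part: usize, input: &[u32], part_size: usize, state: &[AtomicU32]) -> u32 {
    let mut prefix = 0u32;
    let mut look = part;
    while look > 0 {
        look -= 1;
        // Every partition before the last one is full, so this range is in bounds.
        let chunk = &input[look * part_size..(look + 1) * part_size];
        let mut fallback = 0u32;
        let mut next = 0usize;
        loop {
            let (flag, value) = unpack(state[look].load(Ordering::Acquire));
            if flag == FLAG_PREFIX {
                return prefix.wrapping_add(value);
            }
            if flag == FLAG_AGGREGATE {
                prefix = prefix.wrapping_add(value);
                break;
            }
            // Not ready: do useful work instead of waiting idly.
            fallback = fallback.wrapping_add(chunk[next]);
            next += 1;
            if next == chunk.len() {
                // We computed the predecessor's aggregate ourselves.
                prefix = prefix.wrapping_add(fallback);
                break;
            }
        }
    }
    prefix
}

/// Inclusive prefix sum, one thread per partition.
fn prefix_sum(input: &[u32], part_size: usize) -> Vec<u32> {
    let n_parts = (input.len() + part_size - 1) / part_size;
    let state: Vec<AtomicU32> = (0..n_parts)
        .map(|_| AtomicU32::new(pack(FLAG_NOT_READY, 0)))
        .collect();
    let mut output = vec![0u32; input.len()];
    thread::scope(|s| {
        for (part, out_chunk) in output.chunks_mut(part_size).enumerate() {
            let state = &state;
            s.spawn(move || {
                let start = part * part_size;
                let chunk = &input[start..start + out_chunk.len()];
                let aggregate = chunk.iter().fold(0u32, |a, &x| a.wrapping_add(x));
                if part > 0 {
                    state[part].store(pack(FLAG_AGGREGATE, aggregate), Ordering::Release);
                }
                let prefix = look_back(part, input, part_size, state);
                let inclusive = prefix.wrapping_add(aggregate);
                state[part].store(pack(FLAG_PREFIX, inclusive), Ordering::Release);
                // Write the scanned values for this partition.
                let mut running = prefix;
                for (o, &x) in out_chunk.iter_mut().zip(chunk) {
                    running = running.wrapping_add(x);
                    *o = running;
                }
            });
        }
    });
    output
}
```

The point of the fallback path is that a waiting partition never relies on another partition being scheduled: in the worst case it recomputes the missing aggregate itself, at the cost of some redundant reads.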

That said, I am employing the atomicOr workaround for the atomic bugs I'm seeing; without it I get both incorrect results and hangs (try N_DATA = 1 << 17 for a nice mix of the two). I will probably work on a simplified version of the test to exercise the atomic problems without bringing in all of the complexity of a full prefix sum.
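The workaround itself is not spelled out here. One common form of it, and my assumption for this sketch, is to replace a plain atomic load with an atomic OR against zero, which returns the current value without changing it. The CPU analogue in Rust, with a hypothetical helper name:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Read an atomic through a read-modify-write that modifies nothing.
/// OR with 0 leaves the stored value intact and returns the previous value.
fn load_via_or(cell: &AtomicU32) -> u32 {
    cell.fetch_or(0, Ordering::Acquire)
}
```

The result is the same as a load; the difference is only in which hardware or driver path the operation takes.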

raphlinus, Nov 04 '21 00:11