Support all/more kinds of reductions on GPU
https://github.com/chapel-lang/chapel/pull/24787 will add `+`, `min`, and `max` reductions. These are probably the most common reduction kinds, but more importantly, CUB/hipCUB has direct support for those.
`minloc` and `maxloc` reductions are also supported in CUB via the `ArgMin` and `ArgMax` functions. We'll need to wire some more things from the compiler into the runtime. I expect this to be a relatively straightforward implementation in both the compiler and the runtime.
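To make the `minloc`/`maxloc` result shape concrete, here is a minimal host-only C++ sketch of what an ArgMin/ArgMax-style reduction hands back: a value paired with its index. Everything in it (`IndexedValue`, `minloc`, `maxloc`) is a hypothetical name for illustration, not runtime or CUB code, and the tie-breaking rule (lowest index wins) is an assumption of this sketch. The real path would go through CUB on the device rather than a serial loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the (index, value) pair that an ArgMin/ArgMax
// style reduction produces. Not a CUB or Chapel runtime type.
struct IndexedValue {
  std::size_t idx;
  double val;
};

// minloc: smallest value plus the index of its first occurrence.
// Assumption of this sketch: ties resolve to the lowest index.
IndexedValue minloc(const std::vector<double>& data) {
  assert(!data.empty() && "minloc needs at least one element in this sketch");
  IndexedValue best{0, data[0]};
  for (std::size_t i = 1; i < data.size(); ++i) {
    if (data[i] < best.val) best = IndexedValue{i, data[i]};
  }
  return best;
}

// maxloc: largest value plus the index of its first occurrence.
IndexedValue maxloc(const std::vector<double>& data) {
  assert(!data.empty() && "maxloc needs at least one element in this sketch");
  IndexedValue best{0, data[0]};
  for (std::size_t i = 1; i < data.size(); ++i) {
    if (data[i] > best.val) best = IndexedValue{i, data[i]};
  }
  return best;
}
```

The point of the sketch is that the reduction output is a pair rather than a scalar, which is presumably part of what has to be wired from the compiler into the runtime.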
For other reduction types, we need to engineer a way to use CUB's generic `Reduce` interface, where we pass a function of ours into CUB to handle the reduction. In other words, CUB is supposed to call the `accumulate` function of the user-defined reduction implementation. Needless to say, the priority should be the reduction kinds that we already have, not generic user-defined reductions.
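A rough host-only C++ sketch of that idea follows: a generic reduce that knows nothing about the operation and only calls a binary functor, with an adapter that forwards each call to the reduction's `accumulate`. All names here (`ProductOp`, `AccumulateAdapter`, `hostReduce`) are hypothetical, and the serial loop only stands in for the device-side reduce. It assumes the reduction can expose an identity value and a two-argument `accumulate`.

```cpp
#include <vector>

// Hypothetical reduction implementation: a product reduction with an
// identity value and an accumulate step.
struct ProductOp {
  static long identity() { return 1; }
  static long accumulate(long state, long x) { return state * x; }
};

// Adapter that exposes Op::accumulate as a binary call operator, which is
// the shape a generic reduce interface expects for its reduction operator.
template <typename Op>
struct AccumulateAdapter {
  template <typename T>
  T operator()(const T& a, const T& b) const {
    return Op::accumulate(a, b);
  }
};

// Serial stand-in for a generic reduce: it only ever calls `op`, so the
// reduction kind is decided entirely by the functor passed in.
template <typename T, typename BinaryOp>
T hostReduce(const std::vector<T>& data, BinaryOp op, T init) {
  T state = init;
  for (const T& x : data) state = op(state, x);
  return state;
}
```

For example, `hostReduce(v, AccumulateAdapter<ProductOp>{}, ProductOp::identity())` multiplies the elements of a `std::vector<long>`. On the GPU the partial results are combined in parallel, so the operator passed in has to be associative for the result to be well defined.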