KernelAbstractions.jl

Reduction interface

Open • vchuravy opened this issue 4 years ago • 3 comments

@glwagner needs a reduction interface, so we should finally add one.

cc: @jpsamaroo

vchuravy • Apr 01 '21 21:04

What kinds of reduction intrinsics does CUDA support? AMDGPU has `wfred`, which reduces a single value across all active lanes. I figure this could be easy to support, with something like `result = @reduce_warp op input`.
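To make "reduces a single value across all active lanes" concrete, here is a minimal CPU-only sketch in plain Julia of what such a wavefront-wide reduction computes. `simulate_lane_reduce` is a hypothetical helper written for illustration. It is not part of AMDGPU.jl, CUDA.jl, or KernelAbstractions.jl, and it is not how `wfred` is implemented in hardware. It assumes the lane count is a power of two and that `op` is associative and commutative.

```julia
# Hypothetical helper (illustration only): simulates on the CPU the result of
# reducing one value per lane across a wavefront/warp.
function simulate_lane_reduce(op, lanes::AbstractVector, active::AbstractVector{Bool}, neutral)
    # Inactive lanes contribute the neutral element of `op`.
    vals = [active[i] ? lanes[i] : neutral for i in eachindex(lanes)]
    width = length(vals)          # assumed to be a power of two, e.g. 32 or 64
    offset = width ÷ 2
    while offset > 0
        # "Shuffle down": lane i combines its value with that of lane i + offset.
        # All reads use the values from before this step.
        prev = copy(vals)
        for i in 1:(width - offset)
            vals[i] = op(prev[i], prev[i + offset])
        end
        offset ÷= 2
    end
    return vals[1]                # lane 1 ends up holding the full reduction
end

# Usage: 64 lanes holding 1, 2, ..., 64, all active.
result = simulate_lane_reduce(+, collect(1:64), trues(64), 0)
@assert result == 2080            # 64 * 65 / 2
```

A macro such as the proposed `@reduce_warp` would presumably return this value to every participating lane, but that part of the design is not settled in this thread.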

jpsamaroo • Apr 06 '21 00:04

Maybe something like what FoldsCUDA does?

rveltz • Sep 04 '22 07:09

Even if there is no portable access to lower-level reduction primitives, it would be good to have an example of a reduction operation (e.g. sum, maximum), probably implemented via local memory. The histogram example does that, but it's more complicated than a straightforward reduction.
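A minimal sketch of what such a straightforward sum via local memory could look like, assuming the macros documented by KernelAbstractions.jl (`@index`, `@uniform`, `@groupsize`, `@localmem`, `@synchronize`). The kernel name `group_sum_kernel!`, the group size of 256, and the two-stage design (one partial sum per workgroup, finished with `sum` on the host) are illustrative choices, not an interface proposed in this thread. The group size is assumed to be a power of two, and the sketch is written against the CPU backend only.

```julia
using KernelAbstractions

# Illustrative kernel (hypothetical name): each workgroup sums its slice of
# `input` in local memory and writes one value to `partial`.
@kernel function group_sum_kernel!(partial, @Const(input))
    gi = @index(Group, Linear)     # workgroup index
    li = @index(Local, Linear)     # index within the workgroup
    i  = @index(Global, Linear)    # global index

    @uniform gs = @groupsize()[1]          # assumed to be a power of two
    @uniform n = length(input)
    @uniform steps = trailing_zeros(gs)    # log2(gs) halving steps

    shared = @localmem eltype(partial) (gs,)

    # Load one element per work-item; pad out-of-range items with zero.
    @inbounds shared[li] = i <= n ? input[i] : zero(eltype(partial))
    @synchronize()

    # Tree reduction: the number of active work-items halves at every step.
    for k in 1:steps
        if li <= (gs >> k)
            @inbounds shared[li] += shared[li + (gs >> k)]
        end
        @synchronize()
    end

    # The first work-item of each group stores the group's partial sum.
    if li == 1
        @inbounds partial[gi] = shared[1]
    end
end

backend = CPU()
x = rand(Float32, 1000)
groupsize = 256
ngroups = cld(length(x), groupsize)
partial = KernelAbstractions.zeros(backend, Float32, ngroups)

# Pad ndrange to a whole number of groups; the kernel bounds-checks `i` itself.
kernel! = group_sum_kernel!(backend, groupsize)
kernel!(partial, x; ndrange = ngroups * groupsize)
KernelAbstractions.synchronize(backend)

total = sum(partial)               # second stage, done on the host here
@assert total ≈ sum(x)
```

Padding `ndrange` keeps every work-item in a group alive until the last `@synchronize`, so no barrier sits behind a bounds check that only some items pass. Replacing `+` and `zero` with `max` and `typemin` would give `maximum` in the same pattern.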

eschnett • Dec 22 '23 15:12