GPUArrays.jl

Port `reverse` from CUDA.jl

Open christiangnrd opened this issue 4 months ago • 3 comments

This may have to wait for KernelAbstractions.jl (KA) 0.10, depending on how much `cpu=true` affects performance.

Aug 04 '25 19:08 christiangnrd

It seems that, at least with CUDA.jl, using dynamic workgroup sizes recovers ~50% of the performance lost by switching over to KernelAbstractions. Is there potentially some overhead in KA that is lower with dynamic workgroup sizes?
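For readers unfamiliar with the distinction, here is a minimal sketch of the two launch styles in KernelAbstractions. `reverse_kernel!` is a hypothetical 1-D kernel written for illustration only; it is not the CUDA.jl implementation or the proposed GPUArrays.jl port, and the workgroup size of 256 is an arbitrary assumption. It runs on the CPU backend here, so it shows the API difference but says nothing about GPU performance.

```julia
using KernelAbstractions

# Hypothetical kernel for illustration; not the CUDA.jl or GPUArrays.jl code.
@kernel function reverse_kernel!(dst, @Const(src))
    i = @index(Global, Linear)
    @inbounds dst[i] = src[length(src) - i + 1]
end

src = collect(1:1024)
backend = get_backend(src)  # CPU() for a plain Array

# Dynamic workgroup size: the size is supplied at launch time.
dst_dynamic = similar(src)
reverse_kernel!(backend)(dst_dynamic, src; ndrange = length(src), workgroupsize = 256)

# Static workgroup size: the size is fixed when the kernel object is
# constructed, so it becomes part of the kernel's type.
dst_static = similar(src)
reverse_kernel!(backend, 256)(dst_static, src; ndrange = length(src))

KernelAbstractions.synchronize(backend)
```

Both launches should produce the same result; only where the workgroup size is specified differs.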

Aug 05 '25 21:08 christiangnrd

> Is there potentially some overhead in KA that is lower with dynamic workgroup sizes?

cc @vchuravy

Sep 01 '25 05:09 maleadt

Huh, I would expect static kernel sizes to be a performance benefit or at least performance neutral.

The only thing I can think of is that the compiler is suddenly able to unroll more, or something like that.

Sep 01 '25 09:09 vchuravy