KernelAbstractions.jl

Pre-launch workgroupsize auto-tuning

Open: tkf opened this issue 4 years ago • 1 comment

If the caller (host-side code) of a kernel needs to pre-allocate a buffer whose size depends on the workgroupsize, and the workgroupsize is not specified, the caller has to run workgroupsize auto-tuning before launching the kernel. For example, I used this when implementing the "mapreduce" kernel in FoldsCUDA.jl. Can we have an API for invoking workgroupsize auto-tuning before launching the kernel?

tkf, Feb 21 '21 23:02

Can this be supported with dynamic localmem #11?

tkf, Feb 21 '21 23:02