
Fix performance of transform_reduce

Open · maddyscientist opened this issue 4 years ago · 0 comments

Reminder to myself: since transform_reduce was made generic, some kernels built on it are seeing significant performance regressions. This is because transform_reduce now maps to the multi-reduction kernel, which limits the number of threads per block instantiated to 256. Previously, it was instantiated for 512 threads per block.
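To make the block-size cap concrete, here is a host-side C++ sketch (not QUDA's code) of a two-stage transform-reduce in which the threads-per-block count is a compile-time template parameter, as in a typical shared-memory reduction kernel. The names `transform_reduce_blocked` and `ReduceResult`, and the one-element-per-thread mapping, are hypothetical and chosen only for illustration. Under that mapping, capping the block at 256 threads instead of 512 doubles the number of blocks, and therefore the number of partial results the second stage has to combine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: the reduced value plus how many "blocks"
// (partial results) the first reduction stage produced.
struct ReduceResult {
  double value;
  std::size_t num_blocks;
};

// Host-side emulation of a block-level transform-reduce (illustrative only).
// BlockSize stands in for the threads-per-block template parameter; each
// "thread" handles one element, and each block combines its values with a
// power-of-two tree reduction, as a shared-memory reduction would.
template <int BlockSize, typename T, typename Transform>
ReduceResult transform_reduce_blocked(const std::vector<T>& in, Transform f) {
  static_assert(BlockSize > 0 && (BlockSize & (BlockSize - 1)) == 0,
                "BlockSize must be a power of two");
  const std::size_t n = in.size();
  const std::size_t num_blocks = (n + BlockSize - 1) / BlockSize;
  std::vector<double> partials(num_blocks, 0.0);

  // Stage 1: one partial result per block.
  for (std::size_t b = 0; b < num_blocks; ++b) {
    double smem[BlockSize];  // stands in for __shared__ memory
    for (int t = 0; t < BlockSize; ++t) {
      const std::size_t i = b * BlockSize + static_cast<std::size_t>(t);
      // Out-of-range threads contribute the identity of addition.
      smem[t] = (i < n) ? static_cast<double>(f(in[i])) : 0.0;
    }
    // Tree reduction: halve the number of active "threads" each step.
    for (int stride = BlockSize / 2; stride > 0; stride /= 2)
      for (int t = 0; t < stride; ++t) smem[t] += smem[t + stride];
    partials[b] = smem[0];
  }

  // Stage 2: combine the per-block partials.
  double total = 0.0;
  for (double p : partials) total += p;
  return {total, num_blocks};
}
```

For example, with a `std::vector<double>` holding 1.0 through 1000.0 and a squaring lambda, `transform_reduce_blocked<256>` and `transform_reduce_blocked<512>` both return a sum of 333833500, but the first produces 4 block partials and the second only 2. The sketch says nothing about occupancy or other GPU-side effects; it only shows how the block-size template parameter changes the shape of the reduction.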

maddyscientist, Mar 04 '21 17:03