Emit remarks for ReduceOp lowering failing to fit within a single warp

Status: Open. plotfi opened this issue 9 months ago • 0 comments

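When a reduce op does not fit within a single warp, the lowering generates many SMEM (shared memory) operations and synchronization instructions. This happens because registers cannot be used to accumulate the reduction result across warps. Lanes can exchange register values only within their own warp (e.g., via warp shuffles), so partial results from different warps have to go through shared memory.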
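A minimal Python model of the two-level reduction described above may make the cost concrete. It assumes one element per thread, 32-lane warps, and a plain integer sum. The names `warp_allreduce` and `block_reduce` are hypothetical, and this is not Triton's lowering code. It only illustrates why the intra-warp stage needs no shared memory, while the cross-warp stage needs both shared memory and a barrier.

```python
from typing import List, Tuple

WARP_SIZE = 32  # lanes per warp (NVIDIA); an assumption of this model


def warp_allreduce(lanes: List[int]) -> List[int]:
    """Butterfly sum inside one warp, modelling XOR warp shuffles.

    At each step lane i adds the value held by lane i ^ offset, so after
    log2(WARP_SIZE) steps every lane holds the full sum. Only registers are
    involved: no shared memory and no CTA-wide barrier.
    """
    vals = list(lanes)
    offset = len(vals) // 2
    while offset > 0:
        vals = [vals[i] + vals[i ^ offset] for i in range(len(vals))]
        offset //= 2
    return vals


def block_reduce(values: List[int], num_warps: int) -> Tuple[int, int, int]:
    """Sum num_warps * WARP_SIZE values (one per thread).

    Returns (result, smem_stores, barriers) as counted by this simplified model.
    """
    assert len(values) == num_warps * WARP_SIZE
    # Stage 1: every warp reduces its own lanes with shuffles.
    partials = [
        warp_allreduce(values[w * WARP_SIZE:(w + 1) * WARP_SIZE])[0]
        for w in range(num_warps)
    ]
    if num_warps == 1:
        # The whole reduction fit in one warp: registers were enough.
        return partials[0], 0, 0
    # Stage 2: registers are private to a warp, so lane 0 of each warp
    # writes its partial result to shared memory ...
    smem = [0] * num_warps
    for w, partial in enumerate(partials):
        smem[w] = partial
    # ... and the whole CTA must synchronize before the partials are read.
    barriers = 1
    # Stage 3: one warp reads the partials back and finishes the sum.
    return sum(smem), num_warps, barriers
```

With `block_reduce(list(range(128)), 4)` the model reports 4 shared-memory stores and one barrier. With a single warp it reports none. A real lowering may need more than this (for example, another barrier before the shared buffer is reused), so treat these counts as a property of the sketch, not as measurements.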

Emitting a remark in this case would help Triton developers catch such inefficient codegen. An easy way to trigger it is to load rows of odd length. This yields a blocked layout with sizePerThread = [1, 1] rather than something like sizePerThread = [1, 8]. Because each thread then holds fewer elements along the row, the reduction is spread across more threads and can exceed the bounds of a single warp.
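The layout arithmetic behind this can be sketched as follows. This is a simplified model, not Triton's layout or lowering code. `BlockedLayout`, `warps_spanned`, `reduce_remark`, and the remark text are hypothetical. The concrete threadsPerWarp and warpsPerCTA values are also assumptions, chosen to resemble what a default blocked layout might look like for a 32 x 128 tile with 4 warps, reduced along its rows (axis 1).

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BlockedLayout:
    """Hypothetical stand-in for a Triton blocked layout's parameters."""
    size_per_thread: Sequence[int]
    threads_per_warp: Sequence[int]
    warps_per_cta: Sequence[int]


def warps_spanned(layout: BlockedLayout, shape: Sequence[int], axis: int) -> int:
    """Distinct warps holding elements along `axis` (simplified model).

    One warp covers size_per_thread * threads_per_warp elements along the
    axis. If the layout's warps cover more than the tensor, the extra warps
    hold replicated data, hence the min().
    """
    per_warp = layout.size_per_thread[axis] * layout.threads_per_warp[axis]
    return min(layout.warps_per_cta[axis], math.ceil(shape[axis] / per_warp))


def reduce_remark(layout: BlockedLayout, shape: Sequence[int],
                  axis: int) -> Optional[str]:
    """Return a hypothetical remark string if the reduction crosses warps."""
    n = warps_spanned(layout, shape, axis)
    if n > 1:
        return (f"reduction along axis {axis} spans {n} warps; "
                "lowering will use shared memory and barriers")
    return None


# Two layouts for a 32 x 128 tile with 4 warps (illustrative values only).
vectorized = BlockedLayout([1, 8], [2, 16], [4, 1])  # 16 lanes x 8 = 128 cols
scalar = BlockedLayout([1, 1], [1, 32], [1, 4])      # 32 lanes x 1 = 32 cols

for name, layout in [("sizePerThread=[1, 8]", vectorized),
                     ("sizePerThread=[1, 1]", scalar)]:
    print(name, "->", reduce_remark(layout, (32, 128), axis=1))
```

Running it prints `None` for the `[1, 8]` layout, because one warp covers the whole 128-element row. For the `[1, 1]` layout it prints a remark that the reduction spans 4 warps, since each warp now covers only 32 columns.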

plotfi, Feb 28 '25 07:02