xformers
Bucketing strategy in triton kernels of sequence parallel fused operators.
🚀 Feature
When seqlen is not fixed, triton.autotune is triggered for every unseen value of seqlen in both the tiled matmul kernel and the matmul kernel. This makes training extremely slow on variable-seqlen data.
With a bucketing strategy, we could skip most triton.autotune calls for seqlen values that are close to one already tuned. This would make the fused operators much more usable.
Motivation
See above.
Pitch
When matrix dimensions are close to ones that have already been tuned, triton.autotune should be skipped for the tiled matmul kernel and the matmul kernel, and the previously selected config reused.
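To make the idea concrete, here is a minimal sketch in plain Python. It is not the xformers implementation. The names bucket_dim, pick_config and config_for are hypothetical, power-of-two buckets are just one possible choice, and pick_config only simulates the benchmark sweep that triton.autotune would run. In a real kernel, one way to get the same effect (an assumption on my side) would be to pass the bucketed value as an extra kernel argument and list that argument, rather than the raw seqlen, in the key of triton.autotune.

```python
from functools import lru_cache


def bucket_dim(n: int, min_bucket: int = 16) -> int:
    """Round n up to the next power of two, with a floor of min_bucket."""
    if n <= 0:
        raise ValueError("dimension must be positive")
    return max(min_bucket, 1 << (n - 1).bit_length())


# Records every simulated tuning run so we can count them.
tuning_runs = []


@lru_cache(maxsize=None)
def pick_config(m_bucket: int, n: int, k: int) -> dict:
    # Stand-in for the expensive sweep over candidate configs.
    # The returned block sizes are placeholders, not tuned values.
    tuning_runs.append((m_bucket, n, k))
    return {"BLOCK_M": min(128, m_bucket), "BLOCK_N": 64, "BLOCK_K": 32}


def config_for(seqlen: int, n: int, k: int) -> dict:
    # Key the cache on the bucket, not on the raw seqlen,
    # so nearby seqlen values share one tuning run.
    return pick_config(bucket_dim(seqlen), n, k)


seqlens = [1000, 1010, 1023, 1024, 1500, 2000, 2048]
for s in seqlens:
    config_for(s, 4096, 4096)

# Seven distinct seqlen values fall into two buckets (1024 and 2048).
print(len(tuning_runs))  # prints 2
```

The trade-off is that a config tuned for one shape in a bucket may not be the fastest for every other shape in that bucket, so the bucket width would need to balance tuning time against kernel performance.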
Alternatives
The current alternative is to disable the triton kernels when seqlen is not fixed.
Additional context
None.