Bucketing strategy in triton kernels of sequence parallel fused operators.

Open Fanoid opened this issue 2 months ago • 0 comments

🚀 Feature

When seqlen is not fixed, triton.autotune is triggered for every previously unseen value of seqlen, in both the tiled matmul kernel and the matmul kernel. This makes training on variable-seqlen data extremely slow.

With a bucketing strategy, nearby seqlen values could share one autotuning result, so most calls to triton.autotune would be skipped. This would make the fused operators more usable.

Motivation

See above.

Pitch

When matrix dimensions take close values, triton.autotune should be skipped for the tiled matmul kernel and the matmul kernel, reusing the result already tuned for a nearby shape.
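
To make the idea concrete, here is a minimal pure-Python sketch of one possible bucketing scheme, not xformers or Triton code. It assumes power-of-two buckets; `bucket_dim`, `tuned_config`, `config_for`, and the returned block sizes are hypothetical names and values chosen for illustration. The cache stands in for the autotuner: an expensive sweep runs once per bucket rather than once per distinct seqlen.

```python
from functools import lru_cache


def bucket_dim(n: int, min_bucket: int = 16) -> int:
    """Round n up to the next power of two, with a floor of min_bucket."""
    if n <= 0:
        raise ValueError("dimension must be positive")
    return max(min_bucket, 1 << (n - 1).bit_length())


# Records every simulated tuning run so the saving is observable.
tune_calls = []


@lru_cache(maxsize=None)
def tuned_config(m_bucket: int, n: int, k: int) -> dict:
    # Stand-in for an expensive autotune sweep (hypothetical, not Triton).
    tune_calls.append((m_bucket, n, k))
    return {"BLOCK_M": min(128, m_bucket), "BLOCK_N": 64, "BLOCK_K": 32}


def config_for(seqlen: int, n: int, k: int) -> dict:
    # Key the cache on the bucket, not on the raw seqlen.
    return tuned_config(bucket_dim(seqlen), n, k)
```

With this scheme, seqlen values 900, 1000, and 1024 all fall into bucket 1024 and share one tuning run, while 1025 falls into bucket 2048 and triggers a second. In a real kernel, the analogous change would presumably be to list a bucketed dimension in the `key` of `triton.autotune` instead of the raw one, since the autotuner re-tunes whenever a key value changes. Coarser buckets mean fewer tuning runs but a config that may be less well matched to the actual shape.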

Alternatives

The current alternative is to disable the triton kernels when seqlen is not fixed.

Additional context

None.

Fanoid, Apr 26 '24 08:04