flash-linear-attention
[RFC] Support dispatching to multiple backend kernels
Proposal
Triton appears to have hit a performance ceiling, so it is worth supporting dispatch to kernels written in a range of third-party languages. This mechanism assumes the Triton kernels remain the reference implementation that other backends fall back to.
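One way such a mechanism could look is a per-op registry with a Triton fallback. This is only an illustrative sketch; the names (`register_kernel`, `dispatch`, `chunk_linear_attn`) are hypothetical, not the actual flash-linear-attention API:

```python
# Hypothetical backend-dispatch registry; all names here are illustrative.
from typing import Callable, Dict

_KERNELS: Dict[str, Dict[str, Callable]] = {}


def register_kernel(op: str, backend: str):
    """Register an implementation of `op` for the given backend."""
    def decorator(fn: Callable) -> Callable:
        _KERNELS.setdefault(op, {})[backend] = fn
        return fn
    return decorator


def dispatch(op: str, backend: str = "triton") -> Callable:
    """Look up `op` for `backend`, falling back to the Triton reference."""
    impls = _KERNELS.get(op, {})
    if backend in impls:
        return impls[backend]
    if "triton" in impls:
        # Triton serves as the reference implementation when a backend
        # has not provided its own kernel.
        return impls["triton"]
    raise KeyError(f"no implementation registered for {op!r}")


@register_kernel("chunk_linear_attn", "triton")
def chunk_linear_attn_triton(*args, **kwargs):
    return "triton"


@register_kernel("chunk_linear_attn", "cuda")
def chunk_linear_attn_cuda(*args, **kwargs):
    return "cuda"
```

With this shape, `dispatch("chunk_linear_attn", "cuda")` picks the CUDA kernel when one is registered, while an unregistered backend such as `"tilelang"` silently falls back to the Triton version.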
We could also use tvm-ffi to reduce Triton kernel launch overhead.
Rationale
Backends worth considering include Triton, TileLang, CuteTile, CuteDSL, and CUDA C.