flash-linear-attention
[RFC] Support dispatching to multiple backend kernels
Proposal
Triton appears to have hit a performance ceiling, so it is worth supporting dispatch to kernels written in a range of third-party languages. This mechanism assumes the Triton kernels remain the reference implementation that other backends fall back to.
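One way such a mechanism could look is a per-op registry with a Triton fallback. This is only an illustrative sketch; the names (`register_kernel`, `dispatch`, `chunk_linear_attn`) are hypothetical, not the actual flash-linear-attention API:

```python
# Hypothetical backend-dispatch registry; all names here are illustrative.
from typing import Callable, Dict

_KERNELS: Dict[str, Dict[str, Callable]] = {}


def register_kernel(op: str, backend: str):
    """Register an implementation of `op` for the given backend."""
    def decorator(fn: Callable) -> Callable:
        _KERNELS.setdefault(op, {})[backend] = fn
        return fn
    return decorator


def dispatch(op: str, backend: str = "triton") -> Callable:
    """Look up `op` for `backend`, falling back to the Triton reference."""
    impls = _KERNELS.get(op, {})
    if backend in impls:
        return impls[backend]
    if "triton" in impls:
        # Triton serves as the reference implementation when a backend
        # has not provided its own kernel.
        return impls["triton"]
    raise KeyError(f"no implementation registered for {op!r}")


@register_kernel("chunk_linear_attn", "triton")
def chunk_linear_attn_triton(*args, **kwargs):
    return "triton"


@register_kernel("chunk_linear_attn", "cuda")
def chunk_linear_attn_cuda(*args, **kwargs):
    return "cuda"
```

With this shape, `dispatch("chunk_linear_attn", "cuda")` picks the CUDA kernel when one is registered, while an unregistered backend such as `"tilelang"` silently falls back to the Triton version.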
We could also use tvm-ffi to reduce Triton kernel launch overhead.
Rationale
Backends worth considering include Triton, TileLang, CuteTile, CuteDSL, and CUDA C.