flash-linear-attention

[RFC] Support dispatch across multiple backend kernels

Open • zhiyuan1i opened this issue 1 week ago • 1 comment

Proposal

Triton appears to have hit a performance bottleneck, so we need to support dispatching to kernels written in a range of third-party kernel languages. This mechanism assumes that a Triton reference implementation already exists.

Also, we could use tvm-ffi to speed up Triton kernel launches.

Rationale

No response

zhiyuan1i, Nov 26 '25 01:11

Triton, Tilelang, CuteTile, CuteDSL, and CUDA C could be worth considering.

zhiyuan1i, Nov 26 '25 01:11