flashinfer
[Feature Request] TopK Sparse Attention
- https://huggingface.co/openbmb/MiniCPM4.1-8B
From @simon-mo, the ask here is for both Hopper and Blackwell support.
Example kernel code: https://github.com/OpenBMB/infllmv2_cuda_impl
Hello, I'd like to try implementing this, if possible.
Sounds great! I'm not aware of anyone else working on this.
cc @yzh119