flashinfer
[WIP][Feature] Support KV Partition for BatchPrefill kernel for Paged & Ragged KV-Cache.
Before this PR, FlashInfer supported KV sequence parallelism for single decode/prefill and batch decode, but not for batch prefill. The feature matters for the batch prefill kernel as well, so this PR implements KV partition for batch prefill kernels (on both Paged & Ragged KV-Cache).
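For readers unfamiliar with the idea, here is a minimal CPU sketch of what KV partitioning means for one query: attention is computed separately on each KV chunk, and the partial results are combined using each chunk's log-sum-exp. This is only an illustration under assumptions. The function names (`attend`, `merge`, `attend_partitioned`) are hypothetical and are not FlashInfer's API, and the real kernels run on the GPU over batched Paged/Ragged KV-Cache layouts.

```python
import math

def attend(q, keys, values):
    """Softmax attention of one query over one KV chunk.

    Returns (output vector, log-sum-exp of the scaled scores).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)

def merge(state_a, state_b):
    """Combine two partial attention states computed on disjoint KV chunks."""
    out_a, lse_a = state_a
    out_b, lse_b = state_b
    m = max(lse_a, lse_b)
    w_a, w_b = math.exp(lse_a - m), math.exp(lse_b - m)
    total = w_a + w_b
    out = [(w_a * a + w_b * b) / total for a, b in zip(out_a, out_b)]
    return out, m + math.log(total)

def attend_partitioned(q, keys, values, chunk_size):
    """Hypothetical helper: split the KV sequence into chunks and merge."""
    state = None
    for start in range(0, len(keys), chunk_size):
        part = attend(q, keys[start:start + chunk_size],
                      values[start:start + chunk_size])
        state = part if state is None else merge(state, part)
    return state
```

Because the merge is associative, the chunks can be processed independently (for example by different thread blocks) and reduced afterwards; the merged result matches attention over the full KV sequence up to floating-point error.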
@yzh119 is this PR good to use? This would be extremely useful for some of my work.
@AgrawalAmey We have done a large amount of code refactoring since the last commit of this PR, so I need to rebase and add some new commits. Please stay tuned :)
@yzh119 looking forward to it! I would be happy to help accelerate this, please let me know if I can help in any way.
Looking forward to it!!
@yzh119 Just checking whether this is ready for use. I found BatchPrefillWithRaggedKVCacheDispatched in the main branch code, but I'm not sure whether it works.
Moved to #310
@chenzhuofu @ZSL98 @AgrawalAmey This was done in #310.
Amazing, thanks a lot for the awesome work! 🙏