[WIP][Feature] Support KV Partition for BatchPrefill kernel for Paged & Ragged KV-Cache.

Open yzh119 opened this issue 1 year ago • 4 comments

Before this PR, FlashInfer supported KV sequence parallelism for single decode/prefill and batch decode, but not for batch prefill, even though the feature is also important for the batch prefill kernel. This PR implements KV partition for the batch prefill kernels (on both Paged & Ragged KV-Cache).
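
For readers unfamiliar with the idea, KV partition (split-KV) generally means computing attention over separate chunks of the KV sequence and then merging the partial results using each chunk's log-sum-exp. The sketch below illustrates that idea in plain Python for a single query vector. It is not FlashInfer's kernel code: the function names (`partial_attention`, `merge_states`, `attention_with_kv_partition`) are hypothetical, and batching, causal masking, and the paged/ragged layouts are left out.

```python
import math


def partial_attention(q, keys, values):
    """Attention of one query over one KV chunk.

    Returns (output, lse), where lse is the log-sum-exp of the scaled scores.
    Hypothetical helper for illustration only.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)


def merge_states(state_a, state_b):
    """Merge two partial attention states computed over disjoint KV chunks."""
    out_a, lse_a = state_a
    out_b, lse_b = state_b
    m = max(lse_a, lse_b)
    w_a = math.exp(lse_a - m)
    w_b = math.exp(lse_b - m)
    total = w_a + w_b
    out = [(w_a * a + w_b * b) / total for a, b in zip(out_a, out_b)]
    return out, m + math.log(total)


def attention_with_kv_partition(q, keys, values, chunk_size):
    """Process the KV sequence chunk by chunk and merge the partial states."""
    state = None
    for start in range(0, len(keys), chunk_size):
        part = partial_attention(q,
                                 keys[start:start + chunk_size],
                                 values[start:start + chunk_size])
        state = part if state is None else merge_states(state, part)
    return state
```

Because the merge is associative, the chunks can in principle be processed by independent thread blocks and combined afterwards, which is what makes partitioning useful when the KV sequence is long.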

yzh119 avatar Jan 19 '24 09:01 yzh119

@yzh119 is this PR good to use? This would be extremely useful for some of my work.

AgrawalAmey avatar Mar 14 '24 05:03 AgrawalAmey

@AgrawalAmey We have done a huge amount of code refactoring since the last commit of this PR, so I need to rebase and add some new commits. Please stay tuned :)

yzh119 avatar Mar 16 '24 07:03 yzh119

@yzh119 Looking forward to it! I would be happy to help accelerate this. Please let me know if I can help in any way.

AgrawalAmey avatar Mar 16 '24 14:03 AgrawalAmey

Looking forward to it!!

ZSL98 avatar Apr 03 '24 05:04 ZSL98

@yzh119 Writing to ask whether this is ready for use. I just found BatchPrefillWithRaggedKVCacheDispatched in the main branch code, but I'm not sure whether it works.

chenzhuofu avatar Jun 06 '24 03:06 chenzhuofu

Moved to #310

yzh119 avatar Jun 17 '24 08:06 yzh119

@chenzhuofu @ZSL98 @AgrawalAmey This was done in #310.

yzh119 avatar Jun 19 '24 22:06 yzh119

Amazing, thanks a lot for the awesome work! 🙏

AgrawalAmey avatar Jun 19 '24 22:06 AgrawalAmey