flashinfer
Sliding window attention
I saw this item on the roadmap, and I'm wondering whether this feature will be supported in the near future.
I skipped the item because we don't need special support for SWA if we set page_size to 1.
For larger page_size, I think it's still necessary to have SWA support, so I added it to the v0.0.4 release plan.
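To illustrate the point about page_size, here is a minimal sketch in plain Python. It is not flashinfer's API: the function name `window_pages`, its arguments, and the assumption that the window covers the last `window` tokens including the current one are all hypothetical. With page_size 1, the window maps exactly onto a slice of the page table; with a larger page_size, the first kept page can contain tokens that fall outside the window, which is where extra support would be needed.

```python
def window_pages(page_indices, seq_len, page_size, window):
    """Hypothetical helper: pick the pages a decode step needs under a sliding window.

    Assumes the window is the last `window` tokens of a sequence of length
    `seq_len` (seq_len >= 1), stored in order across `page_indices`.
    Returns (kept_pages, skip), where `skip` is the number of tokens at the
    start of the first kept page that lie outside the window.
    """
    start = max(0, seq_len - window)          # first token position inside the window
    first_page = start // page_size           # page that holds that token
    last_page = (seq_len - 1) // page_size    # page that holds the newest token
    skip = start - first_page * page_size     # out-of-window tokens in the first kept page
    return page_indices[first_page:last_page + 1], skip


# page_size = 1: dropping pages is enough, nothing left to mask
pages_1, skip_1 = window_pages([10, 11, 12, 13, 14, 15, 16, 17], 8, 1, 3)

# page_size = 4: the first kept page still holds tokens outside the window
pages_4, skip_4 = window_pages([3, 7, 9], 10, 4, 3)
```

In the second call, `skip_4` is non-zero, so slicing the page table alone would let the kernel attend to tokens outside the window.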
@yzh119 Oh yes, we don't need a new kernel for decode. However, if I understand correctly, we need a new kernel for prefills?
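As a rough illustration of why prefill raises this question: during prefill every query position has its own window, so no single slice of the KV cache fits all queries at once. The sketch below builds the corresponding mask in plain Python. The function name `sliding_window_causal_mask` and the window convention (each query sees itself and the previous `window - 1` tokens) are assumptions for illustration, not flashinfer's implementation.

```python
def sliding_window_causal_mask(n, window):
    """Hypothetical helper: mask[i][j] is True if query i may attend to key j.

    Assumes causal attention where query i sees keys j with
    i - window < j <= i.
    """
    return [[(j <= i) and (j > i - window) for j in range(n)] for i in range(n)]


mask = sliding_window_causal_mask(4, 2)
for row in mask:
    # print each row as 1/0 so the shifting band is visible
    print("".join("1" if allowed else "0" for allowed in row))
```

Each row allows a different band of keys, which is the per-query behavior a prefill kernel would have to apply.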
Sorry for the late reply. It was supported in v0.1.2: https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.1.2.