sglang
[WIP] Support double sparsity
Motivation
- Support double sparsity (post-training sparse attention) for long-context inference in SGLang
- See paper
Modifications
- Add Triton implementation in `sglang/python/sglang/srt/layers/sparse_decode_attention.py`
- Add serving-related parts
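
For reviewers unfamiliar with the technique, here is a rough pure-Python sketch of the idea behind double sparsity at decode time, as I understand it: score every cached token cheaply using only a small subset of channels (channel sparsity), then run full attention over just the top-scoring tokens (token sparsity). This is not the Triton kernel from this PR. The names `double_sparsity_decode_sketch`, `heavy_channels`, and `top_k` are hypothetical and chosen for illustration, and the sketch assumes the important channels have already been picked offline.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dense_decode_attention(q, keys, values):
    # Reference: full attention for one query vector over all cached tokens.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    probs = softmax(scores)
    dim = len(values[0])
    return [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]


def double_sparsity_decode_sketch(q, keys, values, heavy_channels, top_k):
    # Channel sparsity: approximate each token's score using only the
    # pre-selected channels (assumed to come from offline calibration).
    approx = [sum(q[c] * k[c] for c in heavy_channels) for k in keys]

    # Token sparsity: keep the top_k tokens by approximate score.
    top_k = min(top_k, len(keys))
    selected = sorted(range(len(keys)), key=lambda i: approx[i], reverse=True)[:top_k]
    selected.sort()  # restore original token order

    # Full-precision attention restricted to the selected tokens.
    return dense_decode_attention(
        q,
        [keys[i] for i in selected],
        [values[i] for i in selected],
    )
```

With `top_k` equal to the number of cached tokens, the sketch reduces to dense attention. Smaller values of `top_k` trade accuracy for fewer key/value reads, which is where the long-context savings would come from.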
Checklist
- [ ] Format your code according to the Contributor Guide.
- [ ] Add unit tests as outlined in the Contributor Guide.
- [ ] Update documentation as needed, including docstrings or example tutorials.