[WIP] Support double sparsity

Open andy-yang-1 opened this issue 5 months ago • 4 comments

Motivation

  • Support double sparsity (post-training sparse attention) for long-context inference in SGLang
  • See paper

Modifications

  • Add Triton implementation in sglang/python/sglang/srt/layers/sparse_decode_attention.py
  • Add serving-related parts

Checklist

  • [ ] Format your code according to the Contributor Guide.
  • [ ] Add unit tests as outlined in the Contributor Guide.
  • [ ] Update documentation as needed, including docstrings or example tutorials.

andy-yang-1 · Sep 18 '24 22:09