[Feat][WIP] Qwen-1M context support [2/2]: Update block sparse attention backend

Open FlamingoPg opened this issue 8 months ago • 0 comments

Motivation

Stacked on PR [1/2]: https://github.com/sgl-project/sglang/pull/5847#event-17439387208

TODO: support CUDA graphs for the block sparse attention backend.

Modifications

Checklist

  • [x] Format your code according to the Code Formatting with Pre-Commit.
  • [x] Add unit tests as outlined in the Running Unit Tests.
  • [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • [x] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
  • [x] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

FlamingoPg, May 01 '25 09:05