Refactor flashinfer logic for deepseek v3

Open Fridge003 opened this issue 10 months ago • 0 comments

Motivation

The attention code in flashinfer_backend.py has become too complex. This PR extracts the MLA logic from it and creates a new file, flashinfer_mla_backend.py.

Modifications

  • Define FlashInferMLAAttnBackend in flashinfer_mla_backend.py, based on flashinfer_backend.py with the code irrelevant to MLA removed
  • Simplify the MLA code in forward
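
The split described above can be pictured with a minimal sketch. Only the class name FlashInferMLAAttnBackend comes from this PR; the base class, the general backend stand-in, the `select_backend` helper, and the `forward` signature are hypothetical placeholders and do not reflect the actual sglang interfaces.

```python
from abc import ABC, abstractmethod


class AttentionBackend(ABC):
    """Hypothetical minimal interface; the real base class likely has more hooks."""

    @abstractmethod
    def forward(self, q, k, v):
        ...


class FlashInferAttnBackend(AttentionBackend):
    """Stand-in for the general backend, with no MLA branches left in it."""

    def forward(self, q, k, v):
        # Placeholder result; a real backend would run the attention kernel.
        return ("mha", q, k, v)


class FlashInferMLAAttnBackend(AttentionBackend):
    """Stand-in for the MLA-only backend that lives in its own file."""

    def forward(self, q, k, v):
        # Placeholder result; only the MLA path is handled here.
        return ("mla", q, k, v)


def select_backend(use_mla: bool) -> AttentionBackend:
    # Hypothetical helper: pick the backend once at setup time,
    # instead of branching on MLA inside every forward call.
    return FlashInferMLAAttnBackend() if use_mla else FlashInferAttnBackend()
```

With this shape, each forward method handles a single attention variant, which is the simplification the second bullet is aiming for.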

Checklist

  • [x] Format your code according to the Code Formatting with Pre-Commit.
  • [ ] Add unit tests as outlined in the Running Unit Tests.
  • [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • [ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
  • [ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

Fridge003 · Feb 22 '25 09:02