Support XiaomiMiMo inference with mtp

Open ryang-max opened this issue 7 months ago • 3 comments

Motivation

Support XiaomiMiMo inference with MTP (multi-token prediction).

Modifications

Add new model support.

Checklist

  • [x] Format your code according to the Code Formatting with Pre-Commit.
  • [ ] Add unit tests as outlined in the Running Unit Tests.
  • [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • [ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
  • [ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

ryang-max avatar May 06 '25 15:05 ryang-max

TODO: There are still a few minor problems with the MTP model service. DO NOT MERGE.

ryang-max avatar May 06 '25 15:05 ryang-max

Could you add the accuracy test result and add ci test?

ispobock avatar May 07 '25 06:05 ispobock

For the memory issue, you can refer to https://github.com/sgl-project/sglang/blob/fba8eccd7ebe41bbdbf70ab3b6a2df1835f8b532/python/sglang/srt/model_executor/model_runner.py#L725 and make similar changes.

ispobock avatar May 12 '25 14:05 ispobock

> Could you add the accuracy test result and add ci test?

MTP-related CI test added.

ryang-max avatar May 15 '25 15:05 ryang-max

Great work

zhaochenyang20 avatar May 15 '25 15:05 zhaochenyang20

Could you share the result of python3 -m sglang.test.send_one?

ispobock avatar May 17 '25 03:05 ispobock

> Could you share the result of python3 -m sglang.test.send_one?

Sure, thanks for pointing out this test. The result is also pasted in the PR description.

python3 -m sglang.test.send_one

w/ mtp
-------------------------------------------------------
acc_length=1.76
speed=103.04 token/s
-------------------------------------------------------

w/o mtp
-------------------------------------------------------
acc_length=1.00
speed=79.35 token/s
-------------------------------------------------------
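
For a quick read of the two runs above, the relative speedup can be computed directly from the reported numbers. This is only a sketch: the dictionary layout and the `speedup` helper are illustrative names, not part of sglang, and the figures come from a single `send_one` request rather than a full benchmark.

```python
# Figures copied from the send_one output above (single request, not a full benchmark).
with_mtp = {"acc_length": 1.76, "speed": 103.04}     # tokens/s with MTP enabled
without_mtp = {"acc_length": 1.00, "speed": 79.35}   # tokens/s with MTP disabled


def speedup(new_speed: float, base_speed: float) -> float:
    """Ratio of decoding speeds; hypothetical helper for illustration only."""
    return new_speed / base_speed


ratio = speedup(with_mtp["speed"], without_mtp["speed"])
gain_pct = (ratio - 1.0) * 100.0

print(f"speedup with MTP: {ratio:.2f}x ({gain_pct:.1f}% faster)")
```

On these numbers MTP gives roughly a 1.30x decoding speedup. Reading `acc_length` as the average number of tokens accepted per verification step, the 1.76 value would be consistent with that gain being below 1.76x, since the draft and verify passes add overhead.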

ryang-max avatar May 17 '25 04:05 ryang-max