sglang Support XiaomiMiMo inference with mtp

Motivation

Support XiaomiMiMo inference with MTP (multi-token prediction).

Modifications

Add new model support.

Checklist

[x] Format your code according to the Code Formatting with Pre-Commit.
[ ] Add unit tests as outlined in the Running Unit Tests.
[ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
[ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
[ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
[ ] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

May 06 '25 15:05 ryang-max

TODO: There are still some minor problems with serving the MTP model. DO NOT MERGE.

May 06 '25 15:05 ryang-max

Could you add the accuracy test results and a CI test?

May 07 '25 06:05 ispobock

For the memory issue, you can refer to https://github.com/sgl-project/sglang/blob/fba8eccd7ebe41bbdbf70ab3b6a2df1835f8b532/python/sglang/srt/model_executor/model_runner.py#L725 and make similar changes.

May 12 '25 14:05 ispobock

Could you add the accuracy test results and a CI test?

An MTP-related CI test has been added.

May 15 '25 15:05 ryang-max

Great work

May 15 '25 15:05 zhaochenyang20

Could you share the result of python3 -m sglang.test.send_one?

May 17 '25 03:05 ispobock

Could you share the result of python3 -m sglang.test.send_one?

Sure, thanks for pointing out this test. The result is also pasted in the description.

python3 -m sglang.test.send_one

w/ mtp
-------------------------------------------------------
acc_length=1.76
speed=103.04 token/s
-------------------------------------------------------

w/o mtp
-------------------------------------------------------
acc_length=1.00
speed=79.35 token/s
-------------------------------------------------------
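For a quick read of the two runs, the relative speedup can be worked out from the reported figures. This is only a minimal sketch: the variable names are made up for illustration, and reading `acc_length` as the average number of tokens accepted per decoding step is an assumption, not something stated in this thread.

```python
# Figures copied from the send_one output above.
with_mtp = {"acc_length": 1.76, "speed": 103.04}     # token/s
without_mtp = {"acc_length": 1.00, "speed": 79.35}   # token/s

# Ratio of decoding throughput with MTP to throughput without it.
speedup = with_mtp["speed"] / without_mtp["speed"]

# Assumption: acc_length is the average number of tokens accepted per step,
# so the difference is the extra tokens gained per step from MTP.
extra_tokens_per_step = with_mtp["acc_length"] - without_mtp["acc_length"]

print(f"speedup: {speedup:.2f}x")
print(f"extra accepted tokens per step: {extra_tokens_per_step:.2f}")
```

On these numbers the throughput ratio comes out at roughly 1.30x, smaller than the 1.76 acceptance length, presumably because the draft step itself adds overhead.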

May 17 '25 04:05 ryang-max