[Track] DeepSeek V3/R1 nextn progress

Open zhyncs opened this issue 10 months ago • 5 comments

Triton Backend

@ispobock @pankajroark

FlashInfer Backend

@zhyncs @yzh119

  • [x] compatible with disable MLA

  • [x] support FlashInfer nightly MLA ragged prefill and CUDA Core MLA decoding

  • [x] support FlashInfer v0.2.0.post3 MLA ragged, paged prefill and decoding (@zhyncs @yzh119)

  • [ ] nextn parts can be shared with Triton Backend

EAGLE 2

@zhyncs @Ying1123

zhyncs avatar Feb 10 '25 14:02 zhyncs

ref MTP support: https://github.com/sgl-project/sglang/pull/3582

v0.4.3.post1 release: https://github.com/sgl-project/sglang/pull/3638

SGLang supports MTP (nextn) in the Triton backend, achieving a speed of 77 tokens/s, twice as fast as other OSS LLM engines.

zhyncs avatar Feb 17 '25 13:02 zhyncs

Woo, thank you @zhyncs. I just tried the new image lmsysorg/sglang:v0.4.3.post2-cu125, and the performance seems similar to 0.4.2 (on 16 x H20): with running-req = 1, the gen throughput (token/s) is no higher than before.

What did I miss?

panpan0000 avatar Feb 18 '25 11:02 panpan0000

I see the item "compatible with radix cache and chunked prefill". How is it going? Long-context scenarios require this feature. @zhyncs

lambert0312 avatar Feb 21 '25 05:02 lambert0312

The current EAGLE implementation has two issues:

  1. It does not support chunked prefill.
  2. The draft model follows the same distributed strategy as the target model.

Does the community have any plans to address these two issues?

yukavio avatar Feb 21 '25 10:02 yukavio

@yukavio chunked prefill support is on the way @merrymercy

zhyncs avatar Feb 21 '25 11:02 zhyncs

Will you support DP + MTP?

VegetaPn avatar Mar 14 '25 13:03 VegetaPn

@zhyncs Hi, do we support multiple MTP heads now? Is there an example?

MtFitzRoy avatar Jun 04 '25 04:06 MtFitzRoy

@zhyncs @pankajroark Hi, is there any progress on supporting multiple MTP heads?

Qing-zy avatar Jun 30 '25 12:06 Qing-zy

Hi @pankajroark, do you have any updates or docs about multiple MTP heads? Thanks.

tonyluj avatar Jul 28 '25 06:07 tonyluj

Support for multiple MTP heads is in progress.

Qing-zy avatar Aug 19 '25 03:08 Qing-zy