When is the Qwen3 model expected to support Eagle3?
The Qwen3-14B/32B models have been open-sourced, and we are currently researching the performance of Qwen3. We hope the Eagle3 model can support Qwen3 and would like to know the estimated timeline for this support. Thank you.
We have successfully trained the Eagle3 versions of Qwen3-8B and Qwen3-30B-A3B based on the official training code, and have open-sourced them. On a single H200 GPU using the SGLang inference framework, Qwen3-8B with Eagle3 improves from 186 tokens/second to 365 tokens/second, while Qwen3-30B-A3B with Eagle3 improves from 147 tokens/second to 231 tokens/second.
We used the ultra_200k dataset and re-ran inference with Qwen3 to regenerate the responses, which were then used as the final training set. In total, 600K dialogues were used for training.
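For anyone wanting to replicate that regeneration step, a rough sketch with Hugging Face `transformers` might look like the following. Note this is only an illustration: the dataset name/split and generation settings below are assumptions, not the exact recipe.

```python
# Rough sketch of the data-regeneration step: take existing prompts and let the
# target Qwen3 model produce fresh responses to use as EAGLE3 training data.
# Dataset name/split and generation settings are assumptions, not the exact recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto")

ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")  # assumed prompt source

regenerated = []
for sample in ds.select(range(8)):  # small slice just for illustration
    user_msg = sample["messages"][0]["content"]
    inputs = tok.apply_chat_template(
        [{"role": "user", "content": user_msg}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=512)
    regenerated.append({
        "prompt": user_msg,
        "response": tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True),
    })
```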
https://huggingface.co/Tengyunw/qwen3_30b_moe_eagle3
https://huggingface.co/Tengyunw/qwen3_8b_eagle3
Additionally, we have published a report detailing how to reproduce the Eagle3 training process; the links are provided below for your reference.
https://mp.weixin.qq.com/s/Dmdg6aLgFHZEcm6TY1vKkA
https://zhuanlan.zhihu.com/p/1923763301432662012
Your training process seems to only modify the target model path and the data template. Does that mean you used a single Llama layer to train EAGLE3 for Qwen3? I have also completed EAGLE3 training code for the Qwen series, and my modifications go beyond yours.
@Siegfried-qgf can you share your modifications?
@Siegfried-qgf One more step I forgot to mention is obtaining the hidden_states. It needs to be aligned with the Hugging Face implementation; you can't rely entirely on the official code to retrieve it, because that part was rewritten by them.
How do I run Qwen3-8B with Eagle3 in this repo? How should I choose --model-type?
I also encountered this issue following your blog. Could you show the specific code changes? @jiahe7ay
@rainkert `outs = self.target_model(input_ids=input_ids, attention_mask=attention_mask, output_hidden_states=True)`
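In case it helps others hitting the same issue, here is a minimal, self-contained sketch of that idea against the Hugging Face implementation. The layer indices chosen for the low/middle/high features are placeholders, not the exact ones from the training code.

```python
# Minimal sketch: fetch hidden states from the HF Qwen3 target model and fuse a
# low/middle/high layer as EAGLE-3-style draft-model input features.
# The layer indices below are placeholders, not the exact ones from the training code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
target_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto")

inputs = tok("Hello, who are you?", return_tensors="pt").to(target_model.device)
with torch.no_grad():
    outs = target_model(**inputs, output_hidden_states=True)

hs = outs.hidden_states                        # tuple: embedding output + one tensor per decoder layer
low, mid, high = hs[2], hs[len(hs) // 2], hs[-3]  # illustrative layer choices
fused = torch.cat([low, mid, high], dim=-1)    # concatenated features fed to the draft head
```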
@bitxsw93 In this link: https://huggingface.co/Tengyunw/qwen3_8b_eagle3 there is a tutorial on how to use the Eagle3 algorithm with Qwen3-8B in the SGLang inference framework.
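For reference, launching the server with speculative decoding enabled should look roughly like this; the numeric speculative settings here are illustrative, not tuned values (see the model card for the exact command).

```python
# Rough sketch of serving Qwen3-8B with the Eagle3 draft model via SGLang.
# The speculative-decoding settings below are illustrative, not tuned values.
import subprocess

subprocess.run([
    "python", "-m", "sglang.launch_server",
    "--model-path", "Qwen/Qwen3-8B",
    "--speculative-algorithm", "EAGLE3",
    "--speculative-draft-model-path", "Tengyunw/qwen3_8b_eagle3",
    "--speculative-num-steps", "6",          # illustrative
    "--speculative-eagle-topk", "10",        # illustrative
    "--speculative-num-draft-tokens", "32",  # illustrative
    "--port", "30000",
])
```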
We have open-sourced the Eagle3 models for the Qwen3 series and released their benchmarks. You are welcome to try them out!
- GitHub repo: https://github.com/Tencent/AngelSlim
- Hugging Face collection: Qwen3-EAGLE3
- ModelScope collection: Qwen3-EAGLE3
| Temperature | Model | MT-bench Speedup | MT-bench τ | HumanEval Speedup | HumanEval τ | GSM8K Speedup | GSM8K τ | Alpaca Speedup | Alpaca τ | Mean Speedup | Mean τ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| T=0 | Qwen3-1.7B | 2.05x | 2.81 | 2.07x | 2.93 | 2.11x | 2.98 | 1.93x | 2.69 | 2.04x | 2.85 |
| T=0 | Qwen3-4B | 2.21x | 3.01 | 2.36x | 3.24 | 2.42x | 3.13 | 2.32x | 2.75 | 2.33x | 3.03 |
| T=0 | Qwen3-8B | 2.63x | 3.65 | 2.76x | 3.85 | 2.82x | 3.90 | 2.62x | 3.48 | 2.70x | 3.72 |
| T=0 | Qwen3-14B | 2.23x | 3.30 | 2.53x | 3.74 | 2.56x | 3.79 | 2.16x | 3.13 | 2.37x | 3.49 |
| T=0 | Qwen3-32B | 2.39x | 2.78 | 2.37x | 2.81 | 2.47x | 2.92 | 2.42x | 2.53 | 2.41x | 2.76 |
| T=0 | Qwen3-30B-A3B | 2.84x | 3.63 | 2.27x | 3.09 | 2.64x | 3.42 | 2.83x | 3.56 | 2.64x | 3.42 |
| T=1 | Qwen3-1.7B | 1.74x | 2.53 | 1.86x | 2.70 | 1.82x | 2.69 | 1.72x | 2.46 | 1.93x | 2.60 |
| T=1 | Qwen3-4B | 1.93x | 2.60 | 2.00x | 2.84 | 2.11x | 2.82 | 2.34x | 2.50 | 1.75x | 2.69 |
| T=1 | Qwen3-8B | 1.98x | 2.75 | 2.25x | 3.11 | 2.31x | 3.15 | 2.10x | 2.76 | 2.90x | 2.94 |
| T=1 | Qwen3-14B | 1.71x | 2.61 | 1.95x | 2.87 | 2.04x | 3.08 | 1.68x | 2.55 | 2.90x | 2.78 |
| T=1 | Qwen3-32B | 1.62x | 1.91 | 1.71x | 2.05 | 1.78x | 2.10 | 1.80x | 1.95 | 1.62x | 2.00 |
| T=1 | Qwen3-30B-A3B | 1.91x | 2.46 | 2.00x | 2.64 | 1.90x | 2.53 | 1.80x | 2.32 | 1.90x | 2.48 |
@yghstill Thank you for reproducing EAGLE-3. We will add your Hugging Face link to our README to help publicize it.
@hongyanz Excellent, we're honored to contribute.
@yghstill Hi, I'm currently trying to reproduce this work, and I'm wondering how metrics like acceptance rate, throughput, and speed are calculated? It seems like the sglang backend logs only show individual entries but not the overall statistics. Thanks in advance!
Maybe this PR will help: #271
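In the meantime, these metrics are usually derived from the per-request counters; a small sketch below (the variable names are just illustrative, adapt them to whatever the backend actually logs).

```python
# Sketch of how the reported metrics are typically computed from per-request logs.
# Variable names are illustrative; they are not specific fields from the sglang logs.

def mean_acceptance_length(total_output_tokens: int, total_verify_steps: int) -> float:
    """tau: average tokens emitted per target-model forward (verification) pass."""
    return total_output_tokens / total_verify_steps

def decoding_throughput(total_output_tokens: int, total_decode_seconds: float) -> float:
    """Tokens per second over the decode phase."""
    return total_output_tokens / total_decode_seconds

def speedup(spec_tokens_per_sec: float, baseline_tokens_per_sec: float) -> float:
    """Speedup of speculative decoding over the plain target model."""
    return spec_tokens_per_sec / baseline_tokens_per_sec

# Example with the numbers quoted earlier in this thread for Qwen3-8B on one H200:
print(speedup(365, 186))  # ~1.96x end-to-end
```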