How to pretrain the Qwen3 4B model from scratch, but with MoE?
Hi Team,
Thanks a lot for your excellent work.
How do I pretrain the Qwen3 4B model from scratch, but with the MoE idea borrowed from the much larger Qwen3-30B-A3B model?
Could you please provide such a recipe?
Thanks again!
Hi @tjoymeed, I am not sure I fully understand your question, but if you want to train a Qwen MoE model at a smaller model size, you can start from Qwen3MoEConfig and use a smaller num_layers, hidden_size, etc.
You can start from any Qwen3 recipe and modify recipe.model.config to your desired configuration, as sketched below.