How to pretrain the Qwen3 4B model from scratch, but with MoE?
Hi Team,
Thanks a lot for your excellent work.
How do I pretrain the Qwen3 4B model from scratch, but with the MoE idea borrowed from the much larger Qwen3-30B-A3B model?
Could you please provide such a recipe?
Thanks again!
Hi @tjoymeed, I am not sure I fully understand your question, but if you want to train a Qwen MoE model at a smaller model size, you can start from Qwen3MoEConfig and use a smaller num_layers, hidden_size, etc.
You can start from any Qwen3 recipe and modify recipe.model.config to your desired configuration, as sketched below.