Are Qwen3's pretraining architectural features fully supported now?
Hi Team,
Thanks a lot for your excellent work!
Are Qwen3's pretraining architectural features fully supported now?
Could you please provide an architectural feature list with support status?
Thanks again!
Hi, we support general pretraining (without reasoning or long context extension), as well as full and parameter-efficient finetuning.
What about MoE pretraining?
All Qwen 3 variants are supported, including 6 dense models and 2 MoE models.
Qwen 3 pretraining actually has 3 stages:
- pretraining at a 4096-token context length;
- pretraining to enhance reasoning;
- pretraining to extend to long context.
So NeMo currently supports stage 1 (all architectural features), but not stages 2 or 3?
Thanks!
Yes, what you described is correct.
Okay, when will stage 3 pretraining be fully supported? Thanks!
> All Qwen 3 variants are supported, including 6 dense models and 2 MoE models.
Could you please tell me where to find the recipe for Qwen3 MoE pretraining?
We're working on better long context training support right now, but I don't have any near term ETA to share with you at this time.
Qwen3 MoE recipes can be found here:
- https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/qwen3_30b_a3b.py#L55
- https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/qwen3_235b_a22b.py#L55
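For reference, a minimal sketch of how one of these recipes is typically launched with NeMo 2.0 and nemo_run; the `pretrain_recipe` factory and its arguments are assumed to match the other `llm` recipes, so double-check against the linked files:

```python
# Minimal sketch, assuming the linked module exposes a pretrain_recipe factory
# like the other NeMo 2.0 llm recipes; node/GPU counts are illustrative only.
import nemo_run as run
from nemo.collections.llm.recipes import qwen3_30b_a3b

recipe = qwen3_30b_a3b.pretrain_recipe(
    name="qwen3_30b_a3b_pretrain",
    num_nodes=8,
    num_gpus_per_node=8,
)

# Run locally; swap in a SlurmExecutor or another nemo_run executor for a cluster.
run.run(recipe, executor=run.LocalExecutor())
```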
Great! Thanks a lot! Could you please enlighten me a bit on this: how would I modify the qwen3_30b_a3b recipe to pretrain a qwen3_12b-a1b from scratch? Thanks again!
> Okay, when will stage 3 pretraining be fully supported? Thanks!
Is it coming out soon? Anxiously awaiting it...
Pai-Megatron-Patch already has stage 3. Maybe there is a way to integrate the two?
> Great! Thanks a lot! Could you please enlighten me a bit on this: how would I modify the qwen3_30b_a3b recipe to pretrain a qwen3_12b-a1b from scratch? Thanks again!
There is no definitive answer to this, but you can check out the difference between 235b_a22b and 30b_a3b -- these parameters are downsized: num_layers, hidden_size, num_attention_heads, moe_ffn_hidden_size.
Of course it's not guaranteed that downsizing these further to create 12b-a1b would create a model that still converges. Deep learning is an art :)
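To make that concrete, here is a hypothetical sketch of such a downsizing override on top of the 30b_a3b recipe; the field names follow the ones listed above, while the values and the 12b-a1b name are made up for illustration and carry no convergence guarantee:

```python
# Hypothetical downsizing sketch; all values are illustrative, not a vetted config.
from nemo.collections.llm.recipes import qwen3_30b_a3b

recipe = qwen3_30b_a3b.pretrain_recipe(
    name="qwen3_12b_a1b_pretrain",  # made-up name for a from-scratch 12b-a1b run
    num_nodes=4,
    num_gpus_per_node=8,
)

# Shrink the same knobs that differ between 235b_a22b and 30b_a3b.
recipe.model.config.num_layers = 32            # fewer transformer layers
recipe.model.config.hidden_size = 1536         # narrower hidden dimension
recipe.model.config.num_attention_heads = 16   # fewer attention heads
recipe.model.config.moe_ffn_hidden_size = 512  # smaller per-expert FFN
```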
> Okay, when will stage 3 pretraining be fully supported? Thanks!
> Is it coming out soon? Anxiously awaiting it...
> Pai-Megatron-Patch already has stage 3. Maybe there is a way to integrate the two?
Sorry, I don't have any intel on when it will come out. I'll let you know once I know more.
Hi, we are targeting support for YaRN and other long-context features in the next release, NeMo 25.09. Currently it is not supported.
Thanks! That means September 2025? If so, that's too late for us. Could you please give some pointers on integrating Pai-Megatron-Patch's stage 3 pretraining feature into NeMo-Megatron, since Pai-Megatron-Patch already has stage 3? Thanks again!
Yes, September 2025. We're not familiar with Pai-Megatron-Patch. You can ask in that repo for pointers.
> We're working on better long context training support right now, but I don't have any near term ETA to share with you at this time.
> Qwen3 MoE recipes can be found here: https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/qwen3_30b_a3b.py#L55 https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes/qwen3_235b_a22b.py#L55
Thanks a lot!
Are the features of this architecture fully supported in NeMo-Megatron now, e.g. the global balancing routing, etc.?
Moreover, do you have token efficiency and throughput numbers for pretraining the Qwen3-30B-A3B architecture from scratch? Do you have test results? In particular, with the number of active experts k=8, are the compute requirements 8-fold or not?
Thanks again!
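As a side note on the k=8 question, here is a rough back-of-envelope sketch of why per-token compute in a top-k MoE tracks the activated parameter count rather than the number of routed experts; the figures are illustrative, not NeMo benchmark results:

```python
# Back-of-envelope estimate: forward FLOPs per token ~ 2 x parameters that token touches.
# Parameter counts below are rough, illustrative figures, not measured values.
def forward_flops_per_token(active_params: float) -> float:
    return 2.0 * active_params

dense_30b_active = 30e9   # a dense 30B model touches all 30B parameters per token
moe_a3b_active = 3e9      # Qwen3-30B-A3B activates roughly 3B parameters per token
                          # (embeddings/attention plus the k=8 routed experts)

ratio = forward_flops_per_token(dense_30b_active) / forward_flops_per_token(moe_a3b_active)
print(f"dense 30B vs 30B-A3B FLOPs per token: ~{ratio:.0f}x")
# k = 8 active experts does not by itself mean 8x the compute; the ~3B activated
# parameters are what drive the per-token FLOPs.
```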
I used:
enroot import docker://nvcr.io/nvidia/nemo:25.04
enroot import docker://nvcr.io/nvidia/nemo:dev
but neither of them has the Qwen3 recipes inside.
I then tried:
enroot import docker://nvcr.io/nvidia/nemo:nightly
enroot import docker://nvcr.io/nvidia/nemo:latest
But got 404 errors:
[INFO] Querying registry for permission grant
[INFO] Authenticating with user: $oauthtoken
[INFO] Using credentials from file: /home/mp/.config/enroot/.credentials
[INFO] Authentication succeeded
[INFO] Fetching image manifest list
[INFO] Fetching image manifest
[ERROR] URL https://nvcr.io/v2/nvidia/nemo/manifests/nightly returned error code: 404 Not Found
What's wrong?
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.