tjoymeed
Maybe it's because you used "lora" finetuning for MS-SWIFT, but VERL only supports "full" finetuning?
The problem is that when I do --resume_from_checkpoint, all the parameters in the training .sh script are kept the same as in the previous training run, except for "--resume_from_checkpoint"....
Has anybody solved this problem?
The DAPO implementation in the recipe folder also has this problem. This brings down the credibility of that implementation. So the original DAPO paper and...
What about MoE pretraining?
Qwen 3 pretraining actually has 3 stages: 1. pretraining at a 4096 context length; 2. pretraining to enhance reasoning; 3. pretraining to extend to long context. So NeMo currently supports...
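To make stage 3 concrete, here is the kind of change I imagine it involves, written against a NeMo 2.0-style recipe. This is only a sketch: the qwen3_30b_a3b recipe module, the config field names (seq_length, rotary_base), and every value below are my own assumptions, not Qwen 3's official stage-3 settings.

```python
# Rough sketch of a "stage 3" long-context continuation (all values illustrative).
# Assumes a NeMo 2.0-style recipe module for qwen3_30b_a3b and Megatron-style
# config field names; both are assumptions on my part.
from nemo.collections import llm

recipe = llm.qwen3_30b_a3b.pretrain_recipe(name="qwen3_stage3_long_context")

# Stages 1-2 train at 4096 context; stage 3 would continue training at a longer
# sequence length, typically with a larger RoPE base (values below are guesses).
recipe.model.config.seq_length = 32768
recipe.model.config.rotary_base = 1_000_000
recipe.data.seq_length = 32768  # keep the data module in sync with the model

# Stage 3 would resume from the stage-1/2 checkpoint rather than start from
# scratch; the exact resume mechanism depends on the NeMo version being used.
```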
Okay, when will stage 3 pretraining be fully supported? Thanks!
> All Qwen 3 variants are supported, including 6 dense models and 2 MoE models.

Could you please tell me where to find the recipe for Qwen3 MoE pretraining?
Great! Thanks a lot! Could you please enlighten me a bit on this: how would I modify the recipe for qwen3_30b_a3b to pretrain a qwen3_12b-a1b from scratch? Thanks again!
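In case it helps frame the question, here is roughly the kind of modification I have in mind. It is only a sketch: I am assuming the NeMo 2.0 recipe API exposes a qwen3_30b_a3b module and Megatron-style MoE config fields, and every size below is a made-up placeholder for a hypothetical 12b-a1b, not a real Qwen 3 configuration.

```python
# Sketch only: shrink the qwen3_30b_a3b recipe into a hypothetical "12b-a1b" MoE
# pretrained from scratch. Module path, field names, and all sizes are assumptions.
from nemo.collections import llm

def pretrain_recipe_qwen3_12b_a1b():
    # Start from the existing 30B-A3B recipe and scale the model config down.
    recipe = llm.qwen3_30b_a3b.pretrain_recipe(
        name="qwen3_12b_a1b_pretrain",
        num_nodes=4,
        num_gpus_per_node=8,
    )

    cfg = recipe.model.config          # Megatron-style transformer config
    cfg.num_layers = 32                # fewer layers (placeholder)
    cfg.hidden_size = 2560             # smaller hidden size (placeholder)
    cfg.num_attention_heads = 20       # placeholder
    cfg.num_moe_experts = 64           # fewer experts so ~1B params stay active (placeholder)
    cfg.moe_router_topk = 4            # placeholder
    cfg.moe_ffn_hidden_size = 1024     # placeholder

    # Pretraining from scratch: no pretrained weights are restored, so nothing
    # extra is needed beyond not pointing the recipe at an existing checkpoint.
    return recipe
```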
> Okay, when will stage 3 pretraining be fully supported? Thanks!

Is it coming out soon? Anxiously awaiting it... Pai-Megatron-Patch has stage 3. Maybe there is a way to...