tjoymeed
Maybe it's because you used "lora" finetuning for MS-SWIFT, but VERL only supports "full" finetuning?
The problem is that when I do --resume_from_checkpoint, all the parameters in the training .sh script are kept the same as in the previous training run, except for "--resume_from_checkpoint"....
Has anybody solved this problem?
The DAPO implementation in the recipe folder also has this problem. This brings down the credibility of that implementation. So the original DAPO paper and...
What about MoE pretraining?
Qwen 3 pretraining actually has 3 stages: 1. pretraining at a 4096 context length; 2. pretraining to enhance reasoning; 3. pretraining to extend to long context. So NeMo currently supports...
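To make stage 3 concrete, here is the kind of change I imagine it involves, written against a NeMo 2.0-style recipe. This is only a sketch: the qwen3_30b_a3b recipe module, the config field names (seq_length, rotary_base), and every value below are my own assumptions, not Qwen 3's official stage-3 settings.

```python
# Rough sketch of a "stage 3" long-context continuation (all values illustrative).
# Assumes a NeMo 2.0-style recipe module for qwen3_30b_a3b and Megatron-style
# config field names; both are assumptions on my part.
from nemo.collections import llm

recipe = llm.qwen3_30b_a3b.pretrain_recipe(name="qwen3_stage3_long_context")

# Stages 1-2 train at 4096 context; stage 3 would continue training at a longer
# sequence length, typically with a larger RoPE base (values below are guesses).
recipe.model.config.seq_length = 32768
recipe.model.config.rotary_base = 1_000_000
recipe.data.seq_length = 32768  # keep the data module in sync with the model

# Stage 3 would resume from the stage-1/2 checkpoint rather than start from
# scratch; the exact resume mechanism depends on the NeMo version being used.
```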
Okay, when will stage 3 pretraining be fully supported? Thanks!
> All Qwen 3 variants are supported, including 6 dense models and 2 MoE models.

Could you please tell me where to find the recipe for Qwen3 MoE pretraining?
Great! Thanks a lot! Could you please enlighten me a bit on this: how would I modify the recipe for qwen3_30b_a3b to pretrain a qwen3_12b-a1b from scratch? Thanks again!
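In case it helps frame the question, here is roughly the kind of modification I have in mind. It is only a sketch: I am assuming the NeMo 2.0 recipe API exposes a qwen3_30b_a3b module and Megatron-style MoE config fields, and every size below is a made-up placeholder for a hypothetical 12b-a1b, not a real Qwen 3 configuration.

```python
# Sketch only: shrink the qwen3_30b_a3b recipe into a hypothetical "12b-a1b" MoE
# pretrained from scratch. Module path, field names, and all sizes are assumptions.
from nemo.collections import llm

def pretrain_recipe_qwen3_12b_a1b():
    # Start from the existing 30B-A3B recipe and scale the model config down.
    recipe = llm.qwen3_30b_a3b.pretrain_recipe(
        name="qwen3_12b_a1b_pretrain",
        num_nodes=4,
        num_gpus_per_node=8,
    )

    cfg = recipe.model.config          # Megatron-style transformer config
    cfg.num_layers = 32                # fewer layers (placeholder)
    cfg.hidden_size = 2560             # smaller hidden size (placeholder)
    cfg.num_attention_heads = 20       # placeholder
    cfg.num_moe_experts = 64           # fewer experts so ~1B params stay active (placeholder)
    cfg.moe_router_topk = 4            # placeholder
    cfg.moe_ffn_hidden_size = 1024     # placeholder

    # Pretraining from scratch: no pretrained weights are restored, so nothing
    # extra is needed beyond not pointing the recipe at an existing checkpoint.
    return recipe
```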
> Okay, when will stage 3 pretraining be fully supported? Thanks!

Is it coming out soon? Anxiously awaiting it... Pai-Megatron-Patch has stage 3. Maybe there is a way to...