tjoymeed
As you can see, after resuming training, the [INFO:swift] line disappeared, leaving only the blank separator "--------------------". All I did was add this line to the training script: --resume_from_checkpoint /myprojects/ms-swift/output/Qwen2.5-7B-32GPUs/v3-20250423-132415/checkpoint-400...
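If it helps, here is a minimal Python sketch (the run directory is copied from the post above; adjust it to your own output path) for locating the newest checkpoint-<step> directory to pass to --resume_from_checkpoint:

```python
import os
import re

def latest_checkpoint(output_dir: str) -> str:
    """Return the checkpoint-<step> subdirectory with the highest step number."""
    ckpts = [d for d in os.listdir(output_dir) if re.fullmatch(r"checkpoint-\d+", d)]
    if not ckpts:
        raise FileNotFoundError(f"no checkpoint-* directories under {output_dir}")
    latest = max(ckpts, key=lambda d: int(d.split("-")[1]))
    return os.path.join(output_dir, latest)

# Run directory from the post above; replace with your own.
print(latest_checkpoint("/myprojects/ms-swift/output/Qwen2.5-7B-32GPUs/v3-20250423-132415"))
```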
Hi all, I am doing ModelScope MS-SWIFT GRPO RL training with LoRA. When resuming training from a checkpoint, I cannot do it directly, because my...
Hi all, I am using the latest MS-SWIFT GRPO LoRA training, and I run the training on 4x8=32 GPUs. Now I need to resume training on 2x8=16 GPUs, but simply...
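One common workaround when shrinking the world size (a sketch of the arithmetic only, not an ms-swift feature) is to raise gradient accumulation so the effective global batch size is unchanged:

```python
def rescale_grad_accum(old_world_size: int, new_world_size: int,
                       per_device_batch: int, old_grad_accum: int) -> int:
    """Keep global batch = world_size * per_device_batch * grad_accum constant."""
    global_batch = old_world_size * per_device_batch * old_grad_accum
    new_grad_accum, rem = divmod(global_batch, new_world_size * per_device_batch)
    assert rem == 0, "global batch not divisible at the new world size"
    return new_grad_accum

# 4x8=32 GPUs -> 2x8=16 GPUs: doubling accumulation preserves the global batch.
print(rescale_grad_accum(32, 16, per_device_batch=1, old_grad_accum=4))  # -> 8
```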
Hi all, I am looking for a button or keyboard shortcut that lets me resume/continue execution for all processes at once after they hit breakpoints and stop. (I am...
Hi team, thank you for the excellent work! Could you please tell me where to find example scripts/templates for pretraining a Qwen2.5-7B base model from scratch using Torchtitan? Thanks...
Hi team, thanks for your excellent work! How do I pretrain the Qwen3 4B or 7B dense model from scratch (with my own data)? Architecture-wise, they are the same...
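For what it's worth, independent of the trainer, "pretraining from scratch" usually means building the architecture from its config with randomly initialized weights rather than loading a released checkpoint. A minimal sketch using Hugging Face transformers (the model ID is only an illustration):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Take the architecture (layer count, hidden size, etc.) from the released
# config, but instantiate the model with random weights.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B")
model = AutoModelForCausalLM.from_config(config)  # no pretrained weights loaded
print(sum(p.numel() for p in model.parameters()) / 1e9, "B parameters")
```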
Hi team, thanks a lot for your excellent work. How do I pretrain the Qwen3 4B model from scratch, but with the MoE idea borrowed from the much larger Qwen3-30B-A3B model?...
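A rough sketch of that idea, assuming the Qwen3-MoE config class in transformers; the parameter names follow that config, and the sizes below are illustrative guesses, not anything Qwen published:

```python
from transformers import Qwen3MoeConfig, Qwen3MoeForCausalLM

# Hypothetical sizing: roughly 4B-class dense dims, with the sparse-MoE layout
# (many small experts, a few active per token) borrowed from Qwen3-30B-A3B.
config = Qwen3MoeConfig(
    hidden_size=2560,
    num_hidden_layers=36,
    num_attention_heads=32,
    num_key_value_heads=8,
    moe_intermediate_size=768,  # per-expert FFN width
    num_experts=64,             # total experts per MoE layer
    num_experts_per_tok=8,      # experts activated per token
)
model = Qwen3MoeForCausalLM(config)  # randomly initialized, ready to pretrain
```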
Are Qwen3's pretraining architectural features fully supported now? Hi team, thanks a lot for your excellent work! Are Qwen3's pretraining architectural features fully supported now? Could you please provide an...
Hi team, thanks a lot for your excellent work! I have two separate questions: 1. Is there a way in VeRL to turn off Qwen3-32B's thinking mode? 2. Is...
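On question 1: at the tokenizer level, Qwen3's chat template exposes an enable_thinking flag (per the Qwen3 model card); whether VeRL surfaces a config knob for it is a separate matter. A minimal sketch:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
messages = [{"role": "user", "content": "What is 2+2?"}]
# enable_thinking=False makes the chat template suppress the
# <think>...</think> block at generation time.
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
print(prompt)
```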
On the same node, the first run had no problem: NOTE: CUDA Forward Compatibility mode ENABLED. Using CUDA 12.9 driver version 575.51.03 with kernel driver version 570.133.20. See https://docs.nvidia.com/deploy/cuda-compatibility/ for details....
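A quick way to inspect the runtime/driver pair involved (a small PyTorch sketch; forward compatibility engages when the CUDA runtime is newer than what the installed kernel driver natively supports):

```python
import torch

# The CUDA runtime this torch build ships with, e.g. "12.9".
print("torch CUDA runtime:", torch.version.cuda)
print("CUDA available:    ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:            ", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
```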