tjoymeed

11 issue results for tjoymeed

As you can see, after resuming training, the [INFO:swift] line disappeared, leaving only the blank "--------------------" separator. All I did was add this line to the training script: --resume_from_checkpoint /myprojects/ms-swift/output/Qwen2.5-7B-32GPUs/v3-20250423-132415/checkpoint-400...
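For reference, a minimal sketch of what such a launch looks like with the resume flag in place, assuming ms-swift 3.x flag names; the model, dataset, and RUN_DIR below are placeholders, not the reporter's actual values:

```bash
# Minimal resume sketch (assumed ms-swift 3.x flag names; model, dataset,
# RUN_DIR, and the checkpoint step are all placeholders).
RUN_DIR=output/Qwen2.5-7B-32GPUs/v3-20250423-132415
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --train_type lora \
    --dataset my_dataset.jsonl \
    --resume_from_checkpoint "$RUN_DIR/checkpoint-400"
```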

Hi all, I am doing ModelScope MS-SWIFT GRPO RL training with LoRA. When resuming training from a checkpoint, because I cannot do it directly, given that my...

Hi all, I am using the latest MS-SWIFT GRPO LoRA training and I run the training on 4x8=32 GPUs. Now I need to resume training on 2x8=16 GPUs. But simply...
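One common workaround when the world size shrinks (an assumption about the setup, not something ms-swift documents for GRPO specifically) is to keep the effective global batch constant by scaling gradient accumulation, then resume as usual:

```bash
# Hedged sketch: resume the 32-GPU run on 2x8=16 GPUs. Halving the world
# size halves the global batch unless gradient accumulation doubles; the
# factor of 2 below assumes the original run used 1. Paths are placeholders.
NNODES=2 NODE_RANK=0 NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type grpo \
    --resume_from_checkpoint "$RUN_DIR/checkpoint-400" \
    --gradient_accumulation_steps 2
```

Note that whether the optimizer state resumes cleanly across a different world size depends on the parallelism backend (DeepSpeed, for example, shards optimizer state per rank), which may be why a plain resume fails here.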

Hi all, I am looking for a button or keyboard shortcut that allows me to resume/continue execution for all processes at once after they hit breakpoints and stop. (I am...

feature-request
debug

Hi team, thank you for the excellent work! Could you please tell me where to find example scripts/templates for pretraining a Qwen 2.5 7B-base model from scratch using Torchtitan? Thanks...
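On the torchtitan questions: torchtitan drives training from TOML configs plus a launch script, and it ships Llama configs only, so a Qwen run means writing an analogous TOML by hand. A hedged sketch of the launch pattern; the config filename below is hypothetical, and the script name varies across repo versions:

```bash
# Hedged sketch of torchtitan's launch pattern. torchtitan bundles Llama
# TOML configs (e.g. train_configs/llama3_8b.toml); the Qwen config named
# here is hypothetical and would have to be written by hand. The launch
# script name differs across torchtitan versions (run_train.sh vs
# run_llama_train.sh).
CONFIG_FILE="./train_configs/qwen2_5_7b.toml" ./run_train.sh
```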

Hi team, thanks for your excellent work! How do I pretrain the Qwen 3 4B or 7B dense model from scratch (with my own data)? Architecture-wise, they are the same...

Hi team, thanks a lot for your excellent work. How do I pretrain the Qwen3 4B model from scratch, but with the MoE idea borrowed from the much larger Qwen3-30B-A3B model?...

Are Qwen3's pretraining architectural features fully supported now? Hi team, thanks a lot for your excellent work! Are Qwen3's pretraining architectural features fully supported now? Could you please provide an...

Hi team, thanks a lot for your excellent work! I have two separate questions: 1. Is there a way in VeRL to turn off Qwen3 32B's thinking mode? 2. Is...
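On question 1: without guessing at VeRL's own config surface, one data-level option is Qwen3's documented `/no_think` soft switch, which disables thinking for the turn it appears in (the tokenizer-level equivalent is `enable_thinking=False` in `apply_chat_template`). A hedged sketch that appends the switch to every prompt in a JSONL dataset; the `prompt` field name is an assumption about the schema:

```bash
# Hedged sketch: append Qwen3's documented "/no_think" soft switch to each
# prompt in a JSONL file. The "prompt" key is an assumed schema, not a
# VeRL-specific field.
jq -c '.prompt += " /no_think"' train.jsonl > train_no_think.jsonl
```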

On the same node: First run - no problem: NOTE: CUDA Forward Compatibility mode ENABLED. Using CUDA 12.9 driver version 575.51.03 with kernel driver version 570.133.20. See https://docs.nvidia.com/deploy/cuda-compatibility/ for details....

bug
common
community-request
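For the CUDA forward-compatibility report above, a hedged diagnostic sketch: when the mode behaves differently between runs on the same node, the usual first step is to compare the kernel driver, the user-space driver library the process actually resolves, and the CUDA runtime PyTorch was built against:

```bash
# Hedged diagnostic sketch for CUDA forward-compatibility mismatches:
# compare the kernel driver, the user-space driver, and PyTorch's CUDA
# runtime. Compat libraries (e.g. under /usr/local/cuda/compat) take
# precedence when they appear early in LD_LIBRARY_PATH.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
cat /proc/driver/nvidia/version
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
echo "$LD_LIBRARY_PATH"
```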