tjoymeed
As you can see, after resuming training, the [INFO:swift] line disappeared, leaving only the blank separator "--------------------". All I did was add this line to the training script: --resume_from_checkpoint /myprojects/ms-swift/output/Qwen2.5-7B-32GPUs/v3-20250423-132415/checkpoint-400...
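If it helps, here is a minimal Python sketch (the run directory is copied from the post above; adjust it to your own output path) for locating the newest checkpoint-<step> directory to pass to --resume_from_checkpoint:

```python
import os
import re

def latest_checkpoint(output_dir: str) -> str:
    """Return the checkpoint-<step> subdirectory with the highest step number."""
    ckpts = [d for d in os.listdir(output_dir) if re.fullmatch(r"checkpoint-\d+", d)]
    if not ckpts:
        raise FileNotFoundError(f"no checkpoint-* directories under {output_dir}")
    latest = max(ckpts, key=lambda d: int(d.split("-")[1]))
    return os.path.join(output_dir, latest)

# Run directory from the post above; replace with your own.
print(latest_checkpoint("/myprojects/ms-swift/output/Qwen2.5-7B-32GPUs/v3-20250423-132415"))
```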
Hi all, I am doing ModelScope MS-SWIFT GRPO RL training with LoRA. When resuming training from a checkpoint, I cannot do it directly, because my...
Hi all, I am using the latest MS-SWIFT GRPO LoRA training, and I run the training on 4x8=32 GPUs. Now I need to resume training on 2x8=16 GPUs, but simply...
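One common workaround when shrinking the world size (a sketch of the arithmetic only, not an ms-swift feature) is to raise gradient accumulation so the effective global batch size is unchanged:

```python
def rescale_grad_accum(old_world_size: int, new_world_size: int,
                       per_device_batch: int, old_grad_accum: int) -> int:
    """Keep global batch = world_size * per_device_batch * grad_accum constant."""
    global_batch = old_world_size * per_device_batch * old_grad_accum
    new_grad_accum, rem = divmod(global_batch, new_world_size * per_device_batch)
    assert rem == 0, "global batch not divisible at the new world size"
    return new_grad_accum

# 4x8=32 GPUs -> 2x8=16 GPUs: doubling accumulation preserves the global batch.
print(rescale_grad_accum(32, 16, per_device_batch=1, old_grad_accum=4))  # -> 8
```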
Hi all, I am looking for a button or keyboard shortcut that lets me resume/continue execution for all processes at once after they hit breakpoints and stop. (I am...
Hi team, thank you for the excellent work! Could you please tell me where to find example scripts/templates for pretraining a Qwen2.5-7B base model from scratch using Torchtitan? Thanks...
Hi team, thanks for your excellent work! How do I pretrain the Qwen3 4B or 7B dense model from scratch (with my own data)? Architecture-wise, they are the same...
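For what it's worth, independent of the trainer, "pretraining from scratch" usually means building the architecture from its config with randomly initialized weights rather than loading a released checkpoint. A minimal sketch using Hugging Face transformers (the model ID is only an illustration):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Take the architecture (layer count, hidden size, etc.) from the released
# config, but instantiate the model with random weights.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B")
model = AutoModelForCausalLM.from_config(config)  # no pretrained weights loaded
print(sum(p.numel() for p in model.parameters()) / 1e9, "B parameters")
```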
Hi team, thanks a lot for your excellent work. How do I pretrain the Qwen3 4B model from scratch, but with the MoE idea borrowed from the much larger Qwen3-30B-A3B model?...
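A rough sketch of that idea, assuming the Qwen3-MoE config class in transformers; the parameter names follow that config, and the sizes below are illustrative guesses, not anything Qwen published:

```python
from transformers import Qwen3MoeConfig, Qwen3MoeForCausalLM

# Hypothetical sizing: roughly 4B-class dense dims, with the sparse-MoE layout
# (many small experts, a few active per token) borrowed from Qwen3-30B-A3B.
config = Qwen3MoeConfig(
    hidden_size=2560,
    num_hidden_layers=36,
    num_attention_heads=32,
    num_key_value_heads=8,
    moe_intermediate_size=768,  # per-expert FFN width
    num_experts=64,             # total experts per MoE layer
    num_experts_per_tok=8,      # experts activated per token
)
model = Qwen3MoeForCausalLM(config)  # randomly initialized, ready to pretrain
```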
Are Qwen3's pretraining architectural features fully supported now? Hi team, thanks a lot for your excellent work! Are Qwen3's pretraining architectural features fully supported now? Could you please provide an...
Hi team, thanks a lot for your excellent work! I have two separate questions: 1. Is there a way in VeRL to turn off Qwen3-32B's thinking mode? 2. Is...
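On question 1: at the tokenizer level, Qwen3's chat template exposes an enable_thinking flag (per the Qwen3 model card); whether VeRL surfaces a config knob for it is a separate matter. A minimal sketch:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
messages = [{"role": "user", "content": "What is 2+2?"}]
# enable_thinking=False makes the chat template suppress the
# <think>...</think> block at generation time.
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
print(prompt)
```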
On the same node, the first run had no problem: NOTE: CUDA Forward Compatibility mode ENABLED. Using CUDA 12.9 driver version 575.51.03 with kernel driver version 570.133.20. See https://docs.nvidia.com/deploy/cuda-compatibility/ for details....
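A quick way to inspect the runtime/driver pair involved (a small PyTorch sketch; forward compatibility engages when the CUDA runtime is newer than what the installed kernel driver natively supports):

```python
import torch

# The CUDA runtime this torch build ships with, e.g. "12.9".
print("torch CUDA runtime:", torch.version.cuda)
print("CUDA available:    ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:            ", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
```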