
megatron save checkpoint bug

Open dtl123456 opened this issue 3 weeks ago • 0 comments

System Info

Bug 1: When the Qwen3-4B model is saved in HF format, only one weight file is written; one weight file is missing.

Bug 2: Converting the Megatron-format weights to HF format fails with an error.
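One way to confirm bug 1 on disk is to compare the shard files named in the checkpoint's index against the files actually present. This is a hypothetical helper (`missing_shards` is an illustrative name, not a verl API), and it assumes the saved directory follows the standard Hugging Face sharded layout, where `model.safetensors.index.json` maps every tensor name to the shard file that stores it under `weight_map`.

```python
import json
from pathlib import Path


def missing_shards(ckpt_dir):
    """Return shard files referenced by the HF index that do not exist on disk.

    Assumes the standard Hugging Face layout: a sharded checkpoint has
    model.safetensors.index.json whose "weight_map" maps tensor name -> file,
    while an unsharded checkpoint is a single model.safetensors.
    """
    ckpt = Path(ckpt_dir)
    index_path = ckpt / "model.safetensors.index.json"
    if not index_path.exists():
        # No index: expect a single unsharded weight file instead.
        return [] if (ckpt / "model.safetensors").exists() else ["model.safetensors"]
    weight_map = json.loads(index_path.read_text())["weight_map"]
    # Every distinct file named in the index should be present in the directory.
    expected = set(weight_map.values())
    return sorted(name for name in expected if not (ckpt / name).exists())
```

Calling `missing_shards("path/to/hf_checkpoint")` on the saved directory returns an empty list when every referenced shard exists, and otherwise lists the absent files.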

Information

  • [ ] The official example scripts
  • [x] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [x] My own task or dataset (give details below)

Reproduction

python -m verl.model_merger merge \
    --backend megatron \
    --local_dir xxx/checkpoints/RL/ckpts/FP16/DAPO-Qwen3-4B-megatron-bf16/global_step_200/actor \
    --target_dir /xxx/checkpoints/RL/ckpts/FP16/DAPO-Qwen3-4B-megatron-bf16/hf_200
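If the merge command above gets far enough to write files into the target directory, a shard can also exist yet lack tensors that the index promises. This hypothetical sketch (`safetensors_tensor_names` and `tensors_missing_from_shards` are illustrative names, not verl APIs) reads only each file's header, following the safetensors layout of an 8-byte little-endian header length followed by a JSON header, so it needs neither torch nor the `safetensors` package. It assumes a sharded checkpoint with a `model.safetensors.index.json`.

```python
import json
import struct
from pathlib import Path


def safetensors_tensor_names(path):
    """List tensor names in one .safetensors file by parsing only its header.

    The file starts with an unsigned 64-bit little-endian length N, followed
    by N bytes of JSON mapping tensor names to dtype, shape and data offsets,
    plus an optional "__metadata__" entry that is not a tensor.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return sorted(k for k in header if k != "__metadata__")


def tensors_missing_from_shards(ckpt_dir):
    """Return tensor names the index assigns to a shard that does not contain them."""
    ckpt = Path(ckpt_dir)
    index = json.loads((ckpt / "model.safetensors.index.json").read_text())
    # Group the index's tensor names by the shard file that should hold them.
    by_file = {}
    for name, fname in index["weight_map"].items():
        by_file.setdefault(fname, []).append(name)
    missing = []
    for fname, names in by_file.items():
        shard = ckpt / fname
        # A shard absent from disk contributes all of its tensors as missing.
        present = set(safetensors_tensor_names(shard)) if shard.exists() else set()
        missing.extend(n for n in names if n not in present)
    return sorted(missing)
```

An empty result from `tensors_missing_from_shards` means every tensor listed in the index was found in the shard it was assigned to.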

Expected behavior

The checkpoint should be saved with complete and correct weights, and converting it from the Megatron (mcore) format to HF format should succeed.

dtl123456 · Nov 28 '25 08:11