verl
Megatron save checkpoint bug
System Info
Bug 1: The Qwen3-4B model saved in HF format contains only one weight file; one weight file is missing.
Bug 2: Converting the Megatron-format weights to HF format fails with an error.
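A quick way to confirm bug 1 on disk is to compare the shard files referenced by the checkpoint index with the files actually present. The sketch below assumes the saved directory follows the usual Hugging Face sharded layout, with a `model.safetensors.index.json` file whose `weight_map` maps tensor names to shard file names; whether verl writes that index for this checkpoint is an assumption, and `find_missing_shards` is a hypothetical helper, not part of verl.

```python
import json
from pathlib import Path


def find_missing_shards(hf_dir):
    """Return shard files listed in the safetensors index but absent on disk."""
    hf_dir = Path(hf_dir)
    # Assumed file name: the standard Hugging Face index for sharded safetensors.
    index_path = hf_dir / "model.safetensors.index.json"
    if not index_path.exists():
        # No index: either a single-file checkpoint or the index was never written.
        return []
    with index_path.open() as f:
        weight_map = json.load(f)["weight_map"]
    # Each tensor name maps to the shard file that should contain it.
    expected = set(weight_map.values())
    return sorted(name for name in expected if not (hf_dir / name).exists())
```

Calling `find_missing_shards(...)` on the saved HF directory should return an empty list for a complete checkpoint and the names of the absent shard files otherwise.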
Information
- [ ] The official example scripts
- [x] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)
Reproduction
python -m verl.model_merger merge \
    --backend megatron \
    --local_dir xxx/checkpoints/RL/ckpts/FP16/DAPO-Qwen3-4B-megatron-bf16/global_step_200/actor \
    --target_dir /xxx/checkpoints/RL/ckpts/FP16/DAPO-Qwen3-4B-megatron-bf16/hf_200
Expected behavior
The checkpoint should be saved with correct and complete weights, and conversion from mcore (Megatron-Core) format to HF format should be supported.