MegatronModelMerger fails for VL models (Qwen3VL) with AttributeError: 'Qwen3VLConfig' object has no attribute 'num_hidden_layers'
System Info
- verl version: 0.6.1
- Platform: Linux
- Python version: 3.12
- PyTorch version: 2.5+
- Transformers version: 4.57.1
- CUDA version: 12.x
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Steps to reproduce the behavior:
- Train a Qwen3-VL model (e.g. Qwen3-VL-8B) with the Megatron backend.
- After training completes, call the model merger to convert the checkpoint to HuggingFace format:

  ```bash
  python3 -m verl.model_merger merge \
      --backend megatron \
      --local_dir /path/to/checkpoints/global_step_8/actor \
      --target_dir /path/to/hf_model_final
  ```

- This triggers the following error:
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]:   File "<frozen runpy>", line 88, in _run_code
[rank0]:   File "/path/to/verl/model_merger/main.py", line 73, in <module>
[rank0]:     main()
[rank0]:   File "/path/to/verl/model_merger/main.py", line 68, in main
[rank0]:     merger.merge_and_save()
[rank0]:   File "/path/to/verl/model_merger/megatron_model_merger.py", line 496, in merge_and_save
[rank0]:     model_state_dict = self._load_state_dicts(model_ckpt_path)
[rank0]:   File "/path/to/verl/model_merger/megatron_model_merger.py", line 232, in _load_state_dicts
[rank0]:     self.pipeline_shards = get_dynamic_pipeline_shards(self.hf_config.num_hidden_layers, self.world_size)
[rank0]: AttributeError: 'Qwen3VLConfig' object has no attribute 'num_hidden_layers'
```
Root Cause: In VL model configs (e.g. Qwen3VLConfig), attributes such as num_hidden_layers live in the nested text_config sub-config:

```json
{
  "model_type": "qwen3_vl",
  "text_config": {
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8
  },
  "vision_config": { ... }
}
```

The current code accesses self.hf_config.num_hidden_layers directly, which fails for VL models.
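A minimal sketch of a possible fix (the helper name `get_num_hidden_layers` is hypothetical, not part of verl): fall back to the nested text_config when the attribute is missing at the top level.

```python
from transformers import PretrainedConfig


def get_num_hidden_layers(hf_config: PretrainedConfig) -> int:
    """Resolve num_hidden_layers for plain LLM configs and for VL
    configs (e.g. Qwen3VLConfig) that nest it under text_config."""
    if hasattr(hf_config, "num_hidden_layers"):
        return hf_config.num_hidden_layers
    text_config = getattr(hf_config, "text_config", None)
    if text_config is not None and hasattr(text_config, "num_hidden_layers"):
        return text_config.num_hidden_layers
    raise AttributeError(
        f"{type(hf_config).__name__} defines num_hidden_layers neither "
        "at the top level nor under text_config"
    )


if __name__ == "__main__":
    from transformers import AutoConfig

    # Hypothetical model id, for illustration only.
    cfg = AutoConfig.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
    print(get_num_hidden_layers(cfg))  # expected: 36
```

The transformers version reported above also exposes PretrainedConfig.get_text_config(), which returns the nested text config for composite models (and the config itself for plain LLMs), so hf_config.get_text_config().num_hidden_layers is another option.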
Expected behavior
Converting a Megatron checkpoint to HuggingFace format should succeed for standard LLMs and vision-language (VL) models alike. For VL models, MegatronModelMerger should handle the nested config structure and read attributes such as num_hidden_layers from text_config.
+1
+1
+1, is there a temporary workaround in the meantime?
Just realized that the checkpoints saved when running Qwen3-VL with Megatron don't need to be merged.
Has this been resolved?
I got it working by simply adding a key-value pair at the top level of the config. The original config keeps num_hidden_layers only inside the nested text_config:

```json
{
  "model_type": "qwen3_vl",
  "text_config": {
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8
  },
  "vision_config": { ... }
}
```

I changed it so that num_hidden_layers also appears at the top level:

```json
{
  "model_type": "qwen3_vl",
  "num_hidden_layers": 36,
  "text_config": {
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8
  },
  "vision_config": { ... }
}
```
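The same workaround as a small script (the config.json path is hypothetical; point it at the HuggingFace config saved alongside your checkpoint):

```python
import json

# Hypothetical location of the checkpoint's saved HuggingFace config;
# adjust to your actual checkpoint layout.
config_path = "/path/to/checkpoints/global_step_8/actor/huggingface/config.json"

with open(config_path) as f:
    cfg = json.load(f)

# Copy the layer count from the nested text_config to the top level so
# that hf_config.num_hidden_layers resolves during merging.
cfg["num_hidden_layers"] = cfg["text_config"]["num_hidden_layers"]

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
```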