MegatronModelMerger fails for VL models (Qwen3VL) with AttributeError: 'Qwen3VLConfig' object has no attribute 'num_hidden_layers'
System Info
- verl version: 0.6.1
- Platform: Linux
- Python version: 3.12
- PyTorch version: 2.5+
- Transformers version: 4.57.1
- CUDA version: 12.x
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Steps to reproduce the behavior:
- Train a Qwen3-VL model (e.g. Qwen3-VL-8B) with the Megatron backend.
- After training completes, call the model merger to convert the checkpoint to HuggingFace format:

  ```bash
  python3 -m verl.model_merger merge \
      --backend megatron \
      --local_dir /path/to/checkpoints/global_step_8/actor \
      --target_dir /path/to/hf_model_final
  ```

- This triggers the following error:
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]:   File "<frozen runpy>", line 88, in _run_code
[rank0]:   File "/path/to/verl/model_merger/main.py", line 73, in <module>
[rank0]:     main()
[rank0]:   File "/path/to/verl/model_merger/main.py", line 68, in main
[rank0]:     merger.merge_and_save()
[rank0]:   File "/path/to/verl/model_merger/megatron_model_merger.py", line 496, in merge_and_save
[rank0]:     model_state_dict = self._load_state_dicts(model_ckpt_path)
[rank0]:   File "/path/to/verl/model_merger/megatron_model_merger.py", line 232, in _load_state_dicts
[rank0]:     self.pipeline_shards = get_dynamic_pipeline_shards(self.hf_config.num_hidden_layers, self.world_size)
[rank0]: AttributeError: 'Qwen3VLConfig' object has no attribute 'num_hidden_layers'
```
Root Cause: In VL model configs (e.g. Qwen3VLConfig), attributes such as num_hidden_layers live in the nested text_config sub-config:

```json
{
  "model_type": "qwen3_vl",
  "text_config": {
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8
  },
  "vision_config": { ... }
}
```

The current code accesses self.hf_config.num_hidden_layers directly, which fails for VL models.
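A minimal sketch of a possible fix (the helper name `get_num_hidden_layers` is hypothetical, not part of verl): fall back to the nested text_config when the attribute is missing at the top level.

```python
from transformers import PretrainedConfig


def get_num_hidden_layers(hf_config: PretrainedConfig) -> int:
    """Resolve num_hidden_layers for plain LLM configs and for VL
    configs (e.g. Qwen3VLConfig) that nest it under text_config."""
    if hasattr(hf_config, "num_hidden_layers"):
        return hf_config.num_hidden_layers
    text_config = getattr(hf_config, "text_config", None)
    if text_config is not None and hasattr(text_config, "num_hidden_layers"):
        return text_config.num_hidden_layers
    raise AttributeError(
        f"{type(hf_config).__name__} defines num_hidden_layers neither "
        "at the top level nor under text_config"
    )


if __name__ == "__main__":
    from transformers import AutoConfig

    # Hypothetical model id, for illustration only.
    cfg = AutoConfig.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
    print(get_num_hidden_layers(cfg))  # expected: 36
```

The transformers version reported above also exposes PretrainedConfig.get_text_config(), which returns the nested text config for composite models (and the config itself for plain LLMs), so hf_config.get_text_config().num_hidden_layers is another option.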
Expected behavior
Converting a Megatron checkpoint to HuggingFace format should succeed for standard LLMs and vision-language (VL) models alike. For VL models, MegatronModelMerger should handle the nested config structure and read attributes such as num_hidden_layers from text_config.
+1
+1
+1, is there a temporary workaround in the meantime?
Just realized that the checkpoints saved when running Qwen3-VL with Megatron don't need to be merged.
Has this been resolved?
I got it working by simply adding a key-value pair at the top level of the config. The original config keeps num_hidden_layers only inside the nested text_config:

```json
{
  "model_type": "qwen3_vl",
  "text_config": {
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8
  },
  "vision_config": { ... }
}
```

I changed it so that num_hidden_layers also appears at the top level:

```json
{
  "model_type": "qwen3_vl",
  "num_hidden_layers": 36,
  "text_config": {
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8
  },
  "vision_config": { ... }
}
```
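The same workaround as a small script (the config.json path is hypothetical; point it at the HuggingFace config saved alongside your checkpoint):

```python
import json

# Hypothetical location of the checkpoint's saved HuggingFace config;
# adjust to your actual checkpoint layout.
config_path = "/path/to/checkpoints/global_step_8/actor/huggingface/config.json"

with open(config_path) as f:
    cfg = json.load(f)

# Copy the layer count from the nested text_config to the top level so
# that hf_config.num_hidden_layers resolves during merging.
cfg["num_hidden_layers"] = cfg["text_config"]["num_hidden_layers"]

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
```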