[Feature Request] Qwen3VL GRPO, SFT training
Additional context
Our customer would like to apply RL methods (GRPO, GSPO, and SPO) to a VLM with an MoE architecture, such as Qwen3-VL.
Would it be possible to extend the current VLM support to Qwen3-VL?
(cc. @terrykong, @snowmanwwg )
@yfw please opine
Updating the status here from @yfw: this seems like a large model, so we will most likely need to use the mcore path for it. We recently merged VLM + mcore support for Qwen2.5-VL in https://github.com/NVIDIA-NeMo/RL/pull/1115. Supporting this new model in NeMo-RL will require adding Qwen3-VL to Megatron-Bridge, which is the main change; @yaoyu-33 is already working on that in https://github.com/NVIDIA-NeMo/Megatron-Bridge/issues/775. Once Megatron-Bridge supports the model, we will just have to test it in NeMo-RL. Assigning this task to @eagle705.
Current issues with Qwen3-VL-30B-A3B (run_vlm_grpo.py)
- vllm version: https://github.com/vllm-project/vllm/issues/19793
- log
```
(VllmGenerationWorker pid=3212593, ip=10.52.54.131)   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker.VllmGenerationWorker/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1273, in from_pretrained
(VllmGenerationWorker pid=3212593, ip=10.52.54.131)     raise ValueError(
(VllmGenerationWorker pid=3212593, ip=10.52.54.131) ValueError: The checkpoint you are trying to load has model type `qwen3_vl_moe` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
```
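A quick way to check whether the transformers installed in the vLLM worker venv recognizes the `qwen3_vl_moe` architecture is to try loading the checkpoint config directly. This is only a diagnostic sketch; the Hugging Face model id below is an assumption (a local path to the Qwen3-VL-30B-A3B checkpoint works the same way):

```python
# Diagnostic sketch: does the installed transformers recognize the
# `qwen3_vl_moe` model type? Run this inside the vLLM worker venv.
# The model id is an assumption; a local checkpoint path works too.
import transformers
from transformers import AutoConfig

print("transformers version:", transformers.__version__)

try:
    cfg = AutoConfig.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
    print("recognized model_type:", cfg.model_type)
except ValueError as err:
    # Same failure mode as the worker log above: this transformers
    # release predates the qwen3_vl_moe architecture.
    print("not recognized, transformers needs an upgrade:", err)
```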
- fast tokenizer: loading the processor returns a `Qwen2TokenizerFast`, which has no `.tokenizer` attribute, so `get_tokenizer(..., get_processor=True)` fails
- log
```
Traceback (most recent call last):
  File "/work/code/RL/examples/run_vlm_grpo.py", line 335, in main
    processor = get_tokenizer(config["policy"]["tokenizer"], get_processor=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/nemo_rl/algorithms/utils.py", line 264, in get_tokenizer
    tokenizer = processor.tokenizer
                ^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 1099, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: Qwen2TokenizerFast has no attribute tokenizer. Did you mean: '_tokenizer'?
```
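The tokenizer failure looks like a downstream symptom of the same version gap: `AutoProcessor` apparently falls back to a plain `Qwen2TokenizerFast` for this checkpoint, so the `processor.tokenizer` access inside `get_tokenizer` breaks. Below is a minimal sketch of a defensive check around processor loading; the model id and the fallback behavior are assumptions, and the real fix is presumably a transformers version that ships the Qwen3-VL MoE processor class rather than this guard:

```python
# Sketch of a guard for the case where AutoProcessor silently returns a
# bare tokenizer (e.g. Qwen2TokenizerFast) instead of a multimodal
# processor. The model id is an assumption; use the local checkpoint path.
from transformers import AutoProcessor, PreTrainedTokenizerBase

MODEL_ID = "Qwen/Qwen3-VL-30B-A3B-Instruct"  # hypothetical id

processor = AutoProcessor.from_pretrained(MODEL_ID)

if isinstance(processor, PreTrainedTokenizerBase) or not hasattr(processor, "tokenizer"):
    # This is exactly what trips `processor.tokenizer` inside
    # nemo_rl.algorithms.utils.get_tokenizer in the traceback above.
    raise RuntimeError(
        f"Expected a multimodal processor for {MODEL_ID}, got "
        f"{type(processor).__name__}; the installed transformers likely "
        "does not support qwen3_vl_moe yet."
    )

tokenizer = processor.tokenizer  # safe once a real processor is returned
```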