
[Feature Request] Qwen3VL GRPO, SFT training

Open eagle705 opened this issue 2 months ago • 3 comments

Additional context

Our customer would like to apply RL methods (GRPO, GSPO, and SPO) to VLM with MoE (such as Qwen3-VL).

Would it be possible to extend the current VLM support to Qwen3-VL?

(cc. @terrykong, @snowmanwwg )

eagle705 avatar Oct 02 '25 06:10 eagle705

@yfw please opine

euronymous-aithal avatar Oct 02 '25 15:10 euronymous-aithal

Updating the status here from @yfw: This seems like a large model, so we will most likely need to use the mcore path for it. We recently merged VLM + mcore support for Qwen2.5-VL in this PR: https://github.com/NVIDIA-NeMo/RL/pull/1115. Supporting this new model in NeMo-RL will require adding Qwen3 VL to Megatron-Bridge, which is the main change; @yaoyu-33 is already working on it: https://github.com/NVIDIA-NeMo/Megatron-Bridge/issues/775. Once Megatron-Bridge supports the model, we will just have to test it in NeMo-RL. Assigning this task to @eagle705.

euronymous-aithal avatar Oct 03 '25 17:10 euronymous-aithal

Current issues with Qwen3-VL-30B-A3B (run_vlm_grpo.py)

  • vLLM / Transformers version (the `qwen3_vl_moe` architecture is not recognized): https://github.com/vllm-project/vllm/issues/19793
    • log
(VllmGenerationWorker pid=3212593, ip=10.52.54.131)   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker.VllmGenerationWorker/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1273, in from_pretrained
(VllmGenerationWorker pid=3212593, ip=10.52.54.131)     raise ValueError(
(VllmGenerationWorker pid=3212593, ip=10.52.54.131) ValueError: The checkpoint you are trying to load has model type `qwen3_vl_moe` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
  • fast tokenizer
    • log
Traceback (most recent call last):
  File "/work/code/RL/examples/run_vlm_grpo.py", line 335, in main
    processor = get_tokenizer(config["policy"]["tokenizer"], get_processor=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/nemo_rl/algorithms/utils.py", line 264, in get_tokenizer
    tokenizer = processor.tokenizer
                ^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo_rl_venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 1099, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: Qwen2TokenizerFast has no attribute tokenizer. Did you mean: '_tokenizer'?
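
The `AttributeError` above suggests that for this checkpoint, `get_tokenizer(..., get_processor=True)` ends up holding a bare fast tokenizer (`Qwen2TokenizerFast`) rather than a multimodal processor, so the unconditional `.tokenizer` access in `nemo_rl/algorithms/utils.py` fails. A minimal defensive sketch of a workaround (the `unwrap_tokenizer` helper name is mine, not an existing NeMo-RL API):

```python
def unwrap_tokenizer(processor_or_tokenizer):
    """Return the underlying tokenizer whether we were handed a full
    multimodal processor or a bare tokenizer.

    A bare fast tokenizer (e.g. Qwen2TokenizerFast) has no `.tokenizer`
    attribute, which is exactly what raises the AttributeError above, so
    fall back to returning the object itself in that case.
    """
    return getattr(processor_or_tokenizer, "tokenizer", processor_or_tokenizer)
```

A guard like this could replace the unconditional `processor.tokenizer` access, assuming the caller only needs the tokenizer; the underlying cause (why `AutoProcessor` resolves to a plain tokenizer for this checkpoint) would still need the Transformers upgrade noted in the first log.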

eagle705 avatar Nov 24 '25 08:11 eagle705