LLaMA-Factory

Does the Qwen3-VL-235B-A22B-thinking model support PPO training? I have been struggling with this for a long time. Help!

Open Lixu2512 opened this issue 2 months ago • 1 comment

Reminder

  • [x] I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.4.dev0
  • Platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.39
  • Python version: 3.12.7
  • PyTorch version: 2.7.1+cu128 (GPU)
  • Transformers version: 4.57.0
  • Datasets version: 3.6.0
  • Accelerate version: 1.10.1
  • PEFT version: 0.17.1
  • GPU type: NVIDIA H100 PCIe
  • GPU number: 8
  • GPU memory: 79.19GB
  • TRL version: 0.9.6
  • DeepSpeed version: 0.16.9
  • Bitsandbytes version: 0.46.1
  • vLLM version: 0.10.1.1
  • Default data directory: detected

Reproduction

Put your message here.

Others

Please give me a reply.

Lixu2512, Oct 24 '25 09:10

It is not recommended to use LLaMA-Factory for PPO training with MoE-type models.

Kuangdd01, Oct 24 '25 16:10