LLaMA-Factory
Does the Qwen3-VL-235B-A22B-thinking model support PPO training? I have been struggling with this for a long time. Help!
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
- llamafactory version: 0.9.4.dev0
- Platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.39
- Python version: 3.12.7
- PyTorch version: 2.7.1+cu128 (GPU)
- Transformers version: 4.57.0
- Datasets version: 3.6.0
- Accelerate version: 1.10.1
- PEFT version: 0.17.1
- GPU type: NVIDIA H100 PCIe
- GPU number: 8
- GPU memory: 79.19GB
- TRL version: 0.9.6
- DeepSpeed version: 0.16.9
- Bitsandbytes version: 0.46.1
- vLLM version: 0.10.1.1
- Default data directory: detected
Reproduction
Put your message here.
Others
Please give me a response.
It is not recommended to use LLaMA-Factory for PPO training with MoE-type models.