LLaMA-Factory
Does the Qwen3-VL-235B-A22B-thinking model support PPO training? I have been struggling with this for a long time. Help!
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
- llamafactory version: 0.9.4.dev0
- Platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.39
- Python version: 3.12.7
- PyTorch version: 2.7.1+cu128 (GPU)
- Transformers version: 4.57.0
- Datasets version: 3.6.0
- Accelerate version: 1.10.1
- PEFT version: 0.17.1
- GPU type: NVIDIA H100 PCIe
- GPU number: 8
- GPU memory: 79.19GB
- TRL version: 0.9.6
- DeepSpeed version: 0.16.9
- Bitsandbytes version: 0.46.1
- vLLM version: 0.10.1.1
- Default data directory: detected
Reproduction
Put your message here.
Others
Please give me a response.
It is not recommended to use LLaMA-Factory for PPO training with MoE-type models.