InternVL
[Feature] Support for RLHF Training Techniques (e.g. DPO)
Motivation
Right now it appears that InternVL only supports SFT. It would be helpful to extend this with support for preference datasets, which would enable more diverse fine-tuning strategies and potentially higher performance in VL models.
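For context, DPO is attractive here because it optimizes directly on preference pairs without a separate reward model. A minimal sketch of the per-pair DPO objective (function name and scalar log-probability inputs are hypothetical, not from InternVL's codebase):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected
    responses under the trained policy and a frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written stably as log1p(exp(-logits))
    return math.log1p(math.exp(-logits))

# Example: the policy slightly prefers the chosen response over reference
loss = dpo_loss(-10.0, -12.0, -10.5, -11.5, beta=0.1)
```

In a VL setting the log-probabilities would be computed over the response tokens conditioned on both the image features and the prompt; the loss itself is unchanged.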
Related resources
https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#supported-training-approaches
Additional context
No response