InternVL
[Feature] Support for RLHF Training Techniques (e.g. DPO)
Motivation
Right now it appears that InternVL only supports SFT. It would be helpful to extend this with support for preference datasets, which would enable more diverse fine-tuning strategies and potentially higher performance in VL models.
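For context, DPO is attractive here because it optimizes directly on preference pairs without a separate reward model. A minimal sketch of the per-pair DPO objective (function name and scalar log-probability inputs are hypothetical, not from InternVL's codebase):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected
    responses under the trained policy and a frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written stably as log1p(exp(-logits))
    return math.log1p(math.exp(-logits))

# Example: the policy slightly prefers the chosen response over reference
loss = dpo_loss(-10.0, -12.0, -10.5, -11.5, beta=0.1)
```

In a VL setting the log-probabilities would be computed over the response tokens conditioned on both the image features and the prompt; the loss itself is unchanged.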
Related resources
https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#supported-training-approaches
Additional context
No response