MiniCPM-V
In Technical Report
How did you proceed with DPO training? Did you use CPMTrainer or the HF DPOTrainer? Does CPMTrainer support DPO fine-tuning?
We wrote the training code based on the RLAIF-V project. That code implements its own DPO trainer.
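For readers unfamiliar with what a DPO trainer computes, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the RLAIF-V code; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions for illustration. Inputs are summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Hypothetical helper for illustration; beta=0.1 is an assumed default.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss drops below log(2) once the policy favors the chosen response
# more strongly than the reference model does.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

A real trainer would compute these log-probabilities in batches from model outputs and average the loss; this sketch only shows the per-pair objective.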
Thank you for your answer! I have one more question: can I apply the WSD scheduler in this repo's fine-tuning code?
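For context, a WSD (warmup-stable-decay) schedule can be sketched as a plain function of the training step. This is only an illustration, not something the repo is confirmed to provide: the name `wsd_lr`, its parameters, and the linear decay shape are all assumptions (other decay shapes, such as exponential, are also used).

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` for a warmup-stable-decay schedule.

    Hypothetical helper; linear warmup and linear decay are assumed here.
    """
    # Phase 1: linear warmup up to peak_lr.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Phase 2: hold the learning rate constant at peak_lr.
    if step < warmup_steps + stable_steps:
        return peak_lr
    # Phase 3: linear decay from peak_lr to min_lr, then stay at min_lr.
    decay_step = step - warmup_steps - stable_steps
    if decay_step >= decay_steps:
        return min_lr
    frac = decay_step / decay_steps
    return peak_lr + (min_lr - peak_lr) * frac


# Example: 10 warmup steps, 20 stable steps, 10 decay steps.
for s in (0, 9, 15, 35, 100):
    print(s, wsd_lr(s, 1.0, 10, 20, 10))
```

With `peak_lr=1.0` the function returns a multiplier, so it could be passed to PyTorch's `torch.optim.lr_scheduler.LambdaLR` via a small lambda that fixes the phase lengths.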
Hello! By any chance, could you point me to the code you used for MiniCPM-V DPO learning? Thank you and greatly appreciate it.