LLaVA-NeXT
Request for code or guidelines to generate preference data using LLaVA-Critic (to reproduce llava-onevision-ov-chat)
Thank you for your amazing contributions and for sharing such an exciting project.
As I understand it, the llava-onevision-qwen2-7b-ov-chat model is built on the llava-onevision-qwen2-7b-ov model, with preference data generated by LLaVA-Critic at each iteration.
I found the DPO training script [dpo_ov7b.sh], but I still need code or a guide for generating the preference data with LLaVA-Critic.
Is there any detailed guideline for reproducing the preference data for training the llava-onevision-qwen2-7b-ov-chat model?
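To make the question concrete, here is a minimal sketch of what I assume one data-generation step looks like: sample several responses per question-image pair, score each one with LLaVA-Critic, and keep the highest- and lowest-scored responses as the chosen/rejected pair. `build_preference_pair` and `score_fn` are hypothetical names of mine, not functions from this repo, and both the best-vs-worst selection rule and the output field names are assumptions; I do not know what format dpo_ov7b.sh actually expects.

```python
from typing import Callable, Dict, Sequence


def build_preference_pair(
    question: str,
    image_path: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str, str], float],
) -> Dict[str, str]:
    """Turn sampled responses into one chosen/rejected pair.

    score_fn(question, image_path, response) is a placeholder for a call
    to LLaVA-Critic that returns a scalar quality score (assumption).
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidate responses")

    # Score every sampled response with the (hypothetical) critic.
    scored = [(score_fn(question, image_path, c), c) for c in candidates]

    # Sort by score only, highest first; ties keep their sampling order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)

    return {
        "prompt": question,
        "image": image_path,
        "chosen": ranked[0][1],     # highest-scored response
        "rejected": ranked[-1][1],  # lowest-scored response
    }
```

If the actual pipeline differs (for example pairwise comparison instead of pointwise scores, or a score-gap threshold for filtering pairs), I would like to know which parts of this sketch are wrong.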
Thank you so much, Jeehye
+1. @tyxiong23, the llava-rlhf dataset is in multi-turn form; how should we choose question-image pairs when sampling responses?