Request for code or guidelines to generate preference data using LLaVA-Critic (to reproduce llava-onevision-ov-chat)

Open naajeehxe opened this issue 1 year ago • 1 comment

Thank you for your amazing contributions and for sharing such an exciting project.

As I understand it, the llava-onevision-qwen2-7b-ov-chat model is built on the llava-onevision-qwen2-7b-ov model and trained with preference data generated by LLaVA-Critic at each iteration.
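To make the question concrete, here is a minimal sketch of how I imagine one preference pair being built. It is only my guess, not the confirmed pipeline: several candidate responses are sampled per prompt, a critic scores each one, and the best and worst become the chosen and rejected responses. The names `PreferencePair`, `build_preference_pair`, `score_fn`, and `min_margin` are hypothetical, and the toy scorer merely stands in for a real LLaVA-Critic call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_preference_pair(
    prompt: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    min_margin: float = 0.0,
) -> Optional[PreferencePair]:
    """Return (best, worst) candidates as a chosen/rejected pair.

    score_fn is a placeholder for the critic (assumption: it returns one
    scalar score per prompt/response). Pairs whose score gap is not larger
    than min_margin are dropped, since they carry no clear preference.
    """
    if len(candidates) < 2:
        return None
    scored = [(score_fn(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda sc: sc[0])
    worst_score, worst = min(scored, key=lambda sc: sc[0])
    if best_score - worst_score <= min_margin:
        return None
    return PreferencePair(prompt=prompt, chosen=best, rejected=worst)


def toy_score(prompt: str, response: str) -> float:
    # Stand-in scorer for illustration only: longer answers score higher.
    return float(len(response))


pair = build_preference_pair(
    "Describe the image.",
    ["A cat.", "A gray cat sleeping on a sofa.", "Cat"],
    toy_score,
)
print(pair)
```

If the actual procedure differs (for example pairwise rather than pointwise scoring, or a different rule for selecting pairs), that is exactly the detail I am hoping to learn.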

I found the script for DPO training [dpo_ov7b.sh], but I need additional code or a guide for generating the preference data with LLaVA-Critic.

Are there detailed guidelines for reproducing the preference data used to train the llava-onevision-qwen2-7b-ov-chat model?

Thank you so much, Jeehye

Jan 06 '25 10:01 naajeehxe

+1. @tyxiong23, the llava-rlhf dataset is in multi-turn form. How should question-image pairs be chosen from it when sampling responses?

Jun 22 '25 15:06 zhang123434