Support for SFTTrainer / PPOTrainer / DPOTrainer for LLaVA-like models
TRL's SFTTrainer supports LLaVA (Large Language and Vision Assistant), as described in the blog post Vision Language Models Explained; a rough usage sketch follows below.
Is there any plan to release PPOTrainer and DPOTrainer for LLaVA? If not, could someone explain the concerns about implementing those trainers, or suggest alternatives? Thanks!
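For concreteness, LLaVA SFT with TRL looks roughly like this. This is a minimal sketch, loosely modeled on the linked example: the dataset name and its "messages"/"images" column layout are assumptions, as are the exact config options, so adapt it to your own data and TRL version.

```python
# A minimal sketch of VLM SFT with TRL's SFTTrainer; the dataset and its
# "messages"/"images" schema are assumptions taken from the LLaVA example.
import torch
from datasets import load_dataset
from transformers import AutoProcessor, LlavaForConditionalGeneration
from trl import SFTConfig, SFTTrainer

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

def collate_fn(examples):
    # Render each conversation with the chat template, then batch text + image
    # through the processor so pixel values and input ids stay aligned.
    texts = [processor.apply_chat_template(ex["messages"], tokenize=False) for ex in examples]
    images = [ex["images"][0] for ex in examples]
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    batch["labels"] = labels
    return batch

dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train")
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="llava-sft",
        remove_unused_columns=False,
        dataset_kwargs={"skip_prepare_dataset": True},  # the collator does the preprocessing
    ),
    train_dataset=dataset,
    data_collator=collate_fn,
)
trainer.train()
```

The custom collator is the key point: because processor functions differ across VLMs, the trainer is handed already-processed batches instead of relying on the default text-only packing.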
@fangkuoyu #1647 adds VLM support for DPO training
@kashif Thanks for your comment. Do you know the status of PPOTrainer for LLaVA or other similar models?
For reference, part of the answer is here: https://github.com/huggingface/trl/pull/1647#issuecomment-2191711885
I've added DPO support for LLaVA in #1797. It requires a few hacky tricks, especially since processor functions are not completely standard across VLMs. We'll use Idefics2 as a reference. I don't know if it makes sense to propose the PR for merging; I'd be happy to hear opinions here or in the PR thread.
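Usage in the spirit of #1797 would look roughly like the sketch below. Both the dataset choice (with its "prompt"/"chosen"/"rejected"/"images" columns) and the processing_class argument are assumptions; older TRL versions passed the processor via the tokenizer argument instead.

```python
# A minimal sketch of VLM DPO with TRL's DPOTrainer; dataset name, column
# layout, and the processing_class argument are assumptions, not a spec.
import torch
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import DPOConfig, DPOTrainer

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16)

# Assumed preference dataset with "prompt"/"chosen"/"rejected"/"images" columns.
dataset = load_dataset("HuggingFaceH4/rlaif-v_formatted", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="llava-dpo", remove_unused_columns=False),
    train_dataset=dataset,
    processing_class=processor,  # the processor stands in for the tokenizer here
)
trainer.train()
```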
I'm looking for the same thing: something that supports all the new VLMs, like Phi-3-vision.
It would be needed not only for DPO but also for SFT (the LLaVA example doesn't work out of the box with other models). There also aren't any examples of standardized dataset creation for VLM question answering or vision-language modeling at the moment; a rough sketch of one possible layout follows below.
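As an illustration only, here is a hedged sketch of assembling a tiny VLM question-answering dataset in the "messages"/"images" layout used in the SFT sketch above. The schema and the make_example helper are hypothetical, not an official TRL contract.

```python
# A hypothetical helper for building a small VLM QA dataset; the schema
# mirrors the SFT sketch above and is an assumption, not a standard format.
from datasets import Dataset, Image, Sequence

def make_example(image_path, question, answer):
    return {
        "images": [image_path],  # stored as a file path, cast to images below
        "messages": [
            {"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": question}]},
            {"role": "assistant",
             "content": [{"type": "text", "text": answer}]},
        ],
    }

examples = [
    make_example("cat.jpg", "What animal is shown?", "A cat."),
    make_example("street.jpg", "How many cars are visible?", "Three."),
]
dataset = Dataset.from_list(examples)
# Decode the path strings into PIL images on access.
dataset = dataset.cast_column("images", Sequence(Image()))
```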
@qgallouedec Thanks for the update. I will try to run the example and provide some feedback.
@qgallouedec Thanks for the great work! Any plans on supporting PPO for LLaVA and other VLMs as well?
LLaVA 1.5 is supported. LLaVA 1.6 will be supported. PPO for VLM is not planned, but we welcome contributions.
Is Phi-3-vision currently supported?
What about GRPO support?