
Support for SFTTrainer / PPOTrainer / DPOTrainer for LLaVA-like models

Open fangkuoyu opened this issue 1 year ago • 6 comments

TRL's SFTTrainer supports LLaVA (Large Language and Vision Assistant), as described in the following link: Vision Language Models Explained.

Is there any plan to release PPOTrainer and DPOTrainer for LLaVA? If not, could someone explain the concerns about implementing those trainers or suggest any alternatives? Thanks!
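The SFT support mentioned above roughly takes the following shape. This is a minimal sketch, not a definitive recipe: it assumes the pattern of TRL's vision SFT example from around this time, and the model ID, the `format_example` helper, and the commented-out trainer wiring are illustrative assumptions rather than the library's exact API.

```python
# Sketch of SFT for a LLaVA-style model with TRL (assumed API shape).

def format_example(example):
    """Turn one {'question', 'answer'} record into the chat-message layout
    most VLM processors expect: a user turn with an image placeholder plus
    text, followed by the assistant's answer."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": example["question"]},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": example["answer"]}],
        },
    ]


def train():
    # Heavy imports and model downloads are kept inside this function;
    # call train() explicitly to launch fine-tuning.
    from transformers import AutoProcessor, LlavaForConditionalGeneration
    from trl import SFTConfig, SFTTrainer

    model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint name
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)

    # A real data collator would render format_example(...) output with
    # processor.apply_chat_template and batch the pixel values; that
    # wiring is omitted here.
    # trainer = SFTTrainer(model=model, args=SFTConfig(output_dir="out"),
    #                      train_dataset=..., data_collator=...)
    # trainer.train()
```

The key difference from text-only SFT is that each record carries an image alongside the conversation, so the processor (not just the tokenizer) has to be involved in collation.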

fangkuoyu avatar Jun 27 '24 23:06 fangkuoyu

@fangkuoyu #1647 supports VLM training for DPO.

kashif avatar Jun 28 '24 15:06 kashif

@kashif Thanks for your comment. Do you know the status of PPOTrainer for LLaVA or other similar models?

fangkuoyu avatar Jun 30 '24 06:06 fangkuoyu

For reference, part of the answer is here: https://github.com/huggingface/trl/pull/1647#issuecomment-2191711885

qgallouedec avatar Jun 30 '24 07:06 qgallouedec

I've added DPO support for LLaVA in #1797. This requires a few hacky tricks, especially since processor functions are not completely standardized across VLMs. We'll use Idefics2 as a reference. I don't know if it makes sense to propose the PR for merging; I'd be happy to hear opinions here or in the PR thread.

qgallouedec avatar Jul 03 '24 15:07 qgallouedec

I'm looking for the same thing: something that supports all new VLMs, like Phi-3-vision.

It would be needed not only for DPO but also for SFT (the LLaVA example doesn't work out of the box with other models). There also aren't any examples of standardized dataset creation for VLM question answering or vision-language modeling at the moment.
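On the dataset-creation point: a VLM preference dataset for DPO is essentially a text preference dataset plus an images column. Here is a hedged, pure-Python sketch of building such records; the column names (`prompt`, `chosen`, `rejected`, `images`) follow the convention seen in TRL's DPO examples but should be treated as assumptions, and the LLaVA-style prompt template is illustrative.

```python
# Sketch of a standardized preference record for VLM DPO: a prompt with an
# image placeholder, a chosen and a rejected completion, and the image(s).

def make_preference_record(question, chosen, rejected, images):
    """Build one DPO training record for a vision-language model.

    Column names and the prompt template are assumptions modeled on
    TRL's DPO examples, not a guaranteed API contract.
    """
    if not images:
        raise ValueError("a VLM preference record needs at least one image")
    return {
        "prompt": f"USER: <image>\n{question} ASSISTANT:",  # LLaVA-style template (assumed)
        "chosen": chosen,
        "rejected": rejected,
        "images": list(images),
    }

records = [
    make_preference_record(
        question="What animal is in the picture?",
        chosen="A cat sitting on a windowsill.",
        rejected="There is no animal in the picture.",
        images=["cat.jpg"],  # in practice these would be PIL images, not paths
    )
]
```

A list like `records` can then be wrapped with `datasets.Dataset.from_list(records)` before being handed to a trainer.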

DavidePaglieri avatar Jul 03 '24 17:07 DavidePaglieri

> I've added DPO support for LLaVA in #1797. This requires a few hacky tricks, especially since processor functions are not completely standardized across VLMs. We'll use Idefics2 as a reference. I don't know if it makes sense to propose the PR for merging; I'd be happy to hear opinions here or in the PR thread.

@qgallouedec Thanks for the update. I will try to run the example and provide some feedback.

fangkuoyu avatar Jul 04 '24 01:07 fangkuoyu

@qgallouedec Thanks for the great work! Any plans to support PPO for LLaVA and other VLMs as well?

just1nseo avatar Jul 08 '24 07:07 just1nseo

> Any plans to support PPO for LLaVA and other VLMs as well?

- LLaVA 1.5 is supported
- LLaVA 1.6 will be supported
- PPO for VLMs is not planned, but we welcome contributions

qgallouedec avatar Jul 23 '24 10:07 qgallouedec

Is Phi-3-vision currently supported?

psych0v0yager avatar Aug 07 '24 04:08 psych0v0yager

How about GRPO support?

lucasjinreal avatar Jan 31 '25 04:01 lucasjinreal