Support for SFTTrainer / PPOTrainer / DPOTrainer for LLaVA-like models
TRL's SFTTrainer supports LLaVA (Large Language and Vision Assistant), as described in the blog post Vision Language Models Explained; a rough usage sketch follows below.
Is there any plan to release PPOTrainer and DPOTrainer for LLaVA? If not, could someone explain the concerns about implementing those trainers, or suggest alternatives? Thanks!
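For concreteness, LLaVA SFT with TRL looks roughly like this. This is a minimal sketch, loosely modeled on the linked example: the dataset name and its "messages"/"images" column layout are assumptions, as are the exact config options, so adapt it to your own data and TRL version.

```python
# A minimal sketch of VLM SFT with TRL's SFTTrainer; the dataset and its
# "messages"/"images" schema are assumptions taken from the LLaVA example.
import torch
from datasets import load_dataset
from transformers import AutoProcessor, LlavaForConditionalGeneration
from trl import SFTConfig, SFTTrainer

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

def collate_fn(examples):
    # Render each conversation with the chat template, then batch text + image
    # through the processor so pixel values and input ids stay aligned.
    texts = [processor.apply_chat_template(ex["messages"], tokenize=False) for ex in examples]
    images = [ex["images"][0] for ex in examples]
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    batch["labels"] = labels
    return batch

dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train")
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="llava-sft",
        remove_unused_columns=False,
        dataset_kwargs={"skip_prepare_dataset": True},  # the collator does the preprocessing
    ),
    train_dataset=dataset,
    data_collator=collate_fn,
)
trainer.train()
```

The custom collator is the key point: because processor functions differ across VLMs, the trainer is handed already-processed batches instead of relying on the default text-only packing.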
@fangkuoyu #1647 adds VLM support for DPO training
@kashif Thanks for your comment. Do you know the status of PPOTrainer for LLaVA or other similar models?
For reference, part of the answer is here: https://github.com/huggingface/trl/pull/1647#issuecomment-2191711885
I've added DPO support for LLaVA in #1797. It requires a few hacky tricks, especially since processor functions are not completely standard across VLMs. We'll use Idefics2 as a reference. I don't know if it makes sense to propose the PR for merging; I'd be happy to hear opinions here or in the PR thread.
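Usage in the spirit of #1797 would look roughly like the sketch below. Both the dataset choice (with its "prompt"/"chosen"/"rejected"/"images" columns) and the processing_class argument are assumptions; older TRL versions passed the processor via the tokenizer argument instead.

```python
# A minimal sketch of VLM DPO with TRL's DPOTrainer; dataset name, column
# layout, and the processing_class argument are assumptions, not a spec.
import torch
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import DPOConfig, DPOTrainer

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16)

# Assumed preference dataset with "prompt"/"chosen"/"rejected"/"images" columns.
dataset = load_dataset("HuggingFaceH4/rlaif-v_formatted", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="llava-dpo", remove_unused_columns=False),
    train_dataset=dataset,
    processing_class=processor,  # the processor stands in for the tokenizer here
)
trainer.train()
```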
I'm looking for the same thing: something that supports all the new VLMs, like Phi-3-vision.
It would be needed not only for DPO but also for SFT (the LLaVA example doesn't work out of the box with other models). There also aren't any examples of standardized dataset creation for VLM question answering or vision-language modeling at the moment; a rough sketch of one possible layout follows below.
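As an illustration only, here is a hedged sketch of assembling a tiny VLM question-answering dataset in the "messages"/"images" layout used in the SFT sketch above. The schema and the make_example helper are hypothetical, not an official TRL contract.

```python
# A hypothetical helper for building a small VLM QA dataset; the schema
# mirrors the SFT sketch above and is an assumption, not a standard format.
from datasets import Dataset, Image, Sequence

def make_example(image_path, question, answer):
    return {
        "images": [image_path],  # stored as a file path, cast to images below
        "messages": [
            {"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": question}]},
            {"role": "assistant",
             "content": [{"type": "text", "text": answer}]},
        ],
    }

examples = [
    make_example("cat.jpg", "What animal is shown?", "A cat."),
    make_example("street.jpg", "How many cars are visible?", "Three."),
]
dataset = Dataset.from_list(examples)
# Decode the path strings into PIL images on access.
dataset = dataset.cast_column("images", Sequence(Image()))
```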
@qgallouedec Thanks for the update. I will try to run the example and provide some feedback.
@qgallouedec Thanks for the great work! Any plans on supporting PPO for LLaVA and other VLMs as well?
LLaVA 1.5 is supported. LLaVA 1.6 will be supported. PPO for VLM is not planned, but we welcome contributions.
Is Phi-3-vision currently supported?
What about GRPO support?