Idefics2 VLM Support
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
Requesting support for HuggingFace's new Idefics2 VLM.
Motivation
- First truly open-source VLM (Apache 2.0 license).
- This 8B model offers performance comparable to LLaVA-1.6-34B and Apple's unreleased 30B MM1.
- The HuggingFace team also published a fine-tuning notebook that lets users build specialized VLMs.
Possible Implementation
Hopefully this isn't exceedingly difficult to do: the base LLM is Mistral-7B-v0.1, the image encoder is Google's siglip-so400m-patch14-384, and the projector is a simple MLP similar to LLaVA's.
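If support follows the pattern of the existing llava example (which builds its MLP projector with ggml calls in clip.cpp), the projection step might look roughly like the sketch below. This is only an illustration under that assumption: the function name, tensor names, and two-layer layout are hypothetical, and the real Idefics2 connector may differ from a plain MLP.

```cpp
// Minimal sketch of an MLP projector graph in ggml, in the style of
// llama.cpp's llava clip.cpp. Tensor names and the two-layer layout are
// assumptions, not the actual Idefics2 weight layout.
#include "ggml.h"

// image_embd: [n_vision_embd, n_patches] output of the SigLIP encoder
// returns:    [n_llm_embd,    n_patches] embeddings in Mistral's space
static struct ggml_tensor * build_projector(
        struct ggml_context * ctx,
        struct ggml_tensor  * image_embd,
        struct ggml_tensor  * fc1_w, struct ggml_tensor * fc1_b,
        struct ggml_tensor  * fc2_w, struct ggml_tensor * fc2_b) {
    // first linear layer + GELU
    struct ggml_tensor * cur = ggml_mul_mat(ctx, fc1_w, image_embd);
    cur = ggml_add(ctx, cur, fc1_b);
    cur = ggml_gelu(ctx, cur);
    // second linear layer projects into the LLM embedding size
    cur = ggml_mul_mat(ctx, fc2_w, cur);
    cur = ggml_add(ctx, cur, fc2_b);
    return cur;
}
```

The projected patch embeddings would then be spliced into the prompt's token embeddings the same way the llava example does, so most of the new work is likely in the GGUF conversion script and the vision-encoder graph rather than in the projector itself.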
Yes, please 🙏
Yes! And while someone is (hopefully) at it, maybe also add the new SOTA model:
https://huggingface.co/TIGER-Lab/Mantis-8B-siglip-llama3
🙏
This issue was closed because it has been inactive for 14 days since being marked as stale.