Idefics2 VLM Support
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
Requesting support for HuggingFace's new Idefics2 VLM.
Motivation
- First truly open-source VLM (Apache 2.0 license).
- This 8B model offers performance comparable to LLaVA-1.6-34B and Apple's unreleased 30B MM1.
- The HuggingFace team also published a fine-tuning notebook that lets users build specialized VLMs.
Possible Implementation
Hopefully this isn't exceedingly difficult to do: the base LLM is Mistral-7B-v0.1, the image encoder is Google's siglip-so400m-patch14-384, and the projector is a simple MLP similar to LLaVA's.
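If support follows the pattern of the existing llava example (which builds its MLP projector with ggml calls in clip.cpp), the projection step might look roughly like the sketch below. This is only an illustration under that assumption: the function name, tensor names, and two-layer layout are hypothetical, and the real Idefics2 connector may differ from a plain MLP.

```cpp
// Minimal sketch of an MLP projector graph in ggml, in the style of
// llama.cpp's llava clip.cpp. Tensor names and the two-layer layout are
// assumptions, not the actual Idefics2 weight layout.
#include "ggml.h"

// image_embd: [n_vision_embd, n_patches] output of the SigLIP encoder
// returns:    [n_llm_embd,    n_patches] embeddings in Mistral's space
static struct ggml_tensor * build_projector(
        struct ggml_context * ctx,
        struct ggml_tensor  * image_embd,
        struct ggml_tensor  * fc1_w, struct ggml_tensor * fc1_b,
        struct ggml_tensor  * fc2_w, struct ggml_tensor * fc2_b) {
    // first linear layer + GELU
    struct ggml_tensor * cur = ggml_mul_mat(ctx, fc1_w, image_embd);
    cur = ggml_add(ctx, cur, fc1_b);
    cur = ggml_gelu(ctx, cur);
    // second linear layer projects into the LLM embedding size
    cur = ggml_mul_mat(ctx, fc2_w, cur);
    cur = ggml_add(ctx, cur, fc2_b);
    return cur;
}
```

The projected patch embeddings would then be spliced into the prompt's token embeddings the same way the llava example does, so most of the new work is likely in the GGUF conversion script and the vision-encoder graph rather than in the projector itself.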
Yes, please 🙏
Yes! And while someone is (hopefully) at it, maybe also add the new SOTA model:
https://huggingface.co/TIGER-Lab/Mantis-8B-siglip-llama3
🙏
This issue was closed because it has been inactive for 14 days since being marked as stale.