
Idefics2 VLM Support

Open NigelNelson opened this issue 1 year ago • 1 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Requesting support for HuggingFace's new Idefics2 VLM.

Motivation

  • First true open-source VLM (Apache 2.0).
  • This 8B model offers performance comparable to Llava-1.6-34b and Apple's unreleased 30B MM1.
  • The HuggingFace team included a fine-tuning notebook that lets users build specialized VLMs.

Possible Implementation

Hopefully this isn't exceedingly difficult to do. The base LLM is Mistral-7B-v0.1 and the image encoder is Google's siglip-so400m-patch14-384. Their projector is also a simple MLP similar to Llava.
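
To make the description above concrete, here is a minimal sketch of a Llava-style MLP projector that maps vision-encoder patch features into the LLM's embedding space. It is an illustration under stated assumptions, not Idefics2's actual connector (which may include components beyond a plain MLP) and not llama.cpp code. The function names, the toy dimensions, the GELU activation, and the random weights are all hypothetical; only the 384-pixel input size and 14-pixel patch size are taken from the encoder name.

```python
import math
import random


def gelu(x: float) -> float:
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(vec, weight, bias):
    """Apply y = W x + b, where weight has shape [out_dim][in_dim]."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def init_linear(in_dim, out_dim, rng):
    """Random weights for illustration only; a real model loads trained weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    side = image_size // patch_size
    return side * side


def project_patches(patches, w1, b1, w2, b2):
    """Two-layer MLP applied to each patch: linear -> GELU -> linear."""
    projected = []
    for patch in patches:
        hidden = [gelu(h) for h in linear(patch, w1, b1)]
        projected.append(linear(hidden, w2, b2))
    return projected


# Toy dimensions (assumption): real encoders and LLMs use much larger widths.
VISION_DIM = 8
LLM_DIM = 16

rng = random.Random(0)
w1, b1 = init_linear(VISION_DIM, LLM_DIM, rng)
w2, b2 = init_linear(LLM_DIM, LLM_DIM, rng)

# Patch count implied by the encoder name: 384-pixel input, 14-pixel patches.
n = num_patches(384, 14)

# Stand-in for the vision encoder output: one feature vector per patch.
patches = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)] for _ in range(n)]

# Each projected vector would be fed to the LLM alongside the text embeddings.
image_tokens = project_patches(patches, w1, b1, w2, b2)
print(len(image_tokens), len(image_tokens[0]))
```

In a setup like this, the projector is the only new component between the two pretrained models, which is why a Llava-like design would be comparatively easy to support.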

NigelNelson, Apr 16 '24 16:04

This please 🙏

ye7iaserag, Apr 19 '24 01:04

Yes! And while someone is (hopefully) at it, maybe also add the new SOTA:

https://huggingface.co/TIGER-Lab/Mantis-8B-siglip-llama3

🙏

Benjoyo, May 04 '24 20:05

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot], Jun 18 '24 01:06