Models to port to MLX-VLM

Open · Blaizzy opened this issue 1 year ago · 53 comments

  • [x] MiniCPM-Llama3-V-2_5
  • [x] Florence 2
  • [x] Phi-3-vision
  • [x] Bunny
  • [x] Dolphin-vision-72b
  • [x] Llava Next
  • [x] Qwen2-VL
  • [x] Qwen2.5-VL
  • [x] Pixtral
  • [x] Llama-3.2
  • [x] Llava Interleave
  • [x] Idefics 3
  • [ ] OmniParser
  • [ ] Llava onevision
  • [ ] internlm-xcomposer2d5-7b
  • [ ] InternVL
  • [ ] CogVLM2
  • [ ] ColPali
  • [ ] MoonDream2
  • [ ] Yi-VL
  • [ ] CuMo
  • [ ] Kosmos-2.5
  • [x] Molmo
  • [ ] Ovis Gemma
  • [ ] Aria
  • [ ] NVIDIA NVLM
  • [ ] GOT
  • [ ] InternVL 2.5

Instructions:

  1. Select a model and comment below with your selection.
  2. Create a draft PR titled "Add support for X".
  3. Read the contribution guide.
  4. Check the existing model implementations.
  5. Tag @Blaizzy for code reviews and questions.

If the model you want is not listed, please suggest it and I will add it.

Blaizzy avatar Jun 11 '24 12:06 Blaizzy

Llava-Next support is coming in the next release.

TODO: update text config defaults to avoid errors with Llava-v1.6-vicuna:

from dataclasses import dataclass
from typing import Dict, Optional, Union

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    intermediate_size: int = 11008
    num_attention_heads: int = 32
    rms_norm_eps: float = 1e-05
    vocab_size: int = 32064
    num_key_value_heads: int = 32
    rope_theta: float = 1000000
    rope_traditional: bool = False
    rope_scaling: Optional[Dict[str, Union[float, str]]] = None

Blaizzy avatar Jun 22 '24 15:06 Blaizzy
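
For context, defaults like these only take effect for keys that are missing from a checkpoint's config.json. A minimal sketch of that pattern, using a hypothetical from_dict helper rather than mlx-vlm's actual loader:

from dataclasses import dataclass, fields

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    rms_norm_eps: float = 1e-05

    @classmethod
    def from_dict(cls, params: dict) -> "TextConfig":
        # Keep only keys that are real dataclass fields; keys absent
        # from config.json fall back to the class-level defaults.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in params.items() if k in names})

# A config that omits most keys still loads, with defaults filling the gaps:
cfg = TextConfig.from_dict({"model_type": "llama", "hidden_size": 5120})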

Thanks for the great repo. This should also be on the list: https://github.com/THUDM/CogVLM2. I'm reading the code now and trying to free up some time for the conversion routine.

BoltzmannEntropy avatar Jul 31 '24 18:07 BoltzmannEntropy

https://llava-vl.github.io/blog/2024-08-05-llava-onevision/

jrp2014 avatar Aug 08 '24 18:08 jrp2014

Hey @BoltzmannEntropy and @jrp2014,

Thanks for the suggestions!

I have added them to the backlog

Blaizzy avatar Aug 08 '24 20:08 Blaizzy

MiniCPM-V v2.6

jrp2014 avatar Aug 27 '24 17:08 jrp2014

Do you have a link to Florence-2?

s-smits avatar Sep 07 '24 10:09 s-smits

Is the above list the definitive and up-to-date list of supported models, @Blaizzy? Thanks for your hard work!

ChristianWeyer avatar Sep 10 '24 05:09 ChristianWeyer

Hey @ChristianWeyer! It's mostly up-to-date, just missing Qwen2-VL.

Blaizzy avatar Sep 10 '24 12:09 Blaizzy

@s-smits here you go:

https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py

Blaizzy avatar Sep 10 '24 12:09 Blaizzy

> [x] Phi-3-vision

Thanks! I guess Phi-3-vision includes 3.5?

ChristianWeyer avatar Sep 10 '24 13:09 ChristianWeyer

Yes, they have the same arch so there are no changes needed :)

Blaizzy avatar Sep 10 '24 13:09 Blaizzy

Hey @Blaizzy, thanks for this great framework. Is there any priority for InternVL? I can see it is present in your list; I just wanted to know if it is planned in the near term. I want to run the model on my MacBook, and mlx-vlm looks to be the best way to do that.

pulkitjindal88 avatar Sep 20 '24 15:09 pulkitjindal88

Qwen2-VL-72B would be amazing!

chigkim avatar Sep 21 '24 22:09 chigkim

This recipe seems to work for Qwen2-VL-2B-Instruct:

python -m mlx_vlm.generate \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --max-tokens 100 \
  --temp 0.0 \
  --image django-roadmap.png \
  --prompt "Describe image in detail, include all text"

My results here: https://gist.github.com/simonw/9e02d425cacb902260ec1307e0671e17

simonw avatar Sep 29 '24 21:09 simonw
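
For scripted use, roughly the same recipe works from Python. A minimal sketch, assuming mlx_vlm's load and generate helpers; argument names and chat-template handling have shifted between releases, so check the version you have installed:

from mlx_vlm import load, generate

# Fetches the model weights and processor from the Hugging Face Hub.
model, processor = load("Qwen/Qwen2-VL-2B-Instruct")

output = generate(
    model,
    processor,
    image="django-roadmap.png",
    prompt="Describe image in detail, include all text",
    max_tokens=100,
    temp=0.0,
)
print(output)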

Yep, they just merged Qwen2-VL support this weekend.

chigkim avatar Sep 30 '24 00:09 chigkim

Molmo please

xSNYPSx avatar Oct 02 '24 00:10 xSNYPSx

Nvidia just dropped the multimodal NVLM-D-72B. The benchmarks look pretty good.

https://huggingface.co/nvidia/NVLM-D-72B

chigkim avatar Oct 02 '24 17:10 chigkim

Yep, that's a pretty awesome model! It's on my radar because we can run it in 4-bit quant.

Blaizzy avatar Oct 02 '24 19:10 Blaizzy

Pixtral-12B now has a base model: https://huggingface.co/mistralai/Pixtral-12B-Base-2409

chigkim avatar Oct 25 '24 20:10 chigkim

Hey @Blaizzy, could you add ColQwen support? Since Qwen2-VL is already supported and ColQwen is just an additional linear layer on top, this seems like low-hanging fruit, especially considering Col* is a really hot topic right now.

I could really use this for my projects (e.g. local private document search + qa) 😊

Benjoyo avatar Nov 22 '24 22:11 Benjoyo
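
For context on why this is cheap: ColQwen-style retrievers keep the VLM backbone and add only a small projection mapping each output hidden state to a compact multi-vector embedding, scored with late interaction (MaxSim). An illustrative sketch in MLX; the dim=128 choice follows the ColPali/ColQwen papers, and none of this is mlx-vlm code:

import mlx.core as mx
import mlx.nn as nn

class ColQwenHead(nn.Module):
    """Projects per-token hidden states to 128-dim multi-vector embeddings."""

    def __init__(self, hidden_size: int, dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(hidden_size, dim, bias=False)

    def __call__(self, hidden_states: mx.array) -> mx.array:
        emb = self.proj(hidden_states)  # (num_tokens, dim)
        # L2-normalize so dot products become cosine similarities.
        norm = mx.maximum(mx.sqrt(mx.sum(emb * emb, axis=-1, keepdims=True)), 1e-6)
        return emb / norm

def maxsim_score(query_emb: mx.array, doc_emb: mx.array) -> mx.array:
    # Late interaction: each query token takes its best-matching doc token.
    sim = query_emb @ doc_emb.T          # (q_tokens, d_tokens)
    return mx.sum(mx.max(sim, axis=-1))  # scalar relevance score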

Working on Idefics 3 here: https://github.com/Blaizzy/mlx-vlm/pull/124

pcuenca avatar Nov 26 '24 12:11 pcuenca

@Benjoyo, ColQwen and ColPali are awesome models.

At the moment, I'm working on refactoring and some optimisations, so new model ports by me are on hold.

However, I appreciate any PRs. I'm here to review and help when needed.

Blaizzy avatar Nov 26 '24 14:11 Blaizzy

Thank you very much, @pcuenca!

It means a lot 🚀

I left a few comments.

Blaizzy avatar Nov 26 '24 14:11 Blaizzy

Is it possible to bring this under mlx-vlm?

https://huggingface.co/showlab/ShowUI-2B

kukeshajanth avatar Nov 28 '24 03:11 kukeshajanth

Excited for InternVL-2.5... It beats QvQ on most benchmarks...

psm-2 avatar Dec 25 '24 02:12 psm-2

I'm excited as well :)

Blaizzy avatar Dec 25 '24 23:12 Blaizzy

Will MiniCPM-o-2_6 work? https://huggingface.co/openbmb/MiniCPM-o-2_6

Edit: ValueError: Model type minicpmo not supported. Can you support minicpmo? @Blaizzy

psm-2 avatar Jan 14 '25 17:01 psm-2
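
For anyone hitting this: the error arises because the loader maps config.json's model_type to a per-architecture module, so an unported type fails before any weights load. A simplified illustration of that kind of dispatch, not the exact mlx-vlm source:

import importlib

def get_model_module(model_type: str):
    # Each supported architecture lives in its own module, e.g.
    # mlx_vlm.models.qwen2_vl. An unported type has no module to import.
    try:
        return importlib.import_module(f"mlx_vlm.models.{model_type}")
    except ImportError:
        raise ValueError(f"Model type {model_type} not supported.")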

> Will MiniCPM-o-2_6 work? https://huggingface.co/openbmb/MiniCPM-o-2_6
>
> Edit: ValueError: Model type minicpmo not supported. Can you support minicpmo? @Blaizzy

+1

MiniCPM-o 2.6 is an omni model, which could be really useful.

qinxuye avatar Jan 21 '25 02:01 qinxuye

@Blaizzy Can you add an MLX-VLM function for creating vision-language models by merging OpenGVLab/InternViT-300M-448px-V2_5 into an arbitrary language model?

psm-2 avatar Jan 22 '25 19:01 psm-2
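
On the InternViT idea above: the standard LLaVA-style recipe for grafting a vision encoder onto a language model is a small trained projector that maps vision features into the LLM's embedding space, with the projected patches spliced into the token sequence. A hypothetical sketch of just the projector; the names and sizes are illustrative, and InternViT-300M plus the target LLM would supply the real dimensions:

import mlx.core as mx
import mlx.nn as nn

class VisionProjector(nn.Module):
    """Two-layer MLP mapping vision features into the LLM embedding space."""

    def __init__(self, vision_hidden: int = 1024, llm_hidden: int = 4096):
        super().__init__()
        self.fc1 = nn.Linear(vision_hidden, llm_hidden)
        self.fc2 = nn.Linear(llm_hidden, llm_hidden)

    def __call__(self, patch_features: mx.array) -> mx.array:
        # (num_patches, vision_hidden) -> (num_patches, llm_hidden);
        # the output is concatenated with the text token embeddings.
        return self.fc2(nn.gelu(self.fc1(patch_features)))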