Models to port to MLX-VLM

Open · Blaizzy opened this issue 1 year ago · 53 comments

  • [x] MiniCPM-Llama3-V-2_5
  • [x] Florence 2
  • [x] Phi-3-vision
  • [x] Bunny
  • [x] Dolphin-vision-72b
  • [x] Llava Next
  • [x] Qwen2-VL
  • [x] Qwen2.5-VL
  • [x] Pixtral
  • [x] Llama-3.2
  • [x] Llava Interleave
  • [x] Idefics 3
  • [ ] OmniParser
  • [ ] Llava onevision
  • [ ] internlm-xcomposer2d5-7b
  • [ ] InternVL
  • [ ] CogVLM2
  • [ ] ColPali
  • [ ] MoonDream2
  • [ ] Yi-VL
  • [ ] CuMo
  • [ ] Kosmos-2.5
  • [x] Molmo
  • [ ] Ovis Gemma
  • [ ] Aria
  • [ ] NVIDIA NVLM
  • [ ] GOT
  • [ ] InternVL 2.5

Instructions:

  1. Select a model and comment below with your selection.
  2. Create a draft PR titled "Add support for X".
  3. Read the contribution guide.
  4. Check the existing model implementations.
  5. Tag @Blaizzy for code reviews and questions.

If the model you want is not listed, please suggest it and I will add it.

Blaizzy avatar Jun 11 '24 12:06 Blaizzy

Llava-Next support is coming in the next release.

TODO: update text config defaults to avoid errors with Llava-v1.6-vicuna:

from dataclasses import dataclass
from typing import Dict, Optional, Union

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    intermediate_size: int = 11008
    num_attention_heads: int = 32
    rms_norm_eps: float = 1e-05
    vocab_size: int = 32064
    num_key_value_heads: int = 32
    rope_theta: float = 1000000
    rope_traditional: bool = False
    rope_scaling: Optional[Dict[str, Union[float, str]]] = None

Blaizzy avatar Jun 22 '24 15:06 Blaizzy
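
For context, defaults like these only take effect for keys that are missing from a checkpoint's config.json. A minimal sketch of that pattern, using a hypothetical from_dict helper rather than mlx-vlm's actual loader:

from dataclasses import dataclass, fields

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    rms_norm_eps: float = 1e-05

    @classmethod
    def from_dict(cls, params: dict) -> "TextConfig":
        # Keep only keys that are real dataclass fields; keys absent
        # from config.json fall back to the class-level defaults.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in params.items() if k in names})

# A config that omits most keys still loads, with defaults filling the gaps:
cfg = TextConfig.from_dict({"model_type": "llama", "hidden_size": 5120})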

Thanks for the great repo. This should also be on the list: https://github.com/THUDM/CogVLM2. I'm reading the code now and trying to free up some time for the conversion routine.

BoltzmannEntropy avatar Jul 31 '24 18:07 BoltzmannEntropy

https://llava-vl.github.io/blog/2024-08-05-llava-onevision/

jrp2014 avatar Aug 08 '24 18:08 jrp2014

Hey @BoltzmannEntropy and @jrp2014,

Thanks for the suggestions!

I have added them to the backlog

Blaizzy avatar Aug 08 '24 20:08 Blaizzy

MiniCPM-V v2.6

jrp2014 avatar Aug 27 '24 17:08 jrp2014

Do you have a link to Florence-2?

s-smits avatar Sep 07 '24 10:09 s-smits

Is the above list the definitive and up-to-date list of supported models, @Blaizzy? Thanks for your hard work!

ChristianWeyer avatar Sep 10 '24 05:09 ChristianWeyer

Hey @ChristianWeyer! It's mostly up-to-date, just missing Qwen2-VL.

Blaizzy avatar Sep 10 '24 12:09 Blaizzy

@s-smits here you go:

https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py

Blaizzy avatar Sep 10 '24 12:09 Blaizzy

> [x] Phi-3-vision

Thanks! I guess Phi-3-vision includes 3.5?

ChristianWeyer avatar Sep 10 '24 13:09 ChristianWeyer

Yes, they have the same arch so there are no changes needed :)

Blaizzy avatar Sep 10 '24 13:09 Blaizzy

Hey @Blaizzy, thanks for this great framework. Is there any priority for InternVL? I can see it is present in your list; I just wanted to know if it is planned in the near term. I want to run the model on my MacBook, and mlx-vlm looks to be the best way to do that.

pulkitjindal88 avatar Sep 20 '24 15:09 pulkitjindal88

Qwen2-VL-72B would be amazing!

chigkim avatar Sep 21 '24 22:09 chigkim

This recipe seems to work for Qwen2-VL-2B-Instruct:

python -m mlx_vlm.generate \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --max-tokens 100 \
  --temp 0.0 \
  --image django-roadmap.png \
  --prompt "Describe image in detail, include all text"

My results here: https://gist.github.com/simonw/9e02d425cacb902260ec1307e0671e17

simonw avatar Sep 29 '24 21:09 simonw
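
For scripted use, roughly the same recipe works from Python. A minimal sketch, assuming mlx_vlm's load and generate helpers; argument names and chat-template handling have shifted between releases, so check the version you have installed:

from mlx_vlm import load, generate

# Fetches the model weights and processor from the Hugging Face Hub.
model, processor = load("Qwen/Qwen2-VL-2B-Instruct")

output = generate(
    model,
    processor,
    image="django-roadmap.png",
    prompt="Describe image in detail, include all text",
    max_tokens=100,
    temp=0.0,
)
print(output)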

Yep, they just merged Qwen2-VL support this weekend.

chigkim avatar Sep 30 '24 00:09 chigkim

Molmo please

xSNYPSx avatar Oct 02 '24 00:10 xSNYPSx

Nvidia just dropped the multimodal NVLM-D-72B. The benchmarks look pretty good.

https://huggingface.co/nvidia/NVLM-D-72B

chigkim avatar Oct 02 '24 17:10 chigkim

Yep, that's a pretty awesome model! It's on my radar because we can run it in 4-bit quant.

Blaizzy avatar Oct 02 '24 19:10 Blaizzy

Pixtral-12B now has a base model: https://huggingface.co/mistralai/Pixtral-12B-Base-2409

chigkim avatar Oct 25 '24 20:10 chigkim

Hey @Blaizzy, could you add ColQwen support? Since Qwen2-VL is already supported and ColQwen is just an additional linear layer on top, this seems like low-hanging fruit, especially considering Col* is a really hot topic right now.

I could really use this for my projects (e.g. local private document search + qa) 😊

Benjoyo avatar Nov 22 '24 22:11 Benjoyo
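
For context on why this is cheap: ColQwen-style retrievers keep the VLM backbone and add only a small projection mapping each output hidden state to a compact multi-vector embedding, scored with late interaction (MaxSim). An illustrative sketch in MLX; the dim=128 choice follows the ColPali/ColQwen papers, and none of this is mlx-vlm code:

import mlx.core as mx
import mlx.nn as nn

class ColQwenHead(nn.Module):
    """Projects per-token hidden states to 128-dim multi-vector embeddings."""

    def __init__(self, hidden_size: int, dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(hidden_size, dim, bias=False)

    def __call__(self, hidden_states: mx.array) -> mx.array:
        emb = self.proj(hidden_states)  # (num_tokens, dim)
        # L2-normalize so dot products become cosine similarities.
        norm = mx.maximum(mx.sqrt(mx.sum(emb * emb, axis=-1, keepdims=True)), 1e-6)
        return emb / norm

def maxsim_score(query_emb: mx.array, doc_emb: mx.array) -> mx.array:
    # Late interaction: each query token takes its best-matching doc token.
    sim = query_emb @ doc_emb.T          # (q_tokens, d_tokens)
    return mx.sum(mx.max(sim, axis=-1))  # scalar relevance score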

Working on Idefics 3 here: https://github.com/Blaizzy/mlx-vlm/pull/124

pcuenca avatar Nov 26 '24 12:11 pcuenca

@Benjoyo, ColQwen and ColPali are awesome models.

At the moment, I'm working on refactoring and some optimisations, so new model ports by me are on hold.

However, I appreciate any PRs. I'm here to review and help when needed.

Blaizzy avatar Nov 26 '24 14:11 Blaizzy

Thank you very much, @pcuenca!

It means a lot 🚀

I left a few comments.

Blaizzy avatar Nov 26 '24 14:11 Blaizzy

Is it possible to bring this under mlx-vlm?

https://huggingface.co/showlab/ShowUI-2B

kukeshajanth avatar Nov 28 '24 03:11 kukeshajanth

Excited for InternVL-2.5... It beats QvQ on most benchmarks...

psm-2 avatar Dec 25 '24 02:12 psm-2

I'm excited as well :)

Blaizzy avatar Dec 25 '24 23:12 Blaizzy

Will MiniCPM-o-2_6 work? https://huggingface.co/openbmb/MiniCPM-o-2_6

Edit: ValueError: Model type minicpmo not supported. Can you support minicpmo? @Blaizzy

psm-2 avatar Jan 14 '25 17:01 psm-2
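
For anyone hitting this: the error arises because the loader maps config.json's model_type to a per-architecture module, so an unported type fails before any weights load. A simplified illustration of that kind of dispatch, not the exact mlx-vlm source:

import importlib

def get_model_module(model_type: str):
    # Each supported architecture lives in its own module, e.g.
    # mlx_vlm.models.qwen2_vl. An unported type has no module to import.
    try:
        return importlib.import_module(f"mlx_vlm.models.{model_type}")
    except ImportError:
        raise ValueError(f"Model type {model_type} not supported.")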

> Will MiniCPM-o-2_6 work? https://huggingface.co/openbmb/MiniCPM-o-2_6
>
> Edit: ValueError: Model type minicpmo not supported. Can you support minicpmo? @Blaizzy

+1

MiniCPM-o 2.6 is an omni model, which could be really useful.

qinxuye avatar Jan 21 '25 02:01 qinxuye

@Blaizzy Can you add an MLX-VLM function for creating vision-language models by merging OpenGVLab/InternViT-300M-448px-V2_5 into an arbitrary language model?

psm-2 avatar Jan 22 '25 19:01 psm-2
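
On the InternViT idea above: the standard LLaVA-style recipe for grafting a vision encoder onto a language model is a small trained projector that maps vision features into the LLM's embedding space, with the projected patches spliced into the token sequence. A hypothetical sketch of just the projector; the names and sizes are illustrative, and InternViT-300M plus the target LLM would supply the real dimensions:

import mlx.core as mx
import mlx.nn as nn

class VisionProjector(nn.Module):
    """Two-layer MLP mapping vision features into the LLM embedding space."""

    def __init__(self, vision_hidden: int = 1024, llm_hidden: int = 4096):
        super().__init__()
        self.fc1 = nn.Linear(vision_hidden, llm_hidden)
        self.fc2 = nn.Linear(llm_hidden, llm_hidden)

    def __call__(self, patch_features: mx.array) -> mx.array:
        # (num_patches, vision_hidden) -> (num_patches, llm_hidden);
        # the output is concatenated with the text token embeddings.
        return self.fc2(nn.gelu(self.fc1(patch_features)))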