Models to port to MLX-VLM
- [x] MiniCPM-Llama3-V-2_5
- [x] Florence 2
- [x] Phi-3-vision
- [x] Bunny
- [x] Dolphin-vision-72b
- [x] Llava Next
- [x] Qwen2-VL
- [x] Qwen2.5-VL
- [x] Pixtral
- [x] Llama-3.2
- [x] Llava Interleave
- [x] Idefics 3
- [ ] OmniParser
- [ ] Llava onevision
- [ ] internlm-xcomposer2d5-7b
- [ ] InternVL
- [ ] CogVLM2
- [ ] ColPali
- [ ] MoonDream2
- [ ] Yi-VL
- [ ] CuMo
- [ ] Kosmos-2.5
- [x] Molmo
- [ ] Ovis Gemma
- [ ] Aria
- [ ] NVIDIA NVLM
- [ ] GOT
- [ ] InternVL 2.5
Instructions:
- Select the model and comment below with your selection
- Create a Draft PR titled: "Add support for X"
- Read Contribution guide
- Check existing models
- Tag @Blaizzy for code reviews and questions.
If the model you want is not listed, please suggest it and I will add it.
Next release of Llava-Next
TODO: update text config defaults to avoid errors with Llava-v1.6-vicuna:
```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    intermediate_size: int = 11008
    num_attention_heads: int = 32
    rms_norm_eps: float = 1e-05
    vocab_size: int = 32064
    num_key_value_heads: int = 32
    rope_theta: float = 1000000
    rope_traditional: bool = False
    rope_scaling: Optional[Dict[str, Union[float, str]]] = None
```
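A minimal usage sketch of why the defaults help (the dict values here are hypothetical): an older Llava-v1.6-vicuna `config.json` may omit some of these keys, so filtering to known fields and letting the defaults fill the rest avoids construction errors.

```python
# Hypothetical config dict with a missing rope_theta and an extra key;
# unknown keys are dropped, missing ones fall back to the defaults above.
raw = {"model_type": "llama", "vocab_size": 32000, "unused_key": 1}
cfg = TextConfig(**{k: v for k, v in raw.items() if k in TextConfig.__dataclass_fields__})
print(cfg.vocab_size, cfg.rope_theta)  # 32000 1000000
```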
Thanks for the great repo. This should also be on the list: https://github.com/THUDM/CogVLM2. I am now reading the code and trying to free up some time for the conversion routine.
https://llava-vl.github.io/blog/2024-08-05-llava-onevision/
Hey @BoltzmannEntropy and @jrp2014,
Thanks for the suggestions!
I have added them to the backlog
MiniCPM-V v2.6
Do you have a link to Florence-2?
Is the above list the ultimate and up-to-date list of supported models @Blaizzy? Thanks for your hard work!
Hey @ChristianWeyer, it's mostly up-to-date, just missing Qwen2-VL.
@s-smits here you go:
https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py
> - [x] Phi-3-vision
Thanks! I guess Phi-3-vision includes 3.5?
Yes, they have the same arch so there are no changes needed :)
Hey @Blaizzy, thanks for this great framework. Is there any priority for InternVL? I can see it is present in your list; I just wanted to know if it's planned in your near term. I want to run the model on my MacBook, and mlx-vlm looks to be the best way to do that.
Qwen2-VL-72B would be amazing!
This recipe seems to work for Qwen2-VL-2B-Instruct:

```bash
python -m mlx_vlm.generate \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --max-tokens 100 \
  --temp 0.0 \
  --image django-roadmap.png \
  --prompt "Describe image in detail, include all text"
```
My results here: https://gist.github.com/simonw/9e02d425cacb902260ec1307e0671e17
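For comparison, a minimal sketch of the same call through the Python API. mlx_vlm does export `load` and `generate` helpers, but the exact `generate` signature has varied across releases, so treat the keyword arguments here as assumptions:

```python
from mlx_vlm import load, generate

# Assumes load/generate as exported by mlx_vlm; keyword names may
# differ slightly between versions.
model, processor = load("Qwen/Qwen2-VL-2B-Instruct")
output = generate(
    model,
    processor,
    prompt="Describe image in detail, include all text",
    image="django-roadmap.png",
    max_tokens=100,
    temp=0.0,
)
print(output)
```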
Yep, they just merged Qwen2-VL support this weekend.
Molmo please
Nvidia just dropped the multimodal NVLM-D-72B. The benchmarks look pretty good.
https://huggingface.co/nvidia/NVLM-D-72B
Yep, that's a pretty awesome model! It's on my radar because we can run it in 4-bit quant.
Pixtral-12B now has a base model: https://huggingface.co/mistralai/Pixtral-12B-Base-2409
Hey @Blaizzy, could you add ColQwen support? Since Qwen2-VL is already supported and ColQwen is just an additional linear layer on top, this seems like low-hanging fruit, especially since Col* models are a really hot topic right now.
I could really use this for my projects (e.g. local private document search + qa) 😊
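For context, a minimal sketch of that "additional linear layer" idea (all names here are hypothetical; it assumes ColQwen follows the ColPali recipe of a single linear projection over the backbone's last hidden states, with L2-normalized token embeddings for late-interaction retrieval):

```python
import mlx.core as mx
import mlx.nn as nn


class ColQwenHead(nn.Module):
    """Hypothetical ColPali-style multi-vector head on top of Qwen2-VL."""

    def __init__(self, hidden_size: int, embed_dim: int = 128):
        super().__init__()
        # The single extra linear projection mentioned above.
        self.proj = nn.Linear(hidden_size, embed_dim, bias=False)

    def __call__(self, hidden_states: mx.array) -> mx.array:
        # hidden_states: (batch, seq_len, hidden_size) from the backbone
        emb = self.proj(hidden_states)
        # L2-normalize per token for MaxSim late-interaction scoring
        return emb / mx.linalg.norm(emb, axis=-1, keepdims=True)
```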
Working on Idefics 3 here: https://github.com/Blaizzy/mlx-vlm/pull/124
@Benjoyo, ColQwen and ColPali are awesome models.
At the moment, I'm working on refactoring and some optimisations, so new model ports by me are on hold.
However, I appreciate any PRs. I'm here to review and help when needed.
Thank you very much, @pcuenca!
It means a lot 🚀
I left a few comments.
Is it possible to bring this under mlx-vlm?
https://huggingface.co/showlab/ShowUI-2B
Excited for InternVL-2.5... It beats QvQ on most benchmarks...
I'm excited as well :)
Will MiniCPM-o-2_6 work? https://huggingface.co/openbmb/MiniCPM-o-2_6
Edit: `ValueError: Model type minicpmo not supported.` Can you support minicpmo, @Blaizzy?
+1
MiniCPM-o v2.6 is an omni model, which could be really useful.
@Blaizzy Can you add an MLX-VLM function that builds a vision-language model by merging OpenGVLab/InternViT-300M-448px-V2_5 into an arbitrary language model?
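For what it's worth, the usual recipe for that kind of grafting (LLaVA-style) is a small projector that maps vision-encoder features into the language model's embedding space. A minimal sketch under that assumption, with all names and dimensions hypothetical:

```python
import mlx.core as mx
import mlx.nn as nn


class VisionProjector(nn.Module):
    """Hypothetical LLaVA-style two-layer MLP projector: maps InternViT
    patch features into a language model's token-embedding space."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.fc1 = nn.Linear(vision_dim, llm_dim)
        self.fc2 = nn.Linear(llm_dim, llm_dim)

    def __call__(self, patch_features: mx.array) -> mx.array:
        # patch_features: (batch, num_patches, vision_dim) from the vision tower
        return self.fc2(nn.gelu(self.fc1(patch_features)))
```

The projected patch embeddings would then be spliced into the LLM's input sequence at the image-token positions; training that projector (and any per-model preprocessing) is the part that would need new code for each language model.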