
Which specific models work with this framework?

Open jrp2014 opened this issue 1 year ago • 25 comments

This is a nice framework to use for image analysis / captioning, etc.

Is there a doc somewhere that sets out which models, specifically, can be driven through this app/library? When you say "Pixtral", for example, which of the versions should work (without further conversion, and on what size of machine)?

I know that you say that LLaVA is no longer state of the art, but what is better?

Thanks.

Otherwise I get errors like

(mlx) ➜  mlx_vlm git:(main) ✗ python mytest.py
Fetching 3 files: 100%|█████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 6772.29it/s]
ERROR:root:Config file not found in /Users/jrp/.cache/huggingface/hub/models--mistralai--Pixtral-12B-2409/snapshots/df119bf36c0cedc6ffdc9ca6c58ebf51f9771ef7
Traceback (most recent call last):
  File "/Users/zzz/Documents/AI/mlx/mlx-vlm/mlx_vlm/mytest.py", line 12, in <module>
    model, processor = load(model_path)
                       ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 251, in load
    model = load_model(model_path, lazy)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 116, in load_model
    config = load_config(model_path)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 268, in load_config
    with open(model_path / "config.json", "r") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/Users/zzz/.cache/huggingface/hub/models--mistralai--Pixtral-12B-2409/snapshots/df119bf36c0cedc6ffdc9ca6c58ebf51f9771ef7/config.json'

jrp2014 avatar Oct 11 '24 20:10 jrp2014

@jrp2014 good question!

In general you can find the correct models in the mlx-community repo. They are usually converted and uploaded there before the release.

We currently support the Pixtral version from the mistral-community. This version is formatted like llava.

https://huggingface.co/mistral-community/pixtral-12b

Blaizzy avatar Oct 12 '24 14:10 Blaizzy

Thanks. I don't find the search function on Hugging Face particularly easy to use.

jrp2014 avatar Oct 12 '24 17:10 jrp2014

Not sure what's going wrong here:

import mlx.core as mx
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model_path = "mistral-community/pixtral-12b"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)

# Generate output
output = generate(model, processor, image, formatted_prompt, verbose=False)
print(output)

results in

Fetching 15 files: 100%|█████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 28688.81it/s]
Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/mytest3.py", line 8, in <module>
    model, processor = load(model_path)
                       ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 251, in load
    model = load_model(model_path, lazy)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 189, in load_model
    model = model_class.Model(model_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/models/llava/llava.py", line 61, in __init__
    self.vision_tower = VisionModel(config.vision_config)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/models/llava/vision.py", line 232, in __init__
    raise ValueError(f"Unsupported model type: {self.model_type}")
ValueError: Unsupported model type: pixtral

This has been run from the latest mlx_vlm directory.

jrp2014 avatar Oct 12 '24 18:10 jrp2014

Install from source.

I recently merged a PR fixing all the bugs

Blaizzy avatar Oct 12 '24 18:10 Blaizzy

yes, that's what I am doing.

jrp2014 avatar Oct 12 '24 18:10 jrp2014

pip install git+https://github.com/Blaizzy/mlx-vlm.git

Blaizzy avatar Oct 12 '24 18:10 Blaizzy

Uninstall and reinstall from source.

It seems you have an older version.

Check the version you have installed.

Blaizzy avatar Oct 12 '24 19:10 Blaizzy

Let me know if the issue persists with version 0.1.0

Blaizzy avatar Oct 12 '24 19:10 Blaizzy

Is there a way of checking what version is being run from the python script?

Successfully built mlx-vlm
Installing collected packages: mlx-vlm
Successfully installed mlx-vlm-0.1.0

Fails as above.

jrp2014 avatar Oct 12 '24 22:10 jrp2014

Try

pip list | grep mlx 
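
To answer the earlier question about checking the version from a Python script: a minimal sketch using only the standard library (the helper name is mine; the distribution name `mlx-vlm` matches the pip output in this thread):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"

print(installed_version("mlx-vlm"))
```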

Blaizzy avatar Oct 12 '24 22:10 Blaizzy

Can you try to run this in your terminal

python -m mlx_vlm.generate --model mistral-community/pixtral-12b --max-tokens 100 --temp 0.0 --prompt "What animal is this?"

Blaizzy avatar Oct 12 '24 22:10 Blaizzy

Still no go, I'm afraid. No doubt it is something about my setup, but I can't see what it could be; it's built straight from a clone of your GitHub repository.

python -m mlx_vlm.generate --model mistral-community/pixtral-12b --max-tokens 100 --temp 0.0 --prompt 'What animal is this?'
Fetching 15 files: 100%|█████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 35226.52it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/generate.py", line 96, in <module>
    main()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/generate.py", line 73, in main
    model, processor, image_processor, config = get_model_and_processors(
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/generate.py", line 61, in get_model_and_processors
    model, processor = load(
                       ^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 251, in load
    model = load_model(model_path, lazy)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 189, in load_model
    model = model_class.Model(model_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/models/llava/llava.py", line 61, in __init__
    self.vision_tower = VisionModel(config.vision_config)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/models/llava/vision.py", line 232, in __init__
    raise ValueError(f"Unsupported model type: {self.model_type}")
ValueError: Unsupported model type: pixtral

jrp2014 avatar Oct 12 '24 23:10 jrp2014

Please share the result of

pip list | grep mlx 

Blaizzy avatar Oct 13 '24 01:10 Blaizzy

lightning-whisper-mlx     0.0.10
mlx                       0.18.1.dev20241011+c21331d4
mlx-data                  0.0.2
mlx-lm                    0.19.1
mlx-vlm                   0.1.0
mlx-whisper               0.3.0

jrp2014 avatar Oct 13 '24 09:10 jrp2014

Try this model and let me know if the issue persists.

mlx-community/pixtral-12b-8bit

Blaizzy avatar Oct 13 '24 10:10 Blaizzy

Something doesn't add up, because your logs show the model loading with the llava arch instead of pixtral.

Blaizzy avatar Oct 13 '24 10:10 Blaizzy

I will give it a look.

Blaizzy avatar Oct 13 '24 10:10 Blaizzy

Try this model and let me know if the issue persists.

mlx-community/pixtral-12b-8bit

Well, this one doesn't crash out, but it just spins without producing an answer, either from the command line or via the script above.

python -m mlx_vlm.generate --model mlx-community/pixtral-12b-8bit --max-tokens 100 --temp 0.0

Fetching 11 files: 100%|█████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 44706.73it/s]
==========
Image: ['http://images.cocodataset.org/val2017/000000039769.jpg'] 

Prompt: <s>[INST]What are these?[IMG][/INST]

jrp2014 avatar Oct 13 '24 12:10 jrp2014

Found the issue!

This version points to llava in the model config. I patched it locally.

Don't worry, I will add a condition to fix this at load time.

https://huggingface.co/mistral-community/pixtral-12b/blob/main/config.json

Blaizzy avatar Oct 13 '24 12:10 Blaizzy

Well, this one doesn't crash out, but it just spins without producing an answer, either from the command line or via the script above.

What are the specs of your machine?

Try to pass --resize-shape 128 128 or --resize-shape 224 224

Blaizzy avatar Oct 13 '24 12:10 Blaizzy

Also try the 4bit version instead of the 8bit.

mlx-community/pixtral-12b-4bit

Blaizzy avatar Oct 13 '24 12:10 Blaizzy

Found the issue! This version points to llava in the model config. I patched it locally. Don't worry, I will add a condition to fix this at load time.

On second thought, I don't think it's a good idea to add a condition for one model.

You can use all models already converted in mlx-community repo (4bit, 8bit and bf16). Otherwise, to use the mistral-community model, you just have to change the config.json model_type from llava to pixtral.
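
The `config.json` edit described above can also be scripted; a hedged sketch (the helper name is mine, and it assumes the model's config has already been downloaded locally):

```python
import json
from pathlib import Path

def patch_model_type(config_path: str, new_type: str = "pixtral") -> str:
    """Rewrite model_type in a local config.json (e.g. llava -> pixtral)."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["model_type"] = new_type
    path.write_text(json.dumps(config, indent=2))
    return config["model_type"]
```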

Blaizzy avatar Oct 13 '24 14:10 Blaizzy

OK, thanks. It'd be good to document some of these points up front, as the connection between the model names used here and the various Hugging Face repositories is a little tenuous for new users.

jrp2014 avatar Oct 13 '24 17:10 jrp2014

Could you help me with that?

Also, perhaps adding a way to scan for models on mlx-community based on names?
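
In the meantime, filtering repo ids client-side is straightforward; a sketch (the helper is illustrative, and with `huggingface_hub` installed the ids could come from `HfApi().list_models(author="mlx-community")` rather than a hand-picked list):

```python
from typing import Iterable, List

def filter_model_ids(model_ids: Iterable[str], keyword: str) -> List[str]:
    """Case-insensitive substring match over Hugging Face repo ids."""
    kw = keyword.lower()
    return [m for m in model_ids if kw in m.lower()]

# Example over a hand-picked list of mlx-community repo names:
ids = [
    "mlx-community/pixtral-12b-4bit",
    "mlx-community/pixtral-12b-8bit",
    "mlx-community/llava-1.5-7b-4bit",
]
print(filter_model_ids(ids, "pixtral"))
```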

Blaizzy avatar Oct 13 '24 18:10 Blaizzy

Sorry, but the models are too big for me to download and test comprehensively. I suggest that when you put up a new model type, you give an example of the model that you used to test the addition. Also, you could just point to the Hugging Face models that you have put up.

(My setup now seems to work again, starting from a fresh clone. Perhaps I shouldn't use iCloud to transfer my files between machines.)

But with the Mistral repo, which now has a config file, when replacing the model_type with llava, I still get

> python mytest.py
Fetching 15 files: 100%|█████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 15352.50it/s]
Traceback (most recent call last):
  File "/Users/xxx/Documents/AI/mlx/scripts/vlm/mytest.py", line 19, in <module>
    model, processor = load(model_path)
                       ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 251, in load
    model = load_model(model_path, lazy)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 189, in load_model
    model = model_class.Model(model_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/models/llava/llava.py", line 61, in __init__
    self.vision_tower = VisionModel(config.vision_config)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/models/llava/vision.py", line 232, in __init__
    raise ValueError(f"Unsupported model type: {self.model_type}")
ValueError: Unsupported model type: llava

jrp2014 avatar Oct 18 '24 20:10 jrp2014

Closing as stale.

Blaizzy avatar Nov 10 '25 12:11 Blaizzy