mlx-vlm llava-v1.6: unsupported operand type(s) for //: 'int' and 'NoneType'

These models used to work OK, but we now get:

mlx version: 0.22.0.dev20250110+1ce0c0fcb
mlx-vlm version: 0.1.10

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/llava-v1.6-34b-8bit
Fetching 17 files: 100%|█████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 10828.12it/s]
Fetching 17 files: 100%|█████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 12417.83it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211707_DSC01899.jpg'] 

Prompt: <|im_start|>user
<image>
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|im_end|>
<|im_start|>assistant

Failed to generate output for model at mlx-community/llava-v1.6-34b-8bit: unsupported operand type(s) for //: 'int' and 'NoneType'
********************************************************************************

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/llava-v1.6-mistral-7b-8bit
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 11227.22it/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 26024.64it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211707_DSC01899.jpg'] 

Prompt: [INST] <image>
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily. [/INST]
Failed to generate output for model at mlx-community/llava-v1.6-mistral-7b-8bit: unsupported operand type(s) for //: 'int' and 'NoneType'
********************************************************************************

Jan 10 '25 21:01 jrp2014

I can't replicate this.

Could you provide the full traceback?

Jan 11 '25 21:01 Blaizzy

Not sure what further traceback I can offer. In my case, the smoke test produces the same thing.

Jan 11 '25 21:01 jrp2014

There is a full traceback for that error that your tests are not printing. I need it to understand where the error is occurring.

I ran the smoke test with that model and it passed on my side (screenshot, 2025-01-11 at 11:21:44 PM).

Jan 11 '25 22:01 Blaizzy

mlx version: 0.22.0.dev20250110+1ce0c0fcb

Please note that you are still using an unofficial release of mlx.

I would recommend that you uninstall it, install the official release, and try again.

pip uninstall mlx
pip install -U mlx

Apart from that, we are running the same version of mlx-vlm, so it should work.

Jan 11 '25 22:01 Blaizzy

Let me know if the error persists after you install the official release.

Jan 11 '25 22:01 Blaizzy

It may well be an mlx issue. I expect that there is some way to print the full stack trace to identify the culprit.

python smoke_test.py --models-file llava.txt --image /Users/jrp/Pictures/Processed/20250104-211707_DSC01899.jpg
  0%|                                                                                             | 0/2 [00:00<?, ?it/s]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/llava-v1.6-34b-8bit                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 17 files: 100%|█████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 10761.12it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Fetching 17 files: 100%|█████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 14312.16it/s]
✓ Model loaded successfully                                                                      | 0/17 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211707_DSC01899.jpg'] 

Prompt: <|im_start|>user
<image>
Describe this image.<|im_end|>
<|im_start|>assistant

✗ vision-language generation failed: unsupported operand type(s) for //: 'int' and 'NoneType'


Testing language-only generation...
==========
Image: None 

Prompt: <|im_start|>user
Hi, how are you?<|im_end|>
<|im_start|>assistant

Hello! I'm just a computer program, I don't have feelings or physical sensations, so I don't have a "how I am." But I'm here to help you with any questions or tasks you may have. How can I assist you today?
==========
Prompt: 15 tokens, 4.075 tokens-per-sec
Generation: 59 tokens, 11.798 tokens-per-sec
Peak memory: 36.997 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

 50%|██████████████████████████████████████████▌                                          | 1/2 [00:20<00:20, 20.20s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/llava-v1.6-mistral-7b-8bit                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 19448.09it/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 32896.50it/s]
✓ Model loaded successfully                                                                      | 0/12 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211707_DSC01899.jpg'] 

Prompt: [INST] <image>
Describe this image. [/INST]
✗ vision-language generation failed: unsupported operand type(s) for //: 'int' and 'NoneType'


Testing language-only generation...
==========
Image: None 

Prompt: [INST] Hi, how are you? [/INST]
Hello! I'm just a computer program, so I don't have feelings or emotions. Is there something I can help you with? 
==========
Prompt: 14 tokens, 75.599 tokens-per-sec
Generation: 31 tokens, 60.094 tokens-per-sec
Peak memory: 8.080 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

100%|█████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:23<00:00, 11.78s/it]


╭────────────────────────────────────────────────────── Results ───────────────────────────────────────────────────────╮
│ ✗ mlx-community/llava-v1.6-34b-8bit                                                                                  │
│ ✗ mlx-community/llava-v1.6-mistral-7b-8bit                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Some models tested failed to test


╭───────────────────────────────────────────────── System Information ─────────────────────────────────────────────────╮
│                                                                                                                      │
│ MAC OS:       v15.2                                                                                                  │
│ Python:       v3.12.7                                                                                                │
│ MLX:          v0.22.0                                                                                                │
│ MLX-VLM:      v0.1.10                                                                                                │
│ Transformers: v4.48.0                                                                                                │
│                                                                                                                      │
│ Hardware:                                                                                                            │
│ • Chip:       Apple M4 Max                                                                                           │
│ • RAM:        128.0 GB                                                                                               │
│ • CPU Cores:  16                                                                                                     │
│ • GPU Cores:  40                                                                                                     │
│                                                                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Jan 11 '25 22:01 jrp2014

You will need to install mlx-vlm from source and change the test_smoke.py file. (I will make the necessary change to display the full traceback in the next release.)

Alternatively, you could:

Run mlx_vlm.generate in your terminal to get the full traceback.
Or install the official mlx release, as I suggested earlier, and see if that solves your issue.
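
For anyone hitting the same thing in their own test script, here is a minimal sketch of how a wrapper could print the full traceback alongside the short failure message. It is not the actual test_smoke.py code: `run_check` and `failing_check` are hypothetical names, and the failure is a stand-in rather than a real `generate` call.

```python
import traceback


def run_check(name, fn):
    """Run one check; on failure print the short message and the full traceback."""
    try:
        fn()
    except Exception as exc:
        print(f"✗ {name} failed: {exc}")
        # Prints the complete stack of the exception being handled to stderr.
        traceback.print_exc()
        return False
    print(f"✓ {name} successful")
    return True


def failing_check():
    # Hypothetical stand-in for the real generation call.
    raise RuntimeError("stand-in failure")


ok = run_check("vision-language generation", failing_check)
```

The short message alone only says what went wrong; the traceback shows which file and line raised it, which is what is needed to tell an mlx-vlm problem from one in a dependency.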

Jan 11 '25 22:01 Blaizzy

I have produced a fuller traceback, and it seems that the issue is in transformers. I do seem to recall that earlier versions printed a warning.

python smoke_test.py --models-file llava.txt --image /Users/jrp/Pictures/Processed/20250104-211707_DSC01899.jpg
  0%|                                                                                             | 0/2 [00:00<?, ?it/s]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/llava-v1.6-34b-8bit                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 17 files: 100%|█████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 11987.76it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Fetching 17 files: 100%|█████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 10993.40it/s]
✓ Model loaded successfully                                                                      | 0/17 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211707_DSC01899.jpg'] 

Prompt: <|im_start|>user
<image>
Describe this image.<|im_end|>
<|im_start|>assistant

✗ vision-language generation failed: unsupported operand type(s) for //: 'int' and 'NoneType'
Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/scripts/vlm/smoke_test.py", line 110, in test_generation
    output = generate(**generate_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 1101, in generate
    for response in stream_generate(model, processor, prompt, image, **kwargs):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 1015, in stream_generate
    inputs = prepare_inputs(
             ^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 812, in prepare_inputs
    inputs = processor(
             ^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/transformers/models/llava_next/processing_llava_next.py", line 162, in __call__
    num_image_tokens = self._get_number_of_features(orig_height, orig_width, height, width)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/transformers/models/llava_next/processing_llava_next.py", line 181, in _get_number_of_features
    patches_height = height // self.patch_size
                     ~~~~~~~^^~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for //: 'int' and 'NoneType'



Testing language-only generation...
==========
Image: None 

Prompt: <|im_start|>user
Hi, how are you?<|im_end|>
<|im_start|>assistant

Hello! I'm an AI language model, so I don't have feelings or personal experience, but I'm always ready to assist you with any questions you may have. How can I help?
==========
Prompt: 15 tokens, 5.109 tokens-per-sec
Generation: 43 tokens, 11.431 tokens-per-sec
Peak memory: 36.997 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

 50%|██████████████████████████████████████████▌                                          | 1/2 [00:15<00:15, 15.92s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/llava-v1.6-mistral-7b-8bit                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 12 files: 100%|██████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 9843.86it/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 27654.75it/s]
✓ Model loaded successfully                                                                      | 0/12 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211707_DSC01899.jpg'] 

Prompt: [INST] <image>
Describe this image. [/INST]
✗ vision-language generation failed: unsupported operand type(s) for //: 'int' and 'NoneType'
Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/scripts/vlm/smoke_test.py", line 110, in test_generation
    output = generate(**generate_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 1101, in generate
    for response in stream_generate(model, processor, prompt, image, **kwargs):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 1015, in stream_generate
    inputs = prepare_inputs(
             ^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 812, in prepare_inputs
    inputs = processor(
             ^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/transformers/models/llava_next/processing_llava_next.py", line 162, in __call__
    num_image_tokens = self._get_number_of_features(orig_height, orig_width, height, width)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/transformers/models/llava_next/processing_llava_next.py", line 181, in _get_number_of_features
    patches_height = height // self.patch_size
                     ~~~~~~~^^~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for //: 'int' and 'NoneType'
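
The last frame of the traceback points at a floor division where `self.patch_size` is `None`. A minimal sketch that reproduces the same error message outside transformers follows; `ProcessorStandIn` is a hypothetical class rather than the real processor, and the values 336 and 14 are assumed for illustration only.

```python
class ProcessorStandIn:
    """Hypothetical stand-in; not the transformers LLaVA-NeXT processor."""

    def __init__(self, patch_size=None):
        self.patch_size = patch_size

    def patches_along(self, height):
        # Same arithmetic as the failing line in the traceback.
        return height // self.patch_size


try:
    ProcessorStandIn().patches_along(336)
except TypeError as exc:
    message = str(exc)

print(message)

# With patch_size set (assumed value), the division succeeds.
patches = ProcessorStandIn(patch_size=14).patches_along(336)
print(patches)
```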

Jan 11 '25 22:01 jrp2014

I see, you are probably running the latest transformers.

Let me try something to see if it fixes it.

Jan 11 '25 22:01 Blaizzy

Try again now :)

Jan 11 '25 23:01 Blaizzy

No change, I'm afraid. Looking back at earlier runs, I see that there was a deprecation warning:

Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.
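
As a stopgap, the warning suggests setting the two attributes directly on the processor. A rough sketch of that idea is below. The helper name `fill_missing_processor_attrs` is hypothetical, the processor is a plain stand-in object, and the values 14 and "default" are assumptions that should be checked against the model's own config before use.

```python
from types import SimpleNamespace


def fill_missing_processor_attrs(processor, patch_size, vision_feature_select_strategy):
    """Set the two attributes named in the warning, but only where they are unset."""
    if getattr(processor, "patch_size", None) is None:
        processor.patch_size = patch_size
    if getattr(processor, "vision_feature_select_strategy", None) is None:
        processor.vision_feature_select_strategy = vision_feature_select_strategy
    return processor


# Stand-in for a processor loaded from a config that lacks both attributes.
processor = SimpleNamespace(patch_size=None, vision_feature_select_strategy=None)

# Assumed values; take the real ones from the model's config.
fill_missing_processor_attrs(
    processor, patch_size=14, vision_feature_select_strategy="default"
)
print(processor.patch_size, processor.vision_feature_select_strategy)
```

Values already present on the processor are left alone, so the helper would be harmless once the model's processing config carries them.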

Jan 11 '25 23:01 jrp2014

I just fixed that

Download a fresh copy of the model weights :)

Jan 12 '25 00:01 Blaizzy

https://huggingface.co/mlx-community/llava-v1.6-mistral-7b-8bit/commit/b8df5f329d95a7abe6429ed46093f9b84e8e6396
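
One possible way to fetch a fresh copy from Python, sketched with `snapshot_download` from `huggingface_hub`. The helper names are hypothetical, and pinning `revision` to the commit hash from the link above is optional. Forcing the download may not be strictly necessary, since the hub client normally picks up new commits, but it rules out a stale cache.

```python
def fresh_download_kwargs(repo_id, revision=None):
    """Build keyword arguments for snapshot_download that bypass cached files."""
    kwargs = {"repo_id": repo_id, "force_download": True}
    if revision is not None:
        kwargs["revision"] = revision
    return kwargs


def refresh_model(repo_id, revision=None):
    """Re-download the model snapshot and return its local path (needs network)."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**fresh_download_kwargs(repo_id, revision))


kwargs = fresh_download_kwargs(
    "mlx-community/llava-v1.6-mistral-7b-8bit",
    revision="b8df5f329d95a7abe6429ed46093f9b84e8e6396",
)
print(kwargs)
```

Calling `refresh_model("mlx-community/llava-v1.6-mistral-7b-8bit")` would then perform the actual download.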

Jan 12 '25 00:01 Blaizzy

Thanks. That seems to fix mlx-community/llava-v1.6-mistral-7b-8bit. Is the problem the same for mlx-community/llava-v1.6-34b-8bit?

Jan 12 '25 21:01 jrp2014

My pleasure!

I think so

I will update all models.

Jan 12 '25 21:01 Blaizzy