InternVL3 Multi-Image Support Broken

Open · tmoroney opened this issue 8 months ago · 1 comment

When I pass more than one image as input to either InternVL3-1B-4bit or InternVL3-2B-4bit, I get the error below. The same image array works with SmolVLM2-500M-Video-Instruct and llava-interleave-qwen-0.5b-4bit:

Files: [<PIL.Image.Image image mode=RGB size=1280x720 at 0x10F0C6690>, <PIL.Image.Image image mode=RGB size=1280x720 at 0x10F0B1820>, <PIL.Image.Image image mode=RGB size=1280x720 at 0x151178770>, <PIL.Image.Image image mode=RGB size=1280x720 at 0x1511787D0>] 

Prompt: User: <image>
<image>
<image>
<image>
Describe this video.
Assistant:

Warning: Failed to process inputs with error: list index out of range Trying to process inputs with return_tensors='pt'
Traceback (most recent call last):
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/utils.py", line 821, in process_inputs_with_fallback
    inputs = process_inputs(
             ^^^^^^^^^^^^^^^
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/utils.py", line 813, in process_inputs
    inputs = processor(
             ^^^^^^^^^^
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/models/internvl_chat/processor.py", line 318, in __call__
    question = text[idx]
               ~~~~^^^^^
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/utils.py", line 830, in process_inputs_with_fallback
    inputs = process_inputs(processor, images, prompts, return_tensors="pt")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/utils.py", line 813, in process_inputs
    inputs = processor(
             ^^^^^^^^^^
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/models/internvl_chat/processor.py", line 318, in __call__
    question = text[idx]
               ~~~~^^^^^
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/moroneyt/Documents/Vision-Testing/vlm-test.py", line 65, in <module>
    output = generate(model, processor, formatted_prompt, frames, verbose=False, max_tokens=100)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/utils.py", line 1208, in generate
    for response in stream_generate(model, processor, prompt, image, **kwargs):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/utils.py", line 1096, in stream_generate
    inputs = prepare_inputs(
             ^^^^^^^^^^^^^^^
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/utils.py", line 886, in prepare_inputs
    inputs = process_inputs_with_fallback(processor, images, prompts)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/moroneyt/Documents/Vision-Testing/venv/lib/python3.12/site-packages/mlx_vlm/utils.py", line 832, in process_inputs_with_fallback
    raise ValueError(
ValueError: Failed to process inputs with error: list index out of range. Please install PyTorch and try again.
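
The innermost frame in every traceback above is `question = text[idx]` in the InternVL processor, so the final "Please install PyTorch" message looks like a side effect of the fallback path and not the root cause. One plausible reading, which is an assumption and not confirmed against the library source, is that the processor walks the images by index and expects one prompt per image, while the call here supplies a single prompt containing four `<image>` tags. The sketch below reproduces that pattern in isolation; `pair_prompts_with_images` is a hypothetical stand-in, not the mlx_vlm implementation, and the strings in `frames` are placeholders for the PIL images.

```python
from typing import List, Tuple, Union


def pair_prompts_with_images(
    text: Union[str, List[str]], images: List[object]
) -> List[Tuple[str, object]]:
    """Hypothetical stand-in for a per-image processor loop (NOT mlx_vlm code)."""
    # A single prompt string is wrapped into a one-element list.
    if isinstance(text, str):
        text = [text]

    pairs = []
    for idx, image in enumerate(images):
        # Assumed failure point: raises IndexError once idx >= len(text),
        # i.e. whenever there are more images than prompts.
        question = text[idx]
        pairs.append((question, image))
    return pairs


# One prompt with four <image> tags, as in the report above.
prompt = "User: <image>\n<image>\n<image>\n<image>\nDescribe this video.\nAssistant:"
frames = ["frame0", "frame1", "frame2", "frame3"]  # placeholders for PIL images

try:
    pair_prompts_with_images(prompt, frames)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)  # list index out of range
```

If this reading is right, a single image would pass through the same loop without error, which matches the report that only multi-image input fails.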

tmoroney · Apr 23 '25 20:04