Unrecognized image processor in mlx-community/Qwen2.5-VL-7B-Instruct-4bit

Open · leoho0722 opened this issue 9 months ago • 6 comments

When I use the sample code from HuggingFace Model card

python -m mlx_vlm.generate --model mlx-community/Qwen2.5-VL-7B-Instruct-4bit --max-tokens 100 --temp 0.0 --prompt "Describe this image." --image <Path of the image>
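For reference, the failure should also reproduce through the Python API, since the CLI goes through the same load call shown in the traceback below. A minimal sketch (the import follows the mlx-vlm README):

```python
from mlx_vlm import load

# generate.py calls this same helper, which ends up in
# mlx_vlm.utils.load_processor -> AutoProcessor.from_pretrained,
# where the ValueError below is raised.
model, processor = load("mlx-community/Qwen2.5-VL-7B-Instruct-4bit")
```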

I get the following error (mlx-vlm.log):

This is a beta version of the video understanding. It may not work as expected.
Fetching 12 files: 100%|████████████████| 12/12 [00:00<00:00, 136400.13it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/[email protected]/3.10.16/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/[email protected]/3.10.16/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/generate.py", line 156, in <module>
    main()
  File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/generate.py", line 97, in main
    model, processor, config = get_model_and_processors(args.model, args.adapter_path)
  File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/generate.py", line 86, in get_model_and_processors
    model, processor = load(
  File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/utils.py", line 279, in load
    processor = load_processor(model_path, True, **kwargs)
  File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/utils.py", line 342, in load_processor
    processor = AutoProcessor.from_pretrained(model_path, **kwargs)
  File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 334, in from_pretrained
    return processor_class.from_pretrained(
  File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1070, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1116, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 569, in from_pretrained
    raise ValueError(
ValueError: Unrecognized image processor in /Users/leoho/.cache/huggingface/hub/models--mlx-community--Qwen2.5-VL-7B-Instruct-4bit/snapshots/75c400d442a81dc0c5ef90f095d8815112ecd350. Should have a `image_processor_type` key in its preprocessor_config.json of config.json, or one of the following `model_type` keys in its config.json: align, aria, beit, bit, blip, blip-2, bridgetower, chameleon, chinese_clip, clip, clipseg, conditional_detr, convnext, convnextv2, cvt, data2vec-vision, deformable_detr, deit, depth_anything, depth_pro, deta, detr, dinat, dinov2, donut-swin, dpt, efficientformer, efficientnet, flava, focalnet, fuyu, git, glpn, got_ocr2, grounding-dino, groupvit, hiera, idefics, idefics2, idefics3, ijepa, imagegpt, instructblip, instructblipvideo, kosmos-2, layoutlmv2, layoutlmv3, levit, llava, llava_next, llava_next_video, llava_onevision, mask2former, maskformer, mgp-str, mllama, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, nat, nougat, oneformer, owlv2, owlvit, paligemma, perceiver, pix2struct, pixtral, poolformer, pvt, pvt_v2, qwen2_5_vl, qwen2_vl, regnet, resnet, rt_detr, sam, segformer, seggpt, siglip, superglue, swiftformer, swin, swin2sr, swinv2, table-transformer, timesformer, timm_wrapper, tvlt, tvp, udop, upernet, van, videomae, vilt, vipllava, vit, vit_hybrid, vit_mae, vit_msn, vitmatte, xclip, yolos, zoedepth

How can I solve this problem?

Environment

Hardware

  • Apple MacBook Pro (13-inch, M2, 2022)
  • Apple M2 chip (8-core CPU, 10-core GPU)
  • 16GB RAM, 256GB SSD
  • macOS Sequoia 15.3.1

Software

  • Python 3.10.16 (Homebrew, per the traceback paths above)

leoho0722 · Feb 21 '25, 16:02

Hello, I noticed the same behaviour today as well. Before that, everything was working as expected. I think the problem is the new transformers>=4.49.0 requirement.

JoeJoe1313 · Feb 22 '25, 13:02

FYI, https://github.com/huggingface/transformers/issues/36193 and https://github.com/Blaizzy/mlx-vlm/issues/211

JoeJoe1313 · Feb 22 '25, 13:02

Hey @JoeJoe1313 @leoho0722

This is an issue on the transformers side.

It seems the Qwen2.5VLImageProcessor class was deleted because it is identical to the Qwen2VL one.

The fix is to either:

  1. Change "image_processor_type" to "Qwen2VLImageProcessor" in preprocessor_config.json (a sketch of this edit follows below), or
  2. Wait until I update all the models on the Hub.
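
For option 1, a minimal sketch of the edit, assuming the model has already been downloaded to the local Hugging Face cache (the snapshot directory is the one shown in the traceback above; the <snapshot-hash> placeholder is hypothetical and must be replaced with your own):

```python
import json
from pathlib import Path

# Adjust this to the snapshot directory shown in your own traceback.
snapshot = Path(
    "~/.cache/huggingface/hub/"
    "models--mlx-community--Qwen2.5-VL-7B-Instruct-4bit/snapshots/<snapshot-hash>"
).expanduser()

config_path = snapshot / "preprocessor_config.json"
config = json.loads(config_path.read_text())

# Point the config at the Qwen2VL image processor, which transformers still ships.
config["image_processor_type"] = "Qwen2VLImageProcessor"

config_path.write_text(json.dumps(config, indent=2))
print(f"Patched {config_path}")
```

After patching, rerunning the original mlx_vlm.generate command should get past the processor-loading step.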

Blaizzy · Feb 24 '25, 11:02

Hi, I missed your reply.

Thanks for your suggestion, I will try it.

leoho0722 · Mar 05 '25, 05:03

This is how we preserved backwards compatibility in mlx-engine after the transformers change: https://github.com/lmstudio-ai/mlx-engine/pull/115/commits/3f248540fb113e3c435dca2db2a67e5348280723
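
This is not the code from that commit, just a rough sketch of the same idea: fall back to the Qwen2VL classes (which transformers still ships, and which are reported above to be identical) when AutoProcessor cannot resolve the image processor named in the config. The helper name and the chat-template reuse are assumptions:

```python
from transformers import (
    AutoProcessor,
    AutoTokenizer,
    Qwen2VLImageProcessor,
    Qwen2VLProcessor,
)


def load_processor_with_fallback(model_path: str):
    """Load a processor, falling back to the Qwen2VL classes when the
    image_processor_type in preprocessor_config.json is no longer recognized."""
    try:
        return AutoProcessor.from_pretrained(model_path)
    except ValueError:
        # Build the processor by hand from components transformers still ships,
        # reusing the tokenizer's chat template as an approximation.
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        return Qwen2VLProcessor(
            image_processor=Qwen2VLImageProcessor.from_pretrained(model_path),
            tokenizer=tokenizer,
            chat_template=tokenizer.chat_template,
        )


processor = load_processor_with_fallback("mlx-community/Qwen2.5-VL-7B-Instruct-4bit")
```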

@Blaizzy feel free to upstream this if you feel it would be a valuable addition here.

neilmehta24 · Mar 06 '25, 20:03

Thanks @neilmehta24! It will definitely be.

Blaizzy · Mar 06 '25, 23:03