Unrecognized image processor in mlx-community/Qwen2.5-VL-7B-Instruct-4bit
When I run the sample command from the Hugging Face model card:
python -m mlx_vlm.generate --model mlx-community/Qwen2.5-VL-7B-Instruct-4bit --max-tokens 100 --temp 0.0 --prompt "Describe this image." --image <Path of the image>
I get the following error (mlx-vlm.log):
This is a beta version of the video understanding. It may not work as expected.
Fetching 12 files: 100%|████████████████| 12/12 [00:00<00:00, 136400.13it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Traceback (most recent call last):
File "/opt/homebrew/Cellar/[email protected]/3.10.16/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/homebrew/Cellar/[email protected]/3.10.16/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/generate.py", line 156, in <module>
main()
File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/generate.py", line 97, in main
model, processor, config = get_model_and_processors(args.model, args.adapter_path)
File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/generate.py", line 86, in get_model_and_processors
model, processor = load(
File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/utils.py", line 279, in load
processor = load_processor(model_path, True, **kwargs)
File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/mlx_vlm/utils.py", line 342, in load_processor
processor = AutoProcessor.from_pretrained(model_path, **kwargs)
File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 334, in from_pretrained
return processor_class.from_pretrained(
File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1070, in from_pretrained
args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1116, in _get_arguments_from_pretrained
args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
File "/Users/leoho/Desktop/mlx-llm-example/.venv/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 569, in from_pretrained
raise ValueError(
ValueError: Unrecognized image processor in /Users/leoho/.cache/huggingface/hub/models--mlx-community--Qwen2.5-VL-7B-Instruct-4bit/snapshots/75c400d442a81dc0c5ef90f095d8815112ecd350. Should have a `image_processor_type` key in its preprocessor_config.json of config.json, or one of the following `model_type` keys in its config.json: align, aria, beit, bit, blip, blip-2, bridgetower, chameleon, chinese_clip, clip, clipseg, conditional_detr, convnext, convnextv2, cvt, data2vec-vision, deformable_detr, deit, depth_anything, depth_pro, deta, detr, dinat, dinov2, donut-swin, dpt, efficientformer, efficientnet, flava, focalnet, fuyu, git, glpn, got_ocr2, grounding-dino, groupvit, hiera, idefics, idefics2, idefics3, ijepa, imagegpt, instructblip, instructblipvideo, kosmos-2, layoutlmv2, layoutlmv3, levit, llava, llava_next, llava_next_video, llava_onevision, mask2former, maskformer, mgp-str, mllama, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, nat, nougat, oneformer, owlv2, owlvit, paligemma, perceiver, pix2struct, pixtral, poolformer, pvt, pvt_v2, qwen2_5_vl, qwen2_vl, regnet, resnet, rt_detr, sam, segformer, seggpt, siglip, superglue, swiftformer, swin, swin2sr, swinv2, table-transformer, timesformer, timm_wrapper, tvlt, tvp, udop, upernet, van, videomae, vilt, vipllava, vit, vit_hybrid, vit_mae, vit_msn, vitmatte, xclip, yolos, zoedepth
How can I solve this problem?
Environment
Hardware
- Apple MacBook Pro (13-inch, M2, 2022)
- Apple M2 chips (8 cores CPU, 10 cores GPU)
- 16GB RAM, 256GB SSD
- macOS Sequoia 15.3.1
Software
- Python 3.10.16
- mlx 0.22.1
- mlx-lm 0.21.4
- mlx-vlm 0.1.14 (requirements.txt)
Hello, I noticed the same behaviour today as well. Before that everything was working as expected. I think the problem is with the requirement transformers>=4.49.0.
FYI, https://github.com/huggingface/transformers/issues/36193 and https://github.com/Blaizzy/mlx-vlm/issues/211
Hey @JoeJoe1313 @leoho0722
This is an issue on the transformers side
It seems the Qwen2.5VLImageProcessor class was deleted as it's identical to Qwen2VL's.
The fix is to either:
- Change "image_processor_type": "Qwen2VLImageProcessor", in preprocessor_config.json
- Or wait till I update all models on the Hub
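If it helps, here is a minimal sketch of the first option: patching the cached preprocessor_config.json in place. The snapshot path is the one from the traceback above; adjust it to your own cache location if it differs.

```python
import json
from pathlib import Path

# Cached snapshot reported in the traceback above; adjust the path as needed.
config_path = Path(
    "~/.cache/huggingface/hub/models--mlx-community--Qwen2.5-VL-7B-Instruct-4bit"
    "/snapshots/75c400d442a81dc0c5ef90f095d8815112ecd350/preprocessor_config.json"
).expanduser()

config = json.loads(config_path.read_text())
# Point the config at the Qwen2VL image processor, which transformers still ships.
config["image_processor_type"] = "Qwen2VLImageProcessor"
config_path.write_text(json.dumps(config, indent=2))
print("Patched", config_path)
```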
Hi, I missed your reply.
Thanks for your suggestion, I will try it.
This is how we preserved backwards compatibility in mlx-engine after the transformers change: https://github.com/lmstudio-ai/mlx-engine/pull/115/commits/3f248540fb113e3c435dca2db2a67e5348280723
@Blaizzy feel free to upstream this if you feel like it would be a valuable addition here.
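For illustration only (not necessarily the approach in the linked commit), a load-time fallback can look roughly like this, assuming transformers >= 4.49 where Qwen2_5_VLProcessor and Qwen2VLImageProcessor are both exported: catch the ValueError from AutoProcessor and assemble the processor by hand from the equivalent Qwen2VL image processor.

```python
from transformers import AutoProcessor


def load_processor_with_fallback(model_path: str):
    try:
        return AutoProcessor.from_pretrained(model_path)
    except ValueError:
        # Fallback for repos whose preprocessor_config.json still names the
        # removed Qwen2.5-VL image processor class: build the processor by hand
        # from the Qwen2VL image processor, which transformers still ships.
        from transformers import AutoTokenizer, Qwen2_5_VLProcessor, Qwen2VLImageProcessor

        image_processor = Qwen2VLImageProcessor.from_pretrained(model_path)
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        # Note: if the chat template is stored separately, it may need to be
        # passed explicitly via the chat_template argument.
        return Qwen2_5_VLProcessor(image_processor=image_processor, tokenizer=tokenizer)
```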
Thanks @neilmehta24! It will definitely be.