Failed to run inference with FastVLM due to an argument mismatch
I tried to run inference with the converted version under mlx-community, but encountered the error below.
Traceback (most recent call last):
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/scripts/recognition/eval.py", line 114, in <module>
dataset = dataset.map(_generate, desc="Generating Responses")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 562, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3341, in map
for rank, done, content in Dataset._map_single(**unprocessed_kwargs):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3673, in _map_single
for i, example in iter_outputs(shard_iterable):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3647, in iter_outputs
yield i, apply_function(example, i, offset=offset)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3570, in apply_function
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/scripts/recognition/eval.py", line 102, in _generate
response = generate(
^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/mlx_vlm/generate.py", line 539, in generate
for response in stream_generate(model, processor, prompt, image, audio, **kwargs):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/mlx_vlm/generate.py", line 429, in stream_generate
for n, (token, logprobs) in enumerate(
^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/mlx_vlm/generate.py", line 319, in generate_step
outputs = model(input_ids, pixel_values, cache=prompt_cache, mask=mask, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/mlx_vlm/models/fastvlm/fastvlm.py", line 166, in __call__
logits = self.language_model(
^^^^^^^^^^^^^^^^^^^^
File "/Users/hermeschen/Repo/work/taiwan-license-plate-recognition/.venv/lib/python3.12/site-packages/mlx_vlm/models/fastvlm/language.py", line 29, in __call__
out = self.model(inputs, None, cache, inputs_embeds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Qwen2Model.__call__() takes from 2 to 4 positional arguments but 5 were given
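My read of the error (an illustration only, I haven't checked the current mlx-lm source): the FastVLM language wrapper in mlx_vlm calls self.model with four positional arguments, while the Qwen2Model it now gets from mlx-lm seems to accept fewer. A stand-in sketch that reproduces the same TypeError:

# Stand-in class, not the real mlx-lm code; the signature below is assumed.
class Qwen2Model:
    def __call__(self, inputs, cache=None, input_embeddings=None):
        return inputs  # at most 4 positional arguments, counting self

model = Qwen2Model()
model("input_ids", None, "cache", "inputs_embeds")  # 5 positional arguments -> TypeError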
I'm working on macOS 26, and my environment is:
mlx 0.30.0
mlx-lm 0.28.3
mlx-metal 0.30.0
mlx-vlm 0.3.7
timm 1.0.22
torch 2.9.1
torchvision 0.24.1
transformers 4.57.1
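For reference, this is roughly how my script calls it (a minimal sketch, not the actual eval.py; the model id and image path below are placeholders):

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/<fastvlm-checkpoint>"  # placeholder id
model, processor = load(model_path)
config = load_config(model_path)

prompt = apply_chat_template(processor, config, "Read the license plate.", num_images=1)
response = generate(model, processor, prompt, image=["plate.jpg"], verbose=False)
print(response)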
Hey @hermeschen1116, thanks for reporting it!
I'm going to do a refactor, because this is caused by mlx-lm changes.
Thank you, looking forward to the update. Also, there's a shape mismatch problem in SmolVLM2. Maybe it's the same case?
Could you share the issue?
I'll post the error message later.
Hello, I got this when converting SmolVLM2 with the latest version.
Expected shape (512, 2048) but received shape (2048, 2048) for parameter language_model.layers.0.self_attn.k_proj.weight
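My guess (unverified) at where the two shapes come from: 512 looks like num_key_value_heads * head_dim under grouped-query attention, while 2048 is hidden_size, so the loader seems to expect a GQA-shaped k_proj while the converted checkpoint holds a full-size one. Roughly:

# Guess at the two shapes; the head counts below are assumed, not read from the config.
hidden_size = 2048
head_dim = 64
num_attention_heads = 32     # 32 * 64 = 2048
num_key_value_heads = 8      # assumed; 8 * 64 = 512

expected_k_proj_shape = (num_key_value_heads * head_dim, hidden_size)  # (512, 2048)
received_k_proj_shape = (num_attention_heads * head_dim, hidden_size)  # (2048, 2048)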
Could you share a reproducible example?