mlx-vlm
Question: Prompt tokens per second.
Prompt processing is quite slow, around 4 tokens per second on my M2.
Is there a way to speed that up, or is it inherent to the image processing/tokens?
Yes, you can use --resize-shape.
The image is probably too big, or the model's architecture has a bottleneck (e.g., Llama-Vision's cross-attention).
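To illustrate why resizing helps: the number of image tokens the vision encoder produces scales with the input resolution, so downscaling the image before encoding shortens the prompt. A minimal pre-resize sketch with Pillow (the 448x448 target and the helper name are assumptions; check your model's expected input size):

```python
from PIL import Image

def resize_for_prompt(path: str, target=(448, 448)) -> Image.Image:
    # Hypothetical helper: downscale so the vision encoder
    # produces fewer patch tokens, shrinking the prompt.
    img = Image.open(path).convert("RGB")
    # thumbnail() resizes in place and preserves aspect ratio,
    # never upscaling beyond the original size.
    img.thumbnail(target, Image.LANCZOS)
    return img
```

On the CLI, passing `--resize-shape` to the mlx-vlm generate command achieves the same effect without preprocessing the file yourself.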