mlx-vlm
Question: Prompt tokens per second.
Prompt processing is quite slow, around 4 tokens per second on my M2.
Is there a way to speed that up, or is it inherent to the image processing/tokens?
Yes, you can use --resize-shape.
The image is probably too big, or the model's architecture has a bottleneck (e.g., Llama-Vision's cross-attention).
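To illustrate why resizing helps: the number of image tokens the vision encoder produces scales with the input resolution, so downscaling the image before encoding shortens the prompt. A minimal pre-resize sketch with Pillow (the 448x448 target and the helper name are assumptions; check your model's expected input size):

```python
from PIL import Image

def resize_for_prompt(path: str, target=(448, 448)) -> Image.Image:
    # Hypothetical helper: downscale so the vision encoder
    # produces fewer patch tokens, shrinking the prompt.
    img = Image.open(path).convert("RGB")
    # thumbnail() resizes in place and preserves aspect ratio,
    # never upscaling beyond the original size.
    img.thumbnail(target, Image.LANCZOS)
    return img
```

On the CLI, passing `--resize-shape` to the mlx-vlm generate command achieves the same effect without preprocessing the file yourself.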