Results: 9 comments of Shi Shuai

I encountered a bug where most image inputs cause the model to crash with the following error: `RuntimeError: shape mismatch: value tensor of shape [2352, 7168] cannot be broadcast to...`

It seems that the current implementation counts the tokens generated from the encoded image as part of the prompt length. It might be better to extract the image features first...
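
The change being suggested might look roughly like the sketch below. It only illustrates the idea (encode the image up front and track its feature length separately from the text prompt length); the function and argument names are hypothetical and do not correspond to the actual text-generation-inference or vLLM code.

```python
import torch

# Hypothetical sketch: keep the text-token length and the image-feature
# length separate, so image embeddings are not counted against the text
# prompt budget. `embed_text` and `encode_image` stand in for the model's
# own embedding layer and vision encoder.
def build_prompt_embeddings(text_token_ids: torch.Tensor,
                            image: torch.Tensor,
                            embed_text,      # callable: token ids -> [T, H]
                            encode_image):   # callable: image -> [I, H]
    text_embeds = embed_text(text_token_ids)   # [T, H]
    image_embeds = encode_image(image)         # [I, H], extracted first

    # Report the two lengths separately instead of folding the image
    # tokens into the prompt length.
    text_len = text_embeds.shape[0]
    image_len = image_embeds.shape[0]

    # Prepend the image features to the text embeddings.
    inputs_embeds = torch.cat([image_embeds, text_embeds], dim=0)  # [I + T, H]
    return inputs_embeds, text_len, image_len
```

Keeping the two lengths separate would also make mismatches like the `[2352, 7168]` broadcast error above easier to trace.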

I'm working on a similar project and I'm excited to see that you've already started. I'm curious how far along you are; I'd be happy to help if needed.

Is there a mismatch between the tokenizer version used to train the weights and the version used at load time? I'm not sure whether this is a problem with my weights....
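
A quick way to sanity-check this is to compare the tokenizer's vocabulary size against the `vocab_size` recorded in the checkpoint's config. This is just a minimal sketch using the Hugging Face `transformers` API; the local path is a placeholder for wherever the weights live.

```python
from transformers import AutoConfig, AutoTokenizer

# Placeholder path; point this at the directory holding your weights.
model_path = "./my-model"

tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)

print("tokenizer size:", len(tokenizer))
print("config vocab_size:", config.vocab_size)

# Some checkpoints pad the embedding matrix, so a small difference can be
# benign, but a large gap usually means the tokenizer files do not match
# the weights they were shipped with.
if len(tokenizer) != config.vocab_size:
    print("Possible mismatch between tokenizer and model weights.")
```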

Same problem here: `Method Prefill encountered an error`

You need to re-install vllm and flash-attention-v2: `cd text-generation-inference/server && rm -rf vllm && make install-vllm-cuda && rm -rf flash-attention-v2 && make install-flash-attention-v2-cuda`. They forgot to add this to the release notes about local...

> I have been installing all of the extensions via those commands for 2 days now; I also tried using the release v2.0.1 code zip. Let me try this once...