Results: 9 comments of Shi Shuai

I encountered a bug where most image inputs cause the model to crash with the following error: `RuntimeError: shape mismatch: value tensor of shape [2352, 7168] cannot be broadcast to...`

It seems that the current implementation counts the tokens generated from the encoded image as part of the prompt length. It might be better to extract the image features first...
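
The change being suggested might look roughly like the sketch below. It only illustrates the idea (encode the image up front and track its feature length separately from the text prompt length); the function and argument names are hypothetical and do not correspond to the actual text-generation-inference or vLLM code.

```python
import torch

# Hypothetical sketch: keep the text-token length and the image-feature
# length separate, so image embeddings are not counted against the text
# prompt budget. `embed_text` and `encode_image` stand in for the model's
# own embedding layer and vision encoder.
def build_prompt_embeddings(text_token_ids: torch.Tensor,
                            image: torch.Tensor,
                            embed_text,      # callable: token ids -> [T, H]
                            encode_image):   # callable: image -> [I, H]
    text_embeds = embed_text(text_token_ids)   # [T, H]
    image_embeds = encode_image(image)         # [I, H], extracted first

    # Report the two lengths separately instead of folding the image
    # tokens into the prompt length.
    text_len = text_embeds.shape[0]
    image_len = image_embeds.shape[0]

    # Prepend the image features to the text embeddings.
    inputs_embeds = torch.cat([image_embeds, text_embeds], dim=0)  # [I + T, H]
    return inputs_embeds, text_len, image_len
```

Keeping the two lengths separate would also make mismatches like the `[2352, 7168]` broadcast error above easier to trace.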

I'm working on a similar project and I'm excited to see that you've already started. I'm curious how far along you are; I'd be happy to help if needed.

Is there a mismatch between the tokenizer version used to train the weights and the version used at load time? I'm not sure whether this is a problem with my weights....
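
A quick way to sanity-check this is to compare the tokenizer's vocabulary size against the `vocab_size` recorded in the checkpoint's config. This is just a minimal sketch using the Hugging Face `transformers` API; the local path is a placeholder for wherever the weights live.

```python
from transformers import AutoConfig, AutoTokenizer

# Placeholder path; point this at the directory holding your weights.
model_path = "./my-model"

tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)

print("tokenizer size:", len(tokenizer))
print("config vocab_size:", config.vocab_size)

# Some checkpoints pad the embedding matrix, so a small difference can be
# benign, but a large gap usually means the tokenizer files do not match
# the weights they were shipped with.
if len(tokenizer) != config.vocab_size:
    print("Possible mismatch between tokenizer and model weights.")
```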

Same problem here: `Method Prefill encountered an error`

You need to re-install vllm and flash-attention-v2: `cd text-generation-inference/server && rm -rf vllm && make install-vllm-cuda && rm -rf flash-attention-v2 && make install-flash-attention-v2-cuda`. They forgot to add this to the release notes about local...

> I have been installing all of the extensions via those commands for 2 days now; I also tried using the release v2.0.1 code zip. Let me try this once...