AmazDeng

49 comments of AmazDeng

> At this stage, you have to preprocess the image (using `LlavaProcessor` from HuggingFace) before feeding it into vLLM. Support for automatic image preprocessing is WIP (#4197). Could you provide...

> > At this stage, you have to preprocess the image (using `LlavaProcessor` from HuggingFace) before feeding it into vLLM. Support for automatic image preprocessing is WIP (#4197). > >...
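For reference, a minimal sketch of that manual-preprocessing workflow against the legacy vLLM 0.4.x multi-modal API (the `MultiModalData` path discussed here was removed once the automatic preprocessing from #4197 landed; the model name, engine arguments, and 576-token feature size follow the old llava example and are illustrative):

```python
from PIL import Image
from transformers import LlavaProcessor
from vllm import LLM
from vllm.sequence import MultiModalData  # legacy 0.4.x API

# Preprocess the image with the HuggingFace processor first.
processor = LlavaProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
pixel_values = processor.image_processor(
    Image.open("example.jpg"), return_tensors="pt"
)["pixel_values"]  # shape (1, 3, 336, 336)

# Legacy engine arguments for pixel-value image input (pre-#4197).
llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",
    image_input_type="pixel_values",
    image_token_id=32000,
    image_input_shape="1,3,336,336",
    image_feature_size=576,
)

# One <image> placeholder per image feature, then the text prompt.
prompt = "<image>" * 576 + "\nUSER: Describe the image.\nASSISTANT:"
outputs = llm.generate(
    prompt,
    multi_modal_data=MultiModalData(
        type=MultiModalData.Type.IMAGE, data=pixel_values
    ),
)
```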

> Make sure that you're using consistent sampling parameters (e.g. temperature, greedy decoding) for the two implementations. The simplest way is to set `temperature=0` so that the result will...
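As an illustration, greedy decoding can be pinned on both sides like this (the HuggingFace call is shown as a comment since the surrounding code isn't quoted here):

```python
from vllm import SamplingParams

# temperature=0 makes vLLM decode greedily, so repeated runs are
# deterministic and directly comparable with the HuggingFace output.
sampling_params = SamplingParams(temperature=0, max_tokens=256)

# HuggingFace equivalent: disable sampling in generate().
# model.generate(**inputs, do_sample=False, max_new_tokens=256)
```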

> Looking into your code further, it seems that you are using different prompts. In your HuggingFace test, you placed the image before the question, but in the vLLM test,...

> > > Looking into your code further, it seems that you are using different prompts. In your HuggingFace test, you placed the image before the question, but in the...

> > Directly inputting the Hugging Face prompt into vLLM won't work; it will throw an error. It seems that I must construct the prompt according to the example provided by...
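The two prompt formats differed roughly as follows at the time: a single-placeholder HF template versus the repeated-placeholder form expected by the legacy vLLM llava example (the 576 repetitions match `image_feature_size` for llava-1.5 at 336 px and are an assumption tied to that setup):

```python
# HuggingFace prompt: one <image> placeholder, placed before the question.
hf_prompt = "USER: <image>\ndesc the image in detail ASSISTANT:"

# Legacy vLLM prompt: the placeholder is repeated image_feature_size times,
# reserving one token position per image feature in the tokenized prompt.
vllm_prompt = "USER: " + "<image>" * 576 + "\ndesc the image in detail ASSISTANT:"
```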

> In `prompt2` you have an extra space in the vLLM prompt after `"desc the image in detail"`. Perhaps that affected the result. The image descriptions generated by HF and...

> > The image descriptions generated by HF and vLLM for prompt 2 are the same, but they differ for prompt 1. Please see the results below. I removed the...

> > Additionally, I'd like to inquire whether LLaVA in vLLM can accept text-only input instead of images. This way, I could leverage LLaVA's language modeling capabilities. I recall that...

> > Is this the correct usage pattern? > > Yes, just use it like a regular text-only LLM. Okay. Thank you for your continuous support and assistance. Best wishes...
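A text-only call then just omits the `<image>` placeholders and the multi-modal data; a sketch, reusing the `llm` engine from the earlier example (prompt text illustrative):

```python
from vllm import SamplingParams

# No <image> placeholders and no multi_modal_data, so LLaVA's language
# model decodes the prompt like any regular text-only LLM.
outputs = llm.generate(
    "USER: Summarize the plot of Hamlet in two sentences.\nASSISTANT:",
    SamplingParams(temperature=0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```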