Cyrus Leung

137 comments by Cyrus Leung

Can you copy the logs of the OpenAI-compatible server? Make sure you're using the correct address/port.
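A quick way to rule out a wrong address or port is to query the server's model listing endpoint. The sketch below assumes the server is at `localhost:8000` and exposes `/v1/models`; adjust both to match your launch command. The helper names `models_url` and `server_is_reachable` are made up for illustration.

```python
import urllib.error
import urllib.request


def models_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the URL of the model listing endpoint (host/port are assumptions)."""
    return f"http://{host}:{port}/v1/models"


def server_is_reachable(host: str = "localhost", port: int = 8000,
                        timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(models_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    print(models_url(), "reachable:", server_is_reachable())
```

If this reports that the server is not reachable, the client is most likely pointing at a different host or port than the one the server was started with.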

#4688 might solve your issue, can you try that out?

I'm currently investigating a similar issue in #4200. It seems that there is something wrong with the detokenizing logic where `new_decoded_token_text` gets pre-padded with extra whitespace characters. @Yard1 @njhill do...

@CatherineSue Currently, the purpose of `PoolingParams` is unclear since it has no required arguments. More detailed docs for `PoolingParams` would be greatly appreciated!

You can set the `--gpu-memory-utilization` parameter to a smaller value (the default is 0.9, i.e. 90% of each GPU's memory).
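As a rough sketch, this is how a lower value could be passed when launching the OpenAI-compatible server from a Python script. The helper name `server_command` and the model name are placeholders for illustration only; the sketch assumes the server is started through the `vllm.entrypoints.openai.api_server` module.

```python
import sys


def server_command(model: str, gpu_memory_utilization: float = 0.9) -> list[str]:
    """Build the argv for launching the server with a given memory fraction.

    `server_command` is a hypothetical helper, not part of vLLM.
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in the range (0, 1]")
    return [
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder; substitute the model you are serving.
    print(" ".join(server_command("my-org/my-model", 0.7)))
```

The resulting list can be handed to `subprocess.run` as-is. Lowering the fraction leaves more GPU memory free for other processes, at the cost of less space for the KV cache.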

Using eager mode doesn't seem to lead to a significant improvement. The bottleneck appears to be downloading the models, so we should parallelize that step.

Tbh it would probably be better to have a way to avoid re-downloading the models each time. Any thoughts?

At this stage, you have to preprocess the image (using `LlavaProcessor` from HuggingFace) before feeding it into vLLM. Support for automatic image preprocessing is WIP (#4197).

> At this stage, you have to preprocess the image (using `LLavaProcessor` from HuggingFace) before feeding it into vLLM. Support for automatic image preprocessing is WIP (#4197).

I miscapitalized the...

> Additionally, the usage you mentioned in the llava_example is also not available

The existing example was created without regard to image processing. You can use `AutoProcessor.from_pretrained(model_name)` to load the...
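For reference, the preprocessing step on the HuggingFace side might look roughly like the sketch below. It only covers loading the processor and producing pixel values; how those are then passed to vLLM is not shown here. The helper names `load_processor` and `preprocess_image`, and the assumption that the processor output contains a `pixel_values` entry, are illustrative rather than taken from the vLLM example.

```python
def load_processor(model_name: str):
    """Load the model's processor from HuggingFace (downloads on first use)."""
    # Imported lazily so the rest of the module works without transformers.
    from transformers import AutoProcessor
    return AutoProcessor.from_pretrained(model_name)


def preprocess_image(processor, image, prompt: str):
    """Run the processor on one image and prompt; return the pixel values.

    Assumes the processor returns a mapping with a "pixel_values" entry,
    as the LLaVA processor in transformers does.
    """
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    return inputs["pixel_values"]


# Usage (assumes transformers, torch, and Pillow are installed, and that
# model_name points at a LLaVA checkpoint on the HuggingFace Hub):
#
#   from PIL import Image
#   processor = load_processor(model_name)
#   pixel_values = preprocess_image(processor, Image.open("cat.jpg"),
#                                   "<image>\nWhat is shown here?")
```

Passing `text` and `images` as keyword arguments keeps the call independent of the positional order, which has differed between processor versions.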