Cyrus Leung comments

Results 137 comments of


                                            Cyrus Leung

[Bug]: openapi running but "POST /v1/chat/completions HTTP/1.1" 404 Not Found

Can you copy the logs of the OpenAI-compatible server? Make sure you're using the correct address/port.

[Bug]: Unexpected Special Tokens in prompt_logprobs Output for Llama3 Prompt

#4688 might solve your issue, can you try that out?

[Bug]: Unexpected Special Tokens in prompt_logprobs Output for Llama3 Prompt

I'm currently investigating a similar issue in #4200. It seems that there is something wrong with the detokenizing logic where `new_decoded_token_text` gets pre-padded with extra whitespace characters. @Yard1 @njhill do...

[Doc] Add page for `PoolingParams`

@CatherineSue Currently, the purpose of `PoolingParams` is unclear since it has no required arguments. More detailed docs for `PoolingParams` would be greatly appreciated!

[Usage]: gpu memory usage when using tensor parallel

You can set the `--gpu-memory-utilization` parameter to a smaller value (default is 90% of each GPU)

[Draft][CI/Build] Optimize models tests

Using eager mode doesn't seem to lead to significant improvement. It seems that the bottleneck is in downloading the models, so we should parallelize this process.

[Draft][CI/Build] Optimize models tests

Tbh it is probably better if we have a way to avoid re-downloading the models each time. Any thoughts?

[Bug]: llava inference result is wrong !

At this stage, you have to preprocess the image (using `LLavaProcessor` from HuggingFace) before feeding it into vLLM. Support for automatic image preprocessing is WIP (#4197).

[Bug]: llava inference result is wrong !

> At this stage, you have to preprocess the image (using `LLavaProcessor` from HuggingFace) before feeding it into vLLM. Support for automatic image preprocessing is WIP (#4197). I miscapitalized the...

[Bug]: llava inference result is wrong !

> Additionally, the usage you mentioned in the llava_example is also not available The existing example was created without regard to image processing. You can use `AutoProcessor.from_pretrained(model_name)` to load the...