Simon Mo comments

Results: 313 comments by Simon Mo

Add control panel allow manage multi vllm instances

Can you open an RFC for this for design discussion?

Support to serve vLLM on Kubernetes with LWS

Please add this to https://github.com/vllm-project/vllm/blob/main/docs/source/serving/integrations.rst?plain=1 so it's included in the docs.

[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests

@Alexei-V-Ivanov-AMD please ping me on Slack when this is ready to go. Thank you!

[Bug]: Async engine hangs with 0.4.* releases

When you initialize the async engine, I think it expects to be inside a running event loop, though I'm not sure why. If you change the code to ```diff diff --git...
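A minimal stdlib sketch of the failure mode suspected above, assuming the engine binds to the current event loop when it is constructed. `NeedsRunningLoop` is a hypothetical stand-in, not vLLM's actual class. `asyncio.get_running_loop()` raises `RuntimeError` when no loop is running, so such an object has to be created inside a coroutine.

```python
import asyncio


class NeedsRunningLoop:
    """Hypothetical stand-in for an engine that grabs the event loop in __init__."""

    def __init__(self) -> None:
        # Raises RuntimeError if no event loop is currently running.
        self.loop = asyncio.get_running_loop()


def construct_outside_loop() -> str:
    try:
        NeedsRunningLoop()
    except RuntimeError:
        return "failed: no running event loop"
    return "ok"


async def construct_inside_loop() -> str:
    # asyncio.run() starts a loop, so get_running_loop() succeeds here.
    obj = NeedsRunningLoop()
    return "ok" if obj.loop is asyncio.get_running_loop() else "mismatch"


if __name__ == "__main__":
    print(construct_outside_loop())              # failed: no running event loop
    print(asyncio.run(construct_inside_loop()))  # ok
```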

[Misc] Logits processor plugins

@mmoskal @noamgat @br3no curious about your feedback on this!

Regression in support of customized "role" in OpenAI compatible API (v.0.4.2)

Merged.

[Core] Implement sharded state loader

One question I have: can this be implemented using safetensors' partial reads? A safetensors file stores all of its metadata in the header, so you can read individual tensors without loading the whole file.
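A stdlib-only sketch of the partial read suggested above. It relies on the published safetensors layout: an 8-byte little-endian header length, a JSON header mapping each tensor name to its `dtype`, `shape` and `data_offsets` (relative to the start of the data buffer), then the raw tensor bytes. The tiny writer is hypothetical and exists only to produce a test file; the real `safetensors` library exposes the same idea through `safe_open(...)` and `get_slice(...)`.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_safetensors(path: Path, tensors: dict[str, tuple[str, list[int], bytes]]) -> None:
    """Hypothetical minimal writer: name -> (dtype, shape, raw bytes)."""
    header: dict = {}
    buffer = bytearray()
    for name, (dtype, shape, data) in tensors.items():
        start = len(buffer)
        buffer.extend(data)
        header[name] = {"dtype": dtype, "shape": shape, "data_offsets": [start, len(buffer)]}
    header_bytes = json.dumps(header).encode("utf-8")
    with path.open("wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian header size
        f.write(header_bytes)
        f.write(buffer)


def read_header(path: Path) -> tuple[dict, int]:
    """Read only the JSON header; return it plus the file offset where tensor data starts."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header, 8 + header_len


def read_tensor_bytes(path: Path, name: str) -> bytes:
    """Seek straight to one tensor's bytes without reading the other tensors."""
    header, data_start = read_header(path)
    begin, end = header[name]["data_offsets"]
    with path.open("rb") as f:
        f.seek(data_start + begin)
        return f.read(end - begin)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "shard.safetensors"
        write_safetensors(p, {
            "a": ("F32", [2], struct.pack("<2f", 1.0, 2.0)),
            "b": ("F32", [3], struct.pack("<3f", 3.0, 4.0, 5.0)),
        })
        print(struct.unpack("<3f", read_tensor_bytes(p, "b")))  # (3.0, 4.0, 5.0)
```

In a sharded loader, each rank could read the header once and then seek only to the tensors (or byte ranges) it owns.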

[Frontend] OpenAI API server: Do not add bos token by default when encoding

@DarkLight1337 can you help take another look and let me know whether this is mergeable?

Understanding about LLM class from vllm

You can use AsyncLLMEngine to call it asynchronously.
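Unlike the blocking `LLM` class, an async engine lets several requests be in flight at once while each one streams its output. A stdlib-only sketch of that calling pattern: `fake_generate` is a hypothetical stand-in for `AsyncLLMEngine.generate`, which in vLLM is likewise an async generator keyed by a request ID. Treat the exact signature as an assumption and check it against your vLLM version.

```python
import asyncio
from typing import AsyncIterator


async def fake_generate(prompt: str, request_id: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming generate(): yields growing partial text."""
    text = ""
    for word in prompt.split():
        await asyncio.sleep(0.01)  # simulate per-step decode latency
        text += word.upper() + " "
        yield text.strip()


async def run_request(prompt: str, request_id: str) -> str:
    final = ""
    # Consume the stream; the last yielded value is the final output.
    async for partial in fake_generate(prompt, request_id):
        final = partial
    return final


async def main() -> list[str]:
    prompts = ["hello world", "async engines interleave requests"]
    # gather() runs both requests concurrently on a single event loop.
    return await asyncio.gather(
        *(run_request(p, f"req-{i}") for i, p in enumerate(prompts))
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Each request needs a unique ID so the engine can route streamed outputs back to the right caller.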

vLLM model serving server hangs when GPU KV cache usage reaches 10%

I believe this is similar to #1879. While a T4 can run a 7B model, the throughput will be very low and vLLM will likely perform a lot of eviction...
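A back-of-envelope estimate of why a 7B model leaves little KV-cache room on a T4. It assumes Llama-2-7B shapes (32 layers, hidden size 4096, no grouped-query attention), fp16 weights and cache, and the T4's 16 GB of memory. The "1 GiB free" figure is a hypothetical input, not vLLM's actual accounting, which also reserves activation memory and applies `gpu_memory_utilization`.

```python
GIB = 1024**3


def kv_cache_bytes_per_token(num_layers: int, hidden_size: int, dtype_bytes: int) -> int:
    """Bytes of K and V cache one token occupies across all layers (no GQA assumed)."""
    return 2 * num_layers * hidden_size * dtype_bytes  # 2 = one K plus one V vector


def tokens_that_fit(free_bytes: int, bytes_per_token: int) -> int:
    """Total tokens, summed over all running sequences, the cache can hold."""
    return free_bytes // bytes_per_token


# Assumed Llama-2-7B shapes in fp16.
per_token = kv_cache_bytes_per_token(num_layers=32, hidden_size=4096, dtype_bytes=2)
weights_gib = 7e9 * 2 / GIB  # fp16 weights alone, before activations or cache

if __name__ == "__main__":
    print(per_token)                            # 524288 bytes, i.e. 0.5 MiB per token
    print(round(weights_gib, 1))                # 13.0
    print(tokens_that_fit(1 * GIB, per_token))  # 2048 tokens if ~1 GiB is free
```

With only a few thousand tokens of cache shared by every in-flight request, the scheduler has to preempt sequences often, which fits the low throughput described above.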