Simon Mo
Adding @njhill who initially added this.
Here's my recommendation:
* We can sort and add the GitHub organization prefix so we have fully qualified names such as `vllm-project/production-stack`, `vllm-project/aibrix`, `kubernetes-sigs/lws`, etc.
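As a minimal sketch of what sorting the fully qualified names would look like (the repository list below just reuses the examples above and is illustrative):

```python
# Illustrative list of ecosystem projects with GitHub org prefixes.
repos = [
    "vllm-project/production-stack",
    "kubernetes-sigs/lws",
    "vllm-project/aibrix",
]

# Sorting the fully qualified names naturally groups entries by organization.
sorted_repos = sorted(repos)
print(sorted_repos)
# → ['kubernetes-sigs/lws', 'vllm-project/aibrix', 'vllm-project/production-stack']
```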
The error is saying that the available memory cannot accommodate such a large context.
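As a rough back-of-the-envelope check of why a long context can exhaust memory, one can estimate the KV-cache footprint per sequence. The model dimensions below are hypothetical placeholders, not taken from the issue:

```python
# Rough KV-cache size estimate per sequence:
# 2 (K and V) * layers * kv_heads * head_dim * bytes_per_element * context_length.
# All dimensions below are hypothetical examples.
num_layers = 32
num_kv_heads = 8
head_dim = 128
bytes_per_elem = 2        # bf16
context_len = 131_072     # a "large context"

kv_bytes = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * context_len
print(f"{kv_bytes / 2**30:.0f} GiB per sequence")  # → 16 GiB for these dims
```

Even a single sequence at this context length needs tens of GiB of KV cache on top of the model weights, which is why the engine refuses to start.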
We just merged EPLB from @abmfy (cc @WoosukKwon). Please rebase; we would love to expose these core metrics!
I will release a new version once this is fixed. @dtrifiro I think this is because I hard-coded the version in `setup.py`, so no `_version.py` is generated.
I will make a patch release once https://github.com/vllm-project/vllm/pull/9375 merged
Because the weights themselves aren't fp8 quantized yet, Scout can only run with bf16 weights. However, we do support dynamic quantization of the KV cache via `--kv-cache-dtype fp8`. Team from...
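A minimal launch sketch with the flag mentioned above (the model name is an illustrative placeholder; substitute the actual Scout checkpoint):

```shell
# Serve with bf16 weights but an fp8-quantized KV cache.
# The model name below is a placeholder; --kv-cache-dtype fp8 is the flag from above.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct --kv-cache-dtype fp8
```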
Test failed https://buildkite.com/vllm/ci/builds/14971/canvas?sid=01957347-78b6-407a-921f-c7a82847a7ed#01957347-7a52-4f3d-972b-78decfdc6577/206-12701
> an existing API with a batch request like you do with the OpenAI Batch API.

@w013nad (or others), please feel free to open an RFC for this to discuss...
@sylviayangyy @zeroorhero thank you for your interest! Yes, @KuntaiDu has created a #feat-kvcache-offloading channel to discuss that.