Simon Mo
@DarkLight1337 can you help answer the question, since you recently touched the testing harness? Additionally, there might be other places we want to override, including the tokenizer or generation config. Addressing those...
There are also some alternative implementations of this, such as moving this functionality into a special Worker or Executor class, which can be configured when beam search is turned on...
@russellb @youkaichao can you please help with a final round of review?
We had to use Ubuntu 20 for compatibility reasons in the wheel build. However, I believe it is possible to build on 20 and test on 22, and openai...
Ooh, why does `ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]` still work when we are using `uv` globally?
You might need the instruction tuned model instead of the base model: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1
Thank you for the PR. > This PR will be very useful if we want to sync weights between different vLLM instances with tensor parallelism enabled. Is this used in...
@KuntaiDu would you have bandwidth to take a look at this?
@youkaichao @robertgshaw2-redhat it would be great to get an understanding of whether this fits architecturally.
cc @russellb if you think this will be useful.