Simon Mo
@DarkLight1337 can you help answer the question, since you recently touched the testing harness? Additionally, there might be other places we want to override, including the tokenizer or generation config. Addressing those...
There are also some alternative implementations of this, such as moving this functionality into a special Worker or Executor class, which can be configured when beam search is turned on...
@russellb @youkaichao can you please help with a final round of review?
We had to use Ubuntu 20 for compatibility reasons in the wheel build. However, I believe it is possible to build on 20 and test on 22, and openai...
Ooh, why does `ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]` still work when we are using `uv` globally?
You might need the instruction tuned model instead of the base model: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1
Thank you for the PR. > This PR will be very useful if we want to sync weights between different vLLM instances with tensor parallelism enabled. Is this used in...
@KuntaiDu would you have bandwidth to take a look at this?
@youkaichao @robertgshaw2-redhat it would be great to get an understanding of whether this fits architecturally.
cc @russellb if you think this will be useful.