Woosuk Kwon
Also, please note that enabling the KV cache never affects your model outputs.
Hi @mspronesti, thanks for bringing it up. I believe MPT's architecture is quite similar to BLOOM's. We will add support for both models very soon!
@hongxiayang @lcskrishna Could you help review this PR?
@hongxiayang @dllehr-amd Could you review this PR? This is an important PR that enables AMD GPUs to support multi-LoRA serving, which is a key feature in vLLM liked by...
Hi @maxmelichov, in my experience, this error happens when using an old version of PyTorch. Please make sure to use `torch 2.1.0+cu121` by running `pip install --upgrade torch`.
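For reference, a quick way to confirm which PyTorch build you actually have installed (the version strings in the comments are just what I'd expect for a CUDA 12.1 wheel):

```python
# Check the installed PyTorch build; the "+cu121" suffix indicates a CUDA 12.1 wheel.
import torch

print(torch.__version__)          # e.g. "2.1.0+cu121"
print(torch.version.cuda)         # e.g. "12.1"
print(torch.cuda.is_available())  # should be True on a machine with a working CUDA setup
```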
Closed as we added support for LLaVA in #3042
Isn't this a duplicate of #1006? I could launch a spot E2 instance, but not an on-demand one.
One possible solution is to set the `Price` values of those instances to null, so that they are filtered out.
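For concreteness, a minimal sketch of that idea, assuming the catalog is a CSV with `InstanceType` and `Price` columns; the file name and instance names below are placeholders, not the real catalog entries:

```python
import numpy as np
import pandas as pd

# Hypothetical catalog file and instance names, for illustration only.
catalog = pd.read_csv("gcp_catalog.csv")
unavailable = ["e2-standard-4", "e2-standard-8"]

# Null out the price of the instances that cannot actually be launched...
catalog.loc[catalog["InstanceType"].isin(unavailable), "Price"] = np.nan

# ...so any consumer that filters on a valid price drops them automatically.
available = catalog[catalog["Price"].notna()]
```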
I don't think we should. OK, then why don't we just drop it from the catalog?
Thanks for your interest and great question! You can install vLLM [from source](https://vllm.readthedocs.io/en/latest/getting_started/installation.html#build-from-source) and directly modify the [model code](https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/models).
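As a rough sketch of that workflow (the model name and prompt below are arbitrary): once you install from source in editable mode (e.g. `pip install -e .` in the cloned repo), edits under `vllm/model_executor/models/` take effect the next time you run vLLM:

```python
# Assumes an editable from-source install, so local changes under
# vllm/model_executor/models/ are picked up on the next run.
from vllm import LLM, SamplingParams

# opt-125m is just a small example; use whichever model whose code you modified.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=16))
print(outputs[0].outputs[0].text)
```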