Woosuk Kwon
Also, please note that enabling the KV cache never affects your model outputs.
Hi @mspronesti, thanks for bringing it up. I believe MPT's architecture is quite similar to BLOOM's. We will add support for both models very soon!
@hongxiayang @lcskrishna Could you help review this PR?
@hongxiayang @dllehr-amd Could you review this PR? This is an important PR that enables AMD GPUs to support multi-LoRA serving, which is a key feature in vLLM liked by...
Hi @maxmelichov, in my experience, this error happens when using an old version of PyTorch. Please make sure to use `torch 2.1.0+cu121` by running `pip install --upgrade torch`.
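For reference, a quick way to confirm which PyTorch build you actually have installed (the version strings in the comments are just what I'd expect for a CUDA 12.1 wheel):

```python
# Check the installed PyTorch build; the "+cu121" suffix indicates a CUDA 12.1 wheel.
import torch

print(torch.__version__)          # e.g. "2.1.0+cu121"
print(torch.version.cuda)         # e.g. "12.1"
print(torch.cuda.is_available())  # should be True on a machine with a working CUDA setup
```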
Closed as we added support for LLaVA in #3042
Isn't this a duplicate of #1006? I could launch a spot E2 instance, but not an on-demand one.
One possible solution is to set the `Price` values of those instances to null, so that they are filtered out.
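For concreteness, a minimal sketch of that idea, assuming the catalog is a CSV with `InstanceType` and `Price` columns; the file name and instance names below are placeholders, not the real catalog entries:

```python
import numpy as np
import pandas as pd

# Hypothetical catalog file and instance names, for illustration only.
catalog = pd.read_csv("gcp_catalog.csv")
unavailable = ["e2-standard-4", "e2-standard-8"]

# Null out the price of the instances that cannot actually be launched...
catalog.loc[catalog["InstanceType"].isin(unavailable), "Price"] = np.nan

# ...so any consumer that filters on a valid price drops them automatically.
available = catalog[catalog["Price"].notna()]
```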
I don't think we should. OK, then why don't we just drop it from the catalog?
Thanks for your interest and great question! You can install vLLM [from source](https://vllm.readthedocs.io/en/latest/getting_started/installation.html#build-from-source) and directly modify the [model code](https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/models).
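As a rough sketch of that workflow (the model name and prompt below are arbitrary): once you install from source in editable mode (e.g. `pip install -e .` in the cloned repo), edits under `vllm/model_executor/models/` take effect the next time you run vLLM:

```python
# Assumes an editable from-source install, so local changes under
# vllm/model_executor/models/ are picked up on the next run.
from vllm import LLM, SamplingParams

# opt-125m is just a small example; use whichever model whose code you modified.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=16))
print(outputs[0].outputs[0].text)
```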