Woosuk Kwon

Results 151 comments of Woosuk Kwon

Hi @frankxyy, vLLM does not support GPTQ at the moment. We are actively working on the support, so please stay tuned. Regarding your question, this is my understanding: While the...

Hi @abhilash1910, thanks for submitting this great work! Can we chat about the PR? We'd like to know more about the background and your (team's) plan. If you're interested, please...

Closing this PR as we merged #3814

Hi @jspisak, thanks for letting me know about the issue! @xiaoToby Which CUDA and Python versions are you using? You can simply install vLLM by running `pip install vllm`. It will...
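
For context, a hedged sketch of how those versions could be reported (it assumes PyTorch is already installed; the exact commands are not from the original comment):

```python
# Quick environment report to include in the issue (assumes PyTorch is installed).
import sys
import torch

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA used to build PyTorch:", torch.version.cuda)
# If these match a published vLLM wheel, installation is just:
#   pip install vllm
```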

@jikunshang Just merged the PR! Thanks for your patience and all the hard work despite the delays in the review process. This is amazing! BTW, I found that CI/CD was not...

QQ: Does this PR support parallel sampling (i.e., `n` > 1 in sampling params)? While I don't think it is necessary to support parallel sampling in this PR, I'd...
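
For reference, a minimal sketch of what parallel sampling looks like from the user side of vLLM's offline API (the model name and prompt are placeholders, not taken from the PR):

```python
from vllm import LLM, SamplingParams

# n > 1 in SamplingParams requests several completions per prompt (parallel sampling).
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(n=4, temperature=0.8)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for completion in outputs[0].outputs:  # one entry per sampled sequence
    print(completion.text)
```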

Hi @cmaureir, I'm also a maintainer of vLLM. We do make our best effort to keep the binary size small, but it's increasingly difficult to meet the current limit since...

@zhouyuan @bigPYJ1151 Could you please review this PR? Thanks!

@yzh119 I see. What we need at the moment are the Python 3.8-3.11 wheels built for PyTorch 2.1.2 + CUDA 12.1. However, we do agree that maintaining compatibility between the...

@yzh119 Also, do you mind if the vLLM team hosts specific PyTorch + CUDA versions of FlashInfer on PyPI under the name `vllm-flashinfer-mirror` or something like that? This will...