Woosuk Kwon

Results 151 comments of Woosuk Kwon

Hi @frankxyy, vLLM does not support GPTQ at the moment. We are actively working on the support, so please stay tuned. Regarding your question, this is my understanding: While the...

Hi @abhilash1910, thanks for submitting this great work! Can we chat about the PR? We'd like to know more about the background and your (team's) plan. If you're interested, please...

Closing this PR as we merged #3814

Hi @jspisak, thanks for letting me know about the issue! @xiaoToby Which CUDA and Python versions are you using? You can simply install vLLM by running `pip install vllm`. It will...
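
For context, a hedged sketch of how those versions could be reported (it assumes PyTorch is already installed; the exact commands are not from the original comment):

```python
# Quick environment report to include in the issue (assumes PyTorch is installed).
import sys
import torch

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA used to build PyTorch:", torch.version.cuda)
# If these match a published vLLM wheel, installation is just:
#   pip install vllm
```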

@jikunshang Just merged the PR! Thanks for your patience and all the hard work despite the delays in the review process. This is amazing! BTW, I found that CI/CD was not...

QQ: Does this PR support parallel sampling (i.e., `n` > 1 in sampling params)? While I don't think it is necessary to support parallel sampling in this PR, I'd...
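
For reference, a minimal sketch of what parallel sampling looks like from the user side of vLLM's offline API (the model name and prompt are placeholders, not taken from the PR):

```python
from vllm import LLM, SamplingParams

# n > 1 in SamplingParams requests several completions per prompt (parallel sampling).
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(n=4, temperature=0.8)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for completion in outputs[0].outputs:  # one entry per sampled sequence
    print(completion.text)
```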

Hi @cmaureir, I'm also a maintainer of vLLM. We do make our best effort to keep the binary size small, but it's increasingly difficult to meet the current limit since...

@zhouyuan @bigPYJ1151 Could you please review this PR? Thanks!

@yzh119 I see. What we need at the moment are the Python 3.8-3.11 wheels built for PyTorch 2.1.2 + CUDA 12.1. However, we do agree that maintaining compatibility between the...

@yzh119 Also, do you mind if the vLLM team hosts specific PyTorch + CUDA versions of FlashInfer on PyPI under the name `vllm-flashinfer-mirror` or something like that? This will...