Woosuk Kwon

151 comments by Woosuk Kwon

Hi @JinuJeong, thanks for your interest and good question! vLLM does have the splitting mechanism, and never changes the semantics of the model. Our `InputMetadata` includes the metadata to identify...

@BUAADreamer @liulfy The `model` argument in `LLM` or `api_server` can also take the path to your local directory that contains the weight files.
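As a minimal sketch of what that looks like (the path below is a placeholder; the directory is assumed to contain the usual HF-format config, tokenizer, and weight files):

```python
from vllm import LLM

# Point `model` at a local directory instead of a Hugging Face repo ID.
# The path is a placeholder for illustration only.
llm = LLM(model="/path/to/local/model-dir")

# Generate as usual.
outputs = llm.generate(["Hello, my name is"])
print(outputs[0].outputs[0].text)
```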

@BUAADreamer Thanks for providing the example. It should not use the remote HF repo if the path is valid. Could you try this and let us know if it works?...

@sleepwalker2017 Thanks for trying out vLLM and reporting the performance issue! Yes, our sampler is indeed not well optimized yet. In particular, vLLM performs sampling for one request at a time,...

@emsi Thanks for reporting it! Your beam search output looks very weird. We'll investigate it, but I believe if that is really a bug then the bug should be in...

It seems there's an error in parsing the output of `nvcc -V`. Could you run `nvcc -V` and tell us the output?

@dongkuang Your output doesn't seem wrong. It might be a bug related to the aarch64 architecture, which we haven't tested vLLM on. For now, I'm afraid we don't have any aarch64...

Hi @canghongjian, thanks for trying out vLLM! vLLM runs a simple memory profiling pass and pre-allocates 90% of the total GPU memory for model weights and activations. You can configure this...
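If it helps, here is a minimal sketch of lowering that fraction through the `gpu_memory_utilization` argument (the model name is just an example):

```python
from vllm import LLM

# By default vLLM reserves ~90% of GPU memory for weights, activations,
# and the KV cache; lower the fraction to leave headroom for other
# processes on the same device.
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.80)
```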

vLLM currently does not support pipeline parallelism. The `ParallelConfig.pipeline_parallel_size` attribute is for future use. When multiple GPUs are used, vLLM leverages tensor parallelism to shard the model and inputs evenly...
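As an illustration, a minimal sketch of running with tensor parallelism across two GPUs (the model name is a placeholder):

```python
from vllm import LLM

# tensor_parallel_size shards the model's weights across 2 GPUs;
# pipeline parallelism is not involved.
llm = LLM(model="facebook/opt-13b", tensor_parallel_size=2)
```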

Hi @nearmax-p, could you [install vLLM from source](https://vllm.readthedocs.io/en/latest/getting_started/installation.html#build-from-source)? Then this error should disappear. Sorry for the inconvenience. We will update our PyPI package very soon.