Harry Mellor comments

Results: 74 comments of Harry Mellor

Error when performing `benchmarks/benchmark_latency.py` using multiple GPUs on a single node

I don't think you need to call `ray start` to use tensor parallelism anymore. Are you still experiencing this issue?

Error when performing `benchmarks/benchmark_latency.py` using multiple GPUs on a single node

I'll close this as stale for now.

GPTQ / Quantization support?

I have successfully used both GPTQ and AWQ models with vLLM. Should this issue be considered solved @WoosukKwon?

Support for fastchat-t5-3b-v1.0

Closing as a duplicate of #187.

Remove Ray for the dependency

Closed by https://github.com/vllm-project/vllm/pull/4539

Modify the current PyTorch model to C++

@zhuohan123 can this work be considered complete?

Check whether the input request is too long

Closing as this should now be fixed.

Any plan to support cpu only mode?

x86 CPU support was added in https://github.com/vllm-project/vllm/pull/3634. Since there are other issues asking for specific architectures, I will close this one as complete because there is now a CPU only...

Do not initialize process group when using a single GPU

Closing because a single worker will now only use Ray if the user specifies `--worker-use-ray`.