vLLM distributed inference stuck when using multi-GPU
I am trying to run an inference server on multiple GPUs (4 × NVIDIA GeForce RTX 3090) with this command:
python -u -m vllm.entrypoints.api_server --host 0.0.0.0 --model mistralai/Mistral-7B-Instruct-v0.2 --tensor-parallel-size 4
This works fine with --tensor-parallel-size=1, but with --tensor-parallel-size > 1 it gets stuck on startup.
Thanks
This is happening to me too, on 2 × 3090.
Try these parameters (a combined example follows below):
--gpu-memory-utilization 0.7~0.9
--max-model-len 8192
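For reference, a combined invocation with those flags might look like this (0.8 is just an illustrative value from the 0.7–0.9 range; tune it to your GPUs):

python -u -m vllm.entrypoints.api_server --host 0.0.0.0 --model mistralai/Mistral-7B-Instruct-v0.2 --tensor-parallel-size 4 --gpu-memory-utilization 0.8 --max-model-len 8192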
Hello, I have tried the method you provided, but it has no effect.
No effect here either
Did you find a solution? I have the same issue.
@BilalKHA95 try this
export NCCL_P2P_DISABLE=1
This worked for me.
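For context, NCCL_P2P_DISABLE=1 turns off direct peer-to-peer transfers between GPUs, which are often problematic on consumer GeForce cards such as the 3090, so NCCL falls back to staging transfers through host memory. A minimal sketch of how it fits together (same command as above, flags illustrative):

export NCCL_P2P_DISABLE=1
python -u -m vllm.entrypoints.api_server --host 0.0.0.0 --model mistralai/Mistral-7B-Instruct-v0.2 --tensor-parallel-size 4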
Thanks!!! It's working now with this env variable + updating the CUDA toolkit to 12.3.
This also solved this issue for me.
Hi! Does this also result in higher tokens/second for you (for a small model like --model mistralai/Mistral-7B-Instruct-v0.2 with --tensor-parallel-size 4)? Thanks!
This didn't work for me:
export NCCL_P2P_DISABLE=1
Are there any solutions?
Thank you guys very much in advance!
Best regards,
Shuyue June 9th, 2024
We have added documentation for this situation in #5430. Please take a look.
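For anyone still stuck even with NCCL_P2P_DISABLE=1, two first diagnostic steps (independent of whatever the linked documentation recommends) are to enable NCCL's own logging and to inspect how the GPUs are interconnected:

export NCCL_DEBUG=INFO    # NCCL prints transport/initialization details at startup
nvidia-smi topo -m        # show the GPU interconnect topology (PCIe/NVLink paths)

If the server hangs during NCCL initialization, the INFO log often shows which transport was last being set up, which helps narrow the problem down to P2P, shared memory, or PCIe ACS/IOMMU settings.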