[Usage]: restarting vllm --> "WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL"
Your current environment
Hey guys :)
Since version 0.6.6 up to the current v0.7.2 I have had a slightly annoying problem. When I start vLLM on my AI server, everything works fine: the model is loaded and can be used as desired. However, as soon as I end my start script and want to load the same model again, or even a different model, vLLM always freezes at this point:
[W214 15:20:56.973624780 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[W214 15:20:56.008814328 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[W214 15:20:56.104798409 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[W214 15:20:56.107680744 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[W214 15:20:56.196595399 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[W214 15:20:56.199089483 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[W214 15:20:56.205991785 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[W214 15:20:56.207522727 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
(VllmWorkerProcess pid=8968) INFO 02-14 15:20:56 utils.py:950] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=8969) INFO 02-14 15:20:56 utils.py:950] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=8968) INFO 02-14 15:20:56 pynccl.py:69] vLLM is using nccl==2.21.5
INFO 02-14 15:20:56 utils.py:950] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=8969) INFO 02-14 15:20:56 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=8971) INFO 02-14 15:20:56 utils.py:950] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=8967) INFO 02-14 15:20:56 utils.py:950] Found nccl from library libnccl.so.2
INFO 02-14 15:20:56 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=8971) INFO 02-14 15:20:56 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=8967) INFO 02-14 15:20:56 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=8970) INFO 02-14 15:20:56 utils.py:950] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=8972) INFO 02-14 15:20:56 utils.py:950] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=8973) INFO 02-14 15:20:56 utils.py:950] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=8970) INFO 02-14 15:20:56 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=8972) INFO 02-14 15:20:56 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=8973) INFO 02-14 15:20:56 pynccl.py:69] vLLM is using nccl==2.21.5
When I quit vLLM before, I always get this warning:
[rank0]:[W214 15:18:06.519697808 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
Unfortunately, I cannot find any examples of how to run the start script so that the shutdown is executed correctly. Can you help me with this? (A sketch of what I have in mind follows the script below.)
My start script, startskript.sh:
#!/bin/bash
token=50000
export HF_TOKEN=hf_myToken
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export CUDA_LAUNCH_BLOCKING=0
export disable_custom_all_reduce=True
python -m vllm.entrypoints.openai.api_server \
--model=mistralai/Pixtral-Large-Instruct-2411 \
--config-format mistral \
--load-format mistral \
--tokenizer_mode mistral \
--limit_mm_per_prompt 'image=10' \
--host 192.uuu.xxx.yyy \
--port myPORT \
--trust-remote-code \
--device cuda \
--tensor-parallel-size 8 \
--gpu-memory-utilization 1 \
--swap-space 10 \
--max_num_seqs 3 \
--max_num_batched_tokens $token \
--max_model_len $token
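To show what I mean, here is a minimal sketch of a wrapper around the launch command above, with the server started in the background and termination signals forwarded to it. Whether vLLM then actually tears down the NCCL process group on a clean SIGTERM/SIGINT is an assumption on my part, not something I have confirmed:

#!/bin/bash
# Sketch only: run the API server in the background and forward termination
# signals, so the Python process gets a chance to run its own shutdown handlers.
# (Assumption: a clean SIGTERM/SIGINT lets vLLM destroy the process group.)
python -m vllm.entrypoints.openai.api_server \
  --model=mistralai/Pixtral-Large-Instruct-2411 \
  --tensor-parallel-size 8 &
server_pid=$!
trap 'kill -TERM "$server_pid" 2>/dev/null' INT TERM
wait "$server_pid"
wait "$server_pid"   # wait again in case the first wait was interrupted by the trap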
How would you like to use vllm
I would like to be able to reload a model with vLLM without having to restart the entire AI machine every time.
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Yes, it gets stuck at this point when reloading a model. Do you have a solution? :)
Does anyone have an idea? Or should I report it as a bug? Thank you all! :)
Same error for me, whether installing with pip or building from source.
Hi @AlbiRadtke! Did you solve this problem? I have the same issue and can't understand why other people do not seem to run into it; maybe there is some config we are not aware of? If you have any solutions or workarounds, it would be really helpful.
Unfortunately, despite a lot of research, I have not found a solution, and since I am obviously not the only one with the problem, I have now reported it as bug #13836. I will therefore close the issue here and only pursue the bug further.
Best regards :)
When we adjusted our GPU servers, I got the same error:
[rank0]:[W313 03:10:56.503853755 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
so I tried to figure out whether there were any leftover processes using
ps -ef | grep vllm
and then killed them with
kill -9 10000 (where 10000 is the process ID)
so that vllm serve could work again.
Hope this can help you.
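For anyone who wants to script that cleanup, here is a rough sketch; the "vllm" pattern is an assumption and should be adjusted to whatever your leftover processes are actually called:

# Sketch of the workaround above: find and kill leftover vLLM processes.
ps -ef | grep vllm | grep -v grep   # list the stuck processes
# kill -9 <PID>                     # kill a single leftover process by its PID
pkill -9 -f vllm                    # or kill everything whose command line matches
                                    # (careful: this also hits servers you want to keep)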
@foreverwith thank you very much for your suggestion and help, I will do the same when it happens again. Fortunately I don't need to stop the server very often; still, if I run into the issue again, at least I know how to proceed. Thanks.
That really is a great idea, @foreverwith! Which version of vLLM are you using? I use vLLM 0.7.2.
Unfortunately, ps -ef | grep vllm doesn't show an open process that I could terminate. So this great idea won't work for me :(
Do you start vLLM as a service, as a batch file or directly as a Python file?
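In case it helps, what I would also try here is checking which processes still hold GPU memory via nvidia-smi, in case the leftover workers no longer match "vllm" in their command line; this is just an idea, not something I have verified solves the hang:

# Sketch: list processes that still hold GPU memory.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
# If a stale PID shows up here but not in ps -ef | grep vllm, try:
# kill -9 <PID>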
Same here, and I can't start up vllm serve again.
Hey @mhsbz, unfortunately I have still not been able to solve the problem. But I downloaded and tested the pre-release version 0.8.0 at the weekend and did not have this problem there: the model could be loaded, and when I ended the process and then restarted it, everything was reloaded as hoped without having to restart the entire system. In this respect I have hope. Unfortunately, I still can't use v0.8.0 because there seem to be problems with the chat template of my model, but as I said, it was only a pre-release version :)
I would say it's solved in v0.8.2.