[Bug]: Why does the deployment process hang when deploying qwen2.5-vl-32b-instruct?
Your current environment
My environment (`pip list` output):

```text
Package  Version
accelerate  1.5.2
addict  2.4.0
aiohappyeyeballs  2.6.1
aiohttp  3.11.14
aiosignal  1.3.2
airportsdata  20250224
annotated-types  0.7.0
anyio  4.9.0
astor  0.8.1
attrs  25.3.0
blake3  1.0.4
cachetools  5.5.2
certifi  2025.1.31
charset-normalizer  3.4.1
click  8.1.8
cloudpickle  3.1.1
compressed-tensors  0.9.2
cupy-cuda12x  13.4.1
depyf  0.18.0
dill  0.3.9
diskcache  5.6.3
distro  1.9.0
dnspython  2.7.0
einops  0.8.1
email_validator  2.2.0
fastapi  0.115.12
fastapi-cli  0.0.7
fastrlock  0.8.3
filelock  3.18.0
fire  0.7.0
frozenlist  1.5.0
fsspec  2025.3.0
gguf  0.10.0
h11  0.14.0
httpcore  1.0.7
httptools  0.6.4
httpx  0.28.1
huggingface-hub  0.29.3
idna  3.10
importlib_metadata  8.6.1
interegular  0.3.3
Jinja2  3.1.6
jiter  0.9.0
jsonschema  4.23.0
jsonschema-specifications  2024.10.1
lark  1.2.2
llguidance  0.7.11
llvmlite  0.43.0
lm-format-enforcer  0.10.11
lmdeploy  0.7.2.post1
markdown-it-py  3.0.0
MarkupSafe  3.0.2
mdurl  0.1.2
mistral_common  1.5.4
mmengine-lite  0.10.7
mpmath  1.3.0
msgpack  1.1.0
msgspec  0.19.0
multidict  6.2.0
nest-asyncio  1.6.0
networkx  3.4.2
ninja  1.11.1.4
numba  0.60.0
numpy  1.26.4
nvidia-cublas-cu12  12.4.5.8
nvidia-cuda-cupti-cu12  12.4.127
nvidia-cuda-nvrtc-cu12  12.4.127
nvidia-cuda-runtime-cu12  12.4.127
nvidia-cudnn-cu12  9.1.0.70
nvidia-cufft-cu12  11.2.1.3
nvidia-curand-cu12  10.3.5.147
nvidia-cusolver-cu12  11.6.1.9
nvidia-cusparse-cu12  12.3.1.170
nvidia-cusparselt-cu12  0.6.2
nvidia-ml-py  12.570.86
nvidia-nccl-cu12  2.21.5
nvidia-nvjitlink-cu12  12.4.127
nvidia-nvtx-cu12  12.4.127
nvitop  1.4.2
openai  1.69.0
opencv-python-headless  4.11.0.86
outlines  0.1.11
outlines_core  0.1.26
packaging  24.2
partial-json-parser  0.2.1.1.post5
peft  0.14.0
pillow  11.1.0
pip  25.0
platformdirs  4.3.7
prometheus_client  0.21.1
prometheus-fastapi-instrumentator  7.1.0
propcache  0.3.1
protobuf  6.30.2
psutil  7.0.0
py-cpuinfo  9.0.0
pycountry  24.6.1
pydantic  2.11.1
pydantic_core  2.33.0
Pygments  2.19.1
pynvml  12.0.0
python-dotenv  1.1.0
python-json-logger  3.3.0
python-multipart  0.0.20
PyYAML  6.0.2
pyzmq  26.3.0
ray  2.44.1
referencing  0.36.2
regex  2024.11.6
requests  2.32.3
rich  13.9.4
rich-toolkit  0.14.0
rpds-py  0.24.0
safetensors  0.5.3
scipy  1.15.2
sentencepiece  0.2.0
setuptools  75.8.0
shellingham  1.5.4
shortuuid  1.0.13
six  1.17.0
sniffio  1.3.1
starlette  0.46.1
sympy  1.13.1
termcolor  2.5.0
tiktoken  0.9.0
tokenizers  0.21.1
torch  2.6.0
torchaudio  2.6.0
torchvision  0.21.0
tqdm  4.67.1
transformers  4.50.3
triton  3.2.0
typer  0.15.2
typing_extensions  4.13.0
typing-inspection  0.4.0
urllib3  2.3.0
uvicorn  0.34.0
uvloop  0.21.0
vllm  0.8.2
watchfiles  1.0.4
websockets  15.0.1
wheel  0.45.1
xformers  0.0.29.post2
xgrammar  0.1.16
yapf  0.43.0
yarl  1.18.3
zipp  3.21.0
```
🐛 Describe the bug
Screenshot of the problem:

My deployment command:
```sh
#!/bin/sh
export CUDA_VISIBLE_DEVICES=2,5

# This script automatically updates the model in the test environment

# params
model_path="/data22/ljc/proj/ckpt/Qwen/Qwen2.5-VL-32B-Instruct"
model_name="Qwen2___5-VL-32B-Instruct-AWQ"
port=2032
log_path="/home/li_mingze/模型部署/log/qwen2.5_vl_32B_instruct_vllm.log"

# Free the port if an old instance is still running
kill -9 $(lsof -t -i:$port)

# Start the service in the background so $! below is the server PID
nohup vllm serve $model_path \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.8 \
    --port $port \
    --enable-prefix-caching \
    --limit-mm-per-prompt image=10,video=1 \
    > $log_path 2>&1 &

echo "Service started, PID: $!"
```
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
I noticed that you aren't using GPUs with consecutive indices. From my experience, this may cause NCCL hang with tensor parallelism.
> I noticed that you aren't using GPUs with consecutive indices. From my experience, this may cause NCCL hang with tensor parallelism.
I adjusted the order of the GPU indices; unfortunately, it doesn't work.
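Before digging further into vLLM itself, it may help to confirm that the two GPUs can actually talk to each other. Below is a minimal sketch (not from this thread) that inspects the GPU topology and runs a two-rank NCCL all-reduce via `torchrun`; the temporary script path is just an example.

```bash
# How are GPUs 2 and 5 connected? (NVLink / PIX / SYS)
nvidia-smi topo -m

# Write a tiny all-reduce test (bash + inline Python via heredoc).
cat > /tmp/nccl_check.py <<'EOF'
import torch
import torch.distributed as dist

# torchrun sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT for us
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

x = torch.ones(1, device=f"cuda:{local_rank}")
dist.all_reduce(x)  # with 2 ranks the result should be 2.0
print(f"rank {dist.get_rank()}: all_reduce -> {x.item()}")

dist.destroy_process_group()
EOF

# Run on the same two GPUs used for serving; if this also hangs, the problem is
# in NCCL / the GPU interconnect rather than in vLLM itself.
CUDA_VISIBLE_DEVICES=2,5 torchrun --nproc_per_node=2 /tmp/nccl_check.py
```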
Can you try `--enforce-eager`?

Also, updating to the latest code (not the latest release) includes some optimizations that should reduce the loading time.
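For reference, a sketch of the launch command from the script above with `--enforce-eager` added and everything else unchanged; eager mode skips CUDA-graph capture, which is one place startup can appear to hang.

```bash
nohup vllm serve $model_path \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.8 \
    --port $port \
    --enable-prefix-caching \
    --enforce-eager \
    --limit-mm-per-prompt image=10,video=1 \
    > $log_path 2>&1 &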
Set the log level with `VLLM_LOGGING_LEVEL=debug` and observe the loading process.
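For example, a minimal sketch run in the foreground (without `nohup`) so the debug output is visible directly; the variables are the ones defined in the script above:

```bash
export VLLM_LOGGING_LEVEL=DEBUG
CUDA_VISIBLE_DEVICES=2,5 vllm serve $model_path \
    --tensor-parallel-size 2 \
    --port $port
```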
Hi, I am currently following the vLLM implementation from R1-V; however, my overall process hangs when it tries to instantiate the LLM class. I have set the environment variables as below:
```bash
# distributed / SLURM settings
export MASTER_ADDR=$(hostname -s) # master node = first node
export MASTER_PORT=29500
export NODE_RANK=$SLURM_NODEID

pip list | grep nccl

# NCCL debugging / workaround settings
export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1
export NCCL_DEBUG=TRACE
export NCCL_DEBUG_SUBSYS=ALL
export NCCL_SHM_DISABLE=1
export CUDA_LAUNCH_BLOCKING=1
export TORCH_NCCL_TRACE_BUFFER_SIZE=10485760
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
export NCCL_IB_GID_INDEX=3
export NCCL_CUMEM_ENABLE=0
# python -c "import torch; torch.distributed.init_process_group(backend='nccl'); print('NCCL test passed')"

# vLLM debugging settings
export VLLM_LOGGING_LEVEL=DEBUG
export VLLM_DEBU=1
export VLLM_TRACE_FUNCTION=1
export VLLM_HOST_IP=0.0.0.0
export TORCHINDUCTOR_COMPILE_THREADS=1
export CUDA_MODULE_LOADING=LAZY
export VLLM_USE_V1=0
export VLLM_WORKER_MULTIPROC_METHOD=spawn
```
but nothing about launching the llm_engine was ever logged, and then rank 0 hit its timeout and the whole process died.
Can you show the command used to run vLLM?
cc @youkaichao
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!