
[vllm]zmq.error.ZMQError: Operation not supported

Open · ZhonghaoLu opened this issue 11 months ago · 1 comment

Is there an existing issue / discussion for this?

  • [x] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • [x] I have searched the FAQ

Current Behavior

```
future: <Task finished name='Task-2' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /pfs/mt-hiEd6E/home/lzh/Code/MiniCPM-o/vllm/vllm/engine/multiprocessing/client.py:180> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
  File "/pfs/mt-hiEd6E/home/lzh/Code/MiniCPM-o/vllm/vllm/engine/multiprocessing/client.py", line 186, in run_output_handler_loop
    while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
  File "/home/lzh/anaconda3/envs/MiniCPM-o/lib/python3.10/site-packages/zmq/_future.py", line 400, in poll
    raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported

Traceback (most recent call last):
  File "/home/lzh/anaconda3/envs/MiniCPM-o/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/pfs/mt-hiEd6E/home/lzh/Code/MiniCPM-o/vllm/vllm/scripts.py", line 201, in main
    args.dispatch_function(args)
  File "/pfs/mt-hiEd6E/home/lzh/Code/MiniCPM-o/vllm/vllm/scripts.py", line 42, in serve
    uvloop.run(run_server(args))
  File "/home/lzh/anaconda3/envs/MiniCPM-o/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/lzh/anaconda3/envs/MiniCPM-o/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/pfs/mt-hiEd6E/home/lzh/Code/MiniCPM-o/vllm/vllm/entrypoints/openai/api_server.py", line 796, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/home/lzh/anaconda3/envs/MiniCPM-o/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/pfs/mt-hiEd6E/home/lzh/Code/MiniCPM-o/vllm/vllm/entrypoints/openai/api_server.py", line 125, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/home/lzh/anaconda3/envs/MiniCPM-o/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/pfs/mt-hiEd6E/home/lzh/Code/MiniCPM-o/vllm/vllm/entrypoints/openai/api_server.py", line 219, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

/home/lzh/anaconda3/envs/MiniCPM-o/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
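For context on where this comes from: pyzmq's asyncio `Socket.poll()` raises `ZMQError(ENOTSUP)` ("Operation not supported") when the socket has already been closed, so the first traceback most likely means the client's output socket was torn down while the output-handler loop was still polling it; the second traceback (`RuntimeError: Engine process failed to start`) points at the engine process dying as the underlying cause. A minimal, standalone sketch of the ZMQ mechanism (illustrative only, not vLLM code):

```python
import asyncio

import zmq
import zmq.asyncio


async def main() -> None:
    ctx = zmq.asyncio.Context()
    sock = ctx.socket(zmq.PULL)
    sock.close()  # simulate the output socket being torn down mid-loop

    try:
        # pyzmq's async Socket.poll() refuses to poll a closed socket and
        # raises ZMQError(ENOTSUP) -- the "Operation not supported" above.
        await sock.poll(timeout=1000)
    except zmq.ZMQError as exc:
        print(exc)  # -> Operation not supported
    finally:
        ctx.term()


asyncio.run(main())
```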

Expected Behavior

No response

Steps To Reproduce

I created a fresh environment and deployed the service with vLLM following the method above; it reports the bug shown above.

Here is my deployment command:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3

vllm serve /home/lzh/llm_model/OpenBMB/MiniCPM-o-2_6 \
    --dtype auto \
    --max-model-len 2048 \
    --tensor-parallel-size 4 \
    --gpu_memory_utilization 0.7 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8070
```
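Since the failing code path is vLLM's multiprocessing frontend (`MQLLMEngineClient`), one workaround worth trying (a sketch, not a confirmed fix, and assuming your vLLM build exposes the `--disable-frontend-multiprocessing` flag) is to run the engine in the API server process so the ZMQ client loop is never started:

```bash
# Same deployment, but bypass the ZMQ-based frontend. With
# --disable-frontend-multiprocessing the engine runs in-process, so
# MQLLMEngineClient.run_output_handler_loop (where the ZMQError is
# raised) is never entered.
export CUDA_VISIBLE_DEVICES=0,1,2,3

vllm serve /home/lzh/llm_model/OpenBMB/MiniCPM-o-2_6 \
    --dtype auto \
    --max-model-len 2048 \
    --tensor-parallel-size 4 \
    --gpu_memory_utilization 0.7 \
    --trust-remote-code \
    --disable-frontend-multiprocessing \
    --host 0.0.0.0 \
    --port 8070
```

Even if the engine still fails to start, this usually surfaces the engine's real startup exception directly instead of the secondary ZMQ error, which makes the root cause easier to see.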

Environment

- OS: Linux
- Python: 3.10
- Transformers: 4.48.0
- PyTorch: 2.5.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.2

Anything else?

No response

ZhonghaoLu · Jan 20 '25

There is a similar issue upstream, though it seems they haven't solved it either. You can take a look at that upstream issue first: https://github.com/vllm-project/vllm/issues/11564

HwwwwwwwH · Jan 20 '25