[Bug]: "500 Internal Server Error" after upgrade to v0.5.4
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
After I upgraded to v0.5.4, I get a "500 Internal Server Error". The manifest snippet I use to start vLLM:
containers:
- name: 8x7b-open
  image: vllm/vllm-openai:v0.5.4
  command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
  args: ["--model", "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4", "--host", "0.0.0.0", "--port", "8080", "--tensor-parallel-size", "2", "--seed", "42", "--trust-remote-code"]
  securityContext:
    privileged: true
  ports:
  - containerPort: 8080
  env:
  - name: OMP_NUM_THREADS
    value: "2"
  volumeMounts:
  - mountPath: "/root/.cache"
    name: ceph-volume
  resources:
    limits:
      cpu: '12'
      memory: 200Gi
      nvidia.com/gpu: '2'
    requests:
      cpu: '12'
      memory: 200Gi
      nvidia.com/gpu: '2'
Backtrace log:
INFO: 10.254.17.246:59936 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 189, in create_chat_completion
generator = await openai_serving_chat.create_chat_completion(
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 185, in create_chat_completion
return await self.chat_completion_full_generator(
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 436, in chat_completion_full_generator
async for res in result_generator:
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 196, in generate
with self.socket() as socket:
File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 59, in socket
socket = self.context.socket(zmq.constants.DEALER)
File "/usr/local/lib/python3.10/dist-packages/zmq/sugar/context.py", line 354, in socket
socket_class( # set PYTHONTRACEMALLOC=2 to get the calling frame
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 218, in __init__
super().__init__(context, socket_type, **kwargs) # type: ignore
File "/usr/local/lib/python3.10/dist-packages/zmq/sugar/socket.py", line 156, in __init__
super().__init__(
File "_zmq.py", line 690, in zmq.backend.cython._zmq.Socket.__init__
zmq.error.ZMQError: Too many open files
Also ulimit and lsof info:
root@8x7b-open-deployment-9fb777c9d-mwq8b:/vllm-workspace# lsof | grep pt_main_t | wc -l
26295
root@8x7b-open-deployment-9fb777c9d-mwq8b:/vllm-workspace# ulimit -n
1048576
root@8x7b-open-deployment-9fb777c9d-mwq8b:/vllm-workspace#
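Note that `ulimit -n` in a shell can differ from the limit the server process itself sees. A minimal stdlib sketch (Linux-only, since it reads `/proc/self/fd`) for checking a process's own descriptor usage against its soft limit:

```python
import os
import resource

# Soft/hard limits on open file descriptors as seen by *this* process;
# these can differ from what `ulimit -n` reports in a separate shell.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# On Linux, each entry in /proc/self/fd is one descriptor this process holds.
open_fds = len(os.listdir("/proc/self/fd"))

print(f"open fds: {open_fds}, soft limit: {soft}, hard limit: {hard}")
```

Running the equivalent check from inside the serving process (rather than from a shell in the pod) shows the numbers that actually gate the `Too many open files` error.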
cc @robertgshaw2-neuralmagic
@tonyaw if you want a quick workaround, you can try adding `--disable-frontend-multiprocessing`
What's the side effect of adding the `--disable-frontend-multiprocessing` parameter? The error isn't caused by OMP_NUM_THREADS=2, right? I have two A100s, so OMP_NUM_THREADS should be 2, right?
Thanks in advance!
--disable-frontend-multiprocessing will be slower
usually people don't need to set OMP_NUM_THREADS for vLLM
Thanks. I will analyze how many Unix sockets get opened and see whether there is anything we can do to reduce the number, since we currently open a new socket for each generate request.
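The socket-per-request behaviour can be illustrated with plain stdlib socket pairs (a hypothetical analogue of the ZMQ DEALER sockets, not vLLM's actual code): if each request creates a socket that is never closed, the process's descriptor count grows until creation fails with the same "Too many open files" errno.

```python
import os
import socket

def open_fd_count() -> int:
    # Linux-only: each entry in /proc/self/fd is one open descriptor.
    return len(os.listdir("/proc/self/fd"))

before = open_fd_count()

# Simulate "one socket per generate request" without closing any of them.
held = [socket.socketpair() for _ in range(50)]  # 2 descriptors per pair
print(f"leaked {open_fd_count() - before} descriptors")  # roughly 100

# Closing each socket (what a `with` block guarantees) releases them again.
for a, b in held:
    a.close()
    b.close()
print(f"after cleanup: {open_fd_count() - before} extra descriptors")
```

With a high request rate, even short-lived sockets can exhaust the limit if they pile up faster than they are closed, which is consistent with the `lsof` count of ~26k descriptors above.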
> --disable-frontend-multiprocessing will be slower
> usually people don't need to set OMP_NUM_THREADS for vLLM
@youkaichao @robertgshaw2-neuralmagic I have set this param --disable-frontend-multiprocessing, but still get the error as follows:
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'detail': ''}
My vLLM version is the latest, 0.5.5, and the command is:
python -m vllm.entrypoints.openai.api_server \
--model /data/pretrain_dir/Meta-Llama-3-8B-Instruct \
--trust-remote-code \
--port $port \
--dtype auto \
--pipeline-parallel-size 1 \
--enforce-eager \
--enable-prefix-caching \
--enable-lora \
--disable-frontend-multiprocessing
The interesting thing is that even when I send only one prompt at a time (to ensure the LLM isn't overloaded), generation still sometimes succeeds and sometimes fails within the same test period. The error on failure is still "Error code: 500 - {'detail': ''}".
@TangJiakai this looks like a client side error. do you have the server side error trace?
> @TangJiakai this looks like a client side error. do you have the server side error trace?
Yes, you are right! It happened on the client side.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
I am facing the same issue, but only with Llama.
Server side:
python -m vllm.entrypoints.openai.api_server \
--model /onyx/data/p118/huggingface_LLMs/meta-llama/Llama-3.1-8B-Instruct/ \
--host 0.0.0.0 \
--port 3000 \
--gpu-memory-utilization 0.7 \
--tensor-parallel-size 1 \
--pipeline-parallel-size 2 \
--device cuda \
--enforce-eager \
--dtype=half
Call:
curl http://172.30.1.111:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "/onyx/data/p118/huggingface_LLMs/meta-llama/Llama-3.1-8B-Instruct/",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"}
]
}'
Result:
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/worker/model_runner.py", line 1721, in execute_model
hidden_or_intermediate_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/model_executor/models/llama.py", line 539, in forward
model_output = self.model(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 170, in __call__
return self.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/model_executor/models/llama.py", line 363, in forward
hidden_states, residual = layer(positions, hidden_states,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/model_executor/models/llama.py", line 277, in forward
hidden_states = self.self_attn(positions=positions,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/model_executor/models/llama.py", line 201, in forward
attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/attention/layer.py", line 184, in forward
return torch.ops.vllm.unified_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/_ops.py", line 1116, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/attention/layer.py", line 290, in unified_attention
return self.impl.forward(self, query, key, value, kv_cache, attn_metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/attention/backends/xformers.py", line 572, in forward
out = PagedAttention.forward_prefix(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/attention/ops/paged_attn.py", line 211, in forward_prefix
context_attention_fwd(
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/attention/ops/prefix_prefill.py", line 825, in context_attention_fwd
_fwd_kernel[grid](
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/triton/runtime/jit.py", line 345, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/triton/runtime/jit.py", line 607, in run
device = driver.active.get_current_device()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/triton/runtime/driver.py", line 23, in __getattr__
self._initialize_obj()
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
self._obj = self._init_fn()
^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/triton/runtime/driver.py", line 9, in _create_driver
return actives[0]()
^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
self.utils = CudaUtils() # TODO: make static
^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/triton/runtime/build.py", line 48, in _build
ret = subprocess.check_call(cc_cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/nvme/h/buildsets/eb_cyclone_rl/software/GCCcore/11.2.0/bin/gcc', '/tmp/tmp9hi7g5f0/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmp9hi7g5f0/cuda_utils.cpython-311-x86_64-linux-gnu.so', '-lcuda', '-L/nvme/h/lb21hg1/llm-env/lib/python3.11/site-packages/triton/backends/nvidia/lib', '-L/lib64', '-L/lib', '-I/nvme/h/lb21hg1/llm-env/lib/python3.11/site-packages/triton/backends/nvidia/include', '-I/tmp/tmp9hi7g5f0', '-I/usr/include/python3.11']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/applications.py", line 112, in __call__
await self.middleware_stack(scope, receive, send)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/routing.py", line 714, in __call__
await self.middleware_stack(scope, receive, send)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/routing.py", line 734, in app
await route.handle(scope, receive, send)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/entrypoints/utils.py", line 54, in wrapper
return handler_task.result()
^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 390, in create_chat_completion
generator = await handler.create_chat_completion(request, raw_request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 261, in create_chat_completion
return await self.chat_completion_full_generator(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 680, in chat_completion_full_generator
async for res in result_generator:
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 1004, in generate
async for output in await self.add_request(
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 114, in generator
raise result
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 56, in _log_task_completion
return_value = task.result()
^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 823, in run_engine_loop
result = task.result()
^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 746, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 351, in step_async
outputs = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 343, in execute_model_async
return await self._driver_execute_model_async(execute_model_req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 231, in _driver_execute_model_async
results = await asyncio.gather(*tasks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/utils.py", line 1329, in _run_task_with_lock
return await task(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/worker/worker_base.py", line 411, in execute_model
output = self.model_runner.execute_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/nvme/h/lb21hg1/llm-env/lib64/python3.11/site-packages/vllm/worker/model_runner_base.py", line 152, in _wrapper
raise type(err)(
^^^^^^^^^^
TypeError: CalledProcessError.__init__() missing 1 required positional argument: 'cmd'
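For the record, this final TypeError is a secondary failure that masks the real problem (the gcc invocation Triton uses to build `cuda_utils` exiting non-zero, likely a toolchain/environment issue). The traceback shows the wrapper re-raising with `type(err)(...)`, but `CalledProcessError.__init__` requires both `returncode` and `cmd`, so the re-raise itself crashes. A minimal sketch of that masking behaviour:

```python
import subprocess

try:
    subprocess.check_call(["false"])  # any command exiting non-zero
except subprocess.CalledProcessError as err:
    # The wrapper's pattern: rebuild the exception type with a single
    # message argument. CalledProcessError needs (returncode, cmd), so
    # this raises TypeError instead, hiding the original compiler error.
    try:
        raise type(err)(f"Error in model execution: {err}")
    except TypeError as te:
        print(te)  # missing 1 required positional argument: 'cmd'
```

So to debug the underlying issue, run the gcc command from the `CalledProcessError` message above by hand and inspect its stderr.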