FastChat
vLLM worker does not release the semaphore
I deployed a Vicuna model with vllm_worker and set --limit-worker-concurrency to 3. After running for a while, the worker stopped serving requests. From the log, the semaphore is never released when a request fails: each failure leaks one permit, and after three failures the semaphore value reached 0, so no further requests could be accepted.
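The failure mode can be sketched as follows. This is a minimal, self-contained reproduction of the pattern I suspect, not FastChat's actual code: the names `generate_stream` and `handle_request` are hypothetical stand-ins. The permit is acquired before streaming, but the release only happens on the success path, so an exception raised inside the generator (here, the same `ValueError` vLLM raises for a negative `max_tokens`) leaks one permit per failed request.

```python
import asyncio

# Hypothetical sketch of the suspected leak: a worker-level semaphore is
# acquired per request, but released only after the stream completes.
limit_worker_concurrency = 3
semaphore = asyncio.Semaphore(limit_worker_concurrency)


async def generate_stream(params):
    # Stand-in for building vLLM's SamplingParams: a context longer than
    # the model length yields a negative token budget and raises.
    max_tokens = params["max_new_tokens"]
    if max_tokens < 1:
        raise ValueError(f"max_tokens must be at least 1, got {max_tokens}.")
    yield b"token"


async def handle_request(params):
    await semaphore.acquire()
    try:
        async for chunk in generate_stream(params):
            pass  # stream chunks back to the client
    except ValueError:
        return  # error path: the release below is never reached
    semaphore.release()  # only reached on success -> permit leaks on error


async def main():
    # Three failing requests, mirroring the log's -862/-763 errors.
    for _ in range(3):
        await handle_request({"max_new_tokens": -862})
    return semaphore._value  # peeking at the private counter for the demo


leaked_value = asyncio.run(main())
print(leaked_value)  # 0: all three permits leaked, the worker is stuck
```

This matches the heartbeat lines in the log, where the semaphore value drops from 2 to 1 to 0 after each `ValueError` and never recovers.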
Here is the log:
2024-05-10 05:56:32 | ERROR | stderr | ERROR: Exception in ASGI application
2024-05-10 05:56:32 | ERROR | stderr | Traceback (most recent call last):
2024-05-10 05:56:32 | ERROR | stderr |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 265, in __call__
2024-05-10 05:56:32 | ERROR | stderr |     await wrap(partial(self.listen_for_disconnect, receive))
2024-05-10 05:56:32 | ERROR | stderr |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:56:32 | ERROR | stderr |     await func()
2024-05-10 05:56:32 | ERROR | stderr |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
2024-05-10 05:56:32 | ERROR | stderr |     message = await receive()
2024-05-10 05:56:32 | ERROR | stderr |               ^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
2024-05-10 05:56:32 | ERROR | stderr |     await self.message_event.wait()
2024-05-10 05:56:32 | ERROR | stderr |   File "/home/dingjb/miniconda3/lib/python3.11/asyncio/locks.py", line 213, in wait
2024-05-10 05:56:32 | ERROR | stderr |     await fut
2024-05-10 05:56:32 | ERROR | stderr | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fe55b141750
2024-05-10 05:56:32 | ERROR | stderr |
2024-05-10 05:56:32 | ERROR | stderr | During handling of the above exception, another exception occurred:
2024-05-10 05:56:32 | ERROR | stderr |
2024-05-10 05:56:32 | ERROR | stderr | + Exception Group Traceback (most recent call last):
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
2024-05-10 05:56:32 | ERROR | stderr | |     result = await app(  # type: ignore[func-returns-value]
2024-05-10 05:56:32 | ERROR | stderr | |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
2024-05-10 05:56:32 | ERROR | stderr | |     return await self.app(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
2024-05-10 05:56:32 | ERROR | stderr | |     await super().__call__(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
2024-05-10 05:56:32 | ERROR | stderr | |     await self.middleware_stack(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
2024-05-10 05:56:32 | ERROR | stderr | |     raise exc
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
2024-05-10 05:56:32 | ERROR | stderr | |     await self.app(scope, receive, _send)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
2024-05-10 05:56:32 | ERROR | stderr | |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | |     raise exc
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | |     await app(scope, receive, sender)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
2024-05-10 05:56:32 | ERROR | stderr | |     await self.middleware_stack(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
2024-05-10 05:56:32 | ERROR | stderr | |     await route.handle(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
2024-05-10 05:56:32 | ERROR | stderr | |     await self.app(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
2024-05-10 05:56:32 | ERROR | stderr | |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | |     raise exc
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | |     await app(scope, receive, sender)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
2024-05-10 05:56:32 | ERROR | stderr | |     await response(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 258, in __call__
2024-05-10 05:56:32 | ERROR | stderr | |     async with anyio.create_task_group() as task_group:
2024-05-10 05:56:32 | ERROR | stderr | |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
2024-05-10 05:56:32 | ERROR | stderr | |     raise BaseExceptionGroup(
2024-05-10 05:56:32 | ERROR | stderr | | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
2024-05-10 05:56:32 | ERROR | stderr | +-+---------------- 1 ----------------
2024-05-10 05:56:32 | ERROR | stderr |   | Traceback (most recent call last):
2024-05-10 05:56:32 | ERROR | stderr |   |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:56:32 | ERROR | stderr |   |     await func()
2024-05-10 05:56:32 | ERROR | stderr |   |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
2024-05-10 05:56:32 | ERROR | stderr |   |     async for chunk in self.body_iterator:
2024-05-10 05:56:32 | ERROR | stderr |   |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastchat/serve/vllm_worker.py", line 99, in generate_stream
2024-05-10 05:56:32 | ERROR | stderr |   |     sampling_params = SamplingParams(
2024-05-10 05:56:32 | ERROR | stderr |   |                       ^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr |   |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 118, in __init__
2024-05-10 05:56:32 | ERROR | stderr |   |     self._verify_args()
2024-05-10 05:56:32 | ERROR | stderr |   |   File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 148, in _verify_args
2024-05-10 05:56:32 | ERROR | stderr |   |     raise ValueError(
2024-05-10 05:56:32 | ERROR | stderr |   | ValueError: max_tokens must be at least 1, got -862.
2024-05-10 05:56:32 | ERROR | stderr |   +------------------------------------
2024-05-10 05:56:43 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=2, locked=False). call_ct: 1220. worker_id: 42c39e7a.
2024-05-10 05:56:47 | INFO | stdout | INFO: 127.0.0.1:46612 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:56:47 | INFO | stdout | INFO: 127.0.0.1:46614 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 05:56:47 async_llm_engine.py:371] Received request 18dbe2d2c72c4cfe9fea1922bd4e8b84: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=2048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:56:47 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:56:47 async_llm_engine.py:111] Finished request 18dbe2d2c72c4cfe9fea1922bd4e8b84.
INFO 05-10 05:56:47 async_llm_engine.py:134] Aborted request 18dbe2d2c72c4cfe9fea1922bd4e8b84.
2024-05-10 05:56:47 | INFO | stdout | INFO: 127.0.0.1:46616 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 05:56:48 | INFO | stdout | INFO: 127.0.0.1:46708 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:56:48 | INFO | stdout | INFO: 127.0.0.1:46710 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 05:56:48 async_llm_engine.py:371] Received request 594e5b0d350c4a5b8401814198fc447e: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=2048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:56:49 async_llm_engine.py:111] Finished request 594e5b0d350c4a5b8401814198fc447e.
INFO 05-10 05:56:49 async_llm_engine.py:134] Aborted request 594e5b0d350c4a5b8401814198fc447e.
2024-05-10 05:56:49 | INFO | stdout | INFO: 127.0.0.1:46712 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 05:56:54 | INFO | stdout | INFO: 127.0.0.1:46930 - "POST /worker_generate_stream HTTP/1.1" 200 OK
INFO 05-10 05:56:54 async_llm_engine.py:371] Received request 9b3295689edf474ba87e1ff73acf28a4: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: 你好 ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.7, top_p=1.0, top_k=-1.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=512, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:56:55 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:56:56 async_llm_engine.py:111] Finished request 9b3295689edf474ba87e1ff73acf28a4.
INFO 05-10 05:56:56 async_llm_engine.py:134] Aborted request 9b3295689edf474ba87e1ff73acf28a4.
2024-05-10 05:57:26 | INFO | stdout | INFO: 127.0.0.1:47946 - "POST /worker_generate_stream HTTP/1.1" 200 OK
INFO 05-10 05:57:26 async_llm_engine.py:371] Received request f274e61e2013461f9ff211e905272eb7: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: 你好 ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.7, top_p=1.0, top_k=-1.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=512, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:57:26 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:57:27 async_llm_engine.py:111] Finished request f274e61e2013461f9ff211e905272eb7.
INFO 05-10 05:57:27 async_llm_engine.py:134] Aborted request f274e61e2013461f9ff211e905272eb7.
2024-05-10 05:57:28 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=2, locked=False). call_ct: 1224. worker_id: 42c39e7a.
2024-05-10 05:58:10 | INFO | stdout | INFO: 127.0.0.1:49372 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:58:10 | INFO | stdout | INFO: 127.0.0.1:49374 - "POST /count_token HTTP/1.1" 200 OK
2024-05-10 05:58:10 | INFO | stdout | INFO: 127.0.0.1:49378 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-05-10 05:58:10 | ERROR | stderr | ERROR: Exception in ASGI application
2024-05-10 05:58:10 | ERROR | stderr | [traceback identical to the 05:56:32 one above; CancelledError: Cancelled by cancel scope 7fe560585710]
2024-05-10 05:58:10 | ERROR | stderr |   | ValueError: max_tokens must be at least 1, got -763.
2024-05-10 05:58:10 | ERROR | stderr |   +------------------------------------
2024-05-10 05:58:14 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1225. worker_id: 42c39e7a.
2024-05-10 05:58:59 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1225. worker_id: 42c39e7a.
2024-05-10 05:59:19 | INFO | stdout | INFO: 127.0.0.1:51574 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:59:19 | INFO | stdout | INFO: 127.0.0.1:51576 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 05:59:19 async_llm_engine.py:371] Received request 0ea08a7a94b44c0783dd435e387725d3: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=1e-08, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=4048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:59:19 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:59:20 async_llm_engine.py:111] Finished request 0ea08a7a94b44c0783dd435e387725d3.
INFO 05-10 05:59:20 async_llm_engine.py:134] Aborted request 0ea08a7a94b44c0783dd435e387725d3.
2024-05-10 05:59:20 | INFO | stdout | INFO: 127.0.0.1:51578 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 05:59:44 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1226. worker_id: 42c39e7a.
2024-05-10 06:00:29 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1226. worker_id: 42c39e7a.
2024-05-10 06:01:14 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1226. worker_id: 42c39e7a.
2024-05-10 06:01:30 | INFO | stdout | INFO: 127.0.0.1:55740 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 06:01:30 | INFO | stdout | INFO: 127.0.0.1:55742 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 06:01:30 async_llm_engine.py:371] Received request a22b98b85b564eaaad458ba20d1addaf: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=2048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 06:01:30 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 06:01:30 async_llm_engine.py:111] Finished request a22b98b85b564eaaad458ba20d1addaf.
INFO 05-10 06:01:30 async_llm_engine.py:134] Aborted request a22b98b85b564eaaad458ba20d1addaf.
2024-05-10 06:01:30 | INFO | stdout | INFO: 127.0.0.1:55744 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 06:01:46 | INFO | stdout | INFO: 127.0.0.1:56292 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 06:01:46 | INFO | stdout | INFO: 127.0.0.1:56294 - "POST /count_token HTTP/1.1" 200 OK
2024-05-10 06:01:46 | INFO | stdout | INFO: 127.0.0.1:56298 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-05-10 06:01:46 | ERROR | stderr | ERROR: Exception in ASGI application
2024-05-10 06:01:46 | ERROR | stderr | [traceback identical to the 05:56:32 one above; CancelledError: Cancelled by cancel scope 7fe560586f10]
2024-05-10 06:01:46 | ERROR | stderr |   | ValueError: max_tokens must be at least 1, got -763.
2024-05-10 06:01:46 | ERROR | stderr |   +------------------------------------
2024-05-10 06:01:59 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:02:44 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:03:29 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:04:14 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:04:59 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
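Two mitigations seem worth considering; the sketch below is a hedged illustration with hypothetical names (`clamp_max_tokens`, `handle_request`), not a patch against fastchat/serve/vllm_worker.py. First, clamp the token budget so a prompt longer than the context window can never produce a negative `max_tokens` for vLLM's `SamplingParams`. Second, release the concurrency semaphore in a `finally` block so an exception raised before or during streaming cannot leak a permit.

```python
import asyncio


def clamp_max_tokens(requested: int, context_len: int, prompt_tokens: int) -> int:
    # Never hand vLLM a max_tokens below 1, even when the prompt already
    # fills (or overflows) the model's context window.
    return max(1, min(requested, context_len - prompt_tokens))


semaphore = asyncio.Semaphore(3)  # mirrors --limit-worker-concurrency 3


async def generate(params):
    # Stand-in for the vLLM call that raises on a bad token budget.
    if params["max_new_tokens"] < 1:
        raise ValueError("max_tokens must be at least 1")
    return b"ok"


async def handle_request(params):
    await semaphore.acquire()
    try:
        return await generate(params)
    except ValueError:
        return b"error"
    finally:
        semaphore.release()  # runs on success *and* on error


async def main():
    # Five failing requests in a row must not consume any permits.
    for _ in range(5):
        await handle_request({"max_new_tokens": -763})
    return semaphore._value  # peeking at the private counter for the demo


remaining = asyncio.run(main())
print(remaining)                            # 3: no permits leaked
print(clamp_max_tokens(2048, 4096, 4958))   # 1: clamped, never negative
```

With the `finally`-based release, a bad request still fails, but the worker keeps accepting new requests instead of deadlocking once the semaphore hits 0.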