[Bug]: IndexError when sending a streaming request with tool use
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
When I run the server with Mistral and auto tool choice enabled
vllm serve mistralai/Mistral-7B-Instruct-v0.3 --enable-auto-tool-choice --tool-call-parser mistral --chat-template tool_chat_template_mistral.jinja
and then send the following streaming request to `/v1/chat/completions`:
{
  "model": "mistralai/Mistral-7B-Instruct-v0.3",
  "tool_choice": {"type": "function", "function": {"name": "get_current_weather"}},
  "stream": true,
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in California?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "description": "The city, e.g. San Francisco, CA",
              "type": "string"
            },
            "unit": {
              "enum": ["celsius", "fahrenheit"],
              "type": "string"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
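For reference, an equivalent way to send this request is through the OpenAI Python client. The snippet below is a minimal reproduction sketch; the base URL and API key are assumptions for a default local `vllm serve` deployment and do not come from the original report.

```python
# Reproduction sketch: base_url and api_key are assumptions for a local
# `vllm serve` instance, not values taken from the report above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"description": "The city, e.g. San Francisco, CA", "type": "string"},
                "unit": {"enum": ["celsius", "fahrenheit"], "type": "string"},
            },
            "required": ["location"],
        },
    },
}]

# stream=True exercises the server-side streaming path
# (chat_completion_stream_generator), where the error below is raised.
stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "What is the weather like in California?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
    stream=True,
)

for chunk in stream:
    print(chunk)
```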
The response stream does not contain the `data: [DONE]` message, and the logs show an exception being raised:
INFO 10-03 17:08:20 logger.py:36] Received request chat-0f80f7a4a8054f9582b94130483e7cd4: prompt: '<s>[AVAILABLE_TOOLS] [{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": {"type": "object", "properties": {"location": {"description": "The city, e.g. San Francisco, CA", "type": "string"}, "unit": {"enum": ["celsius", "fahrenheit"], "type": "string"}}, "required": ["location"]}}}][/AVAILABLE_TOOLS][INST] What is the weather like in Denver?[/INST]', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=32653, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), guided_decoding=GuidedDecodingParams(json={'type': 'object', 'properties': {'location': {'description': 'The city, e.g. San Francisco, CA', 'type': 'string'}, 'unit': {'enum': ['celsius', 'fahrenheit'], 'type': 'string'}}, 'required': ['location']}, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None), prompt_token_ids: [1, 6, 1501, 7567, 1891, 2032, 1113, 3396, 1316, 1113, 3396, 2032, 10598, 1629, 2032, 1113, 1295, 29498, 3790, 29498, 1537, 1991, 1316, 1113, 7286, 2032, 1113, 2226, 1040, 2636, 8854, 1065, 1032, 2846, 5491, 1316, 1113, 12206, 2032, 10598, 1891, 2032, 1113, 3582, 1316, 1113, 11491, 2032, 10598, 3501, 2032, 10598, 7286, 2032, 1113, 1782, 3758, 29493, 1085, 29491, 29489, 29491, 4420, 10454, 29493, 10229, 1316, 1113, 1891, 2032, 1113, 2195, 8474, 1113, 6074, 2032, 10598, 10825, 2032, 8135, 29485, 1958, 3938, 1316, 1113, 29490, 19425, 13075, 9651, 1113, 1891, 2032, 1113, 2195, 29507, 11549, 1113, 11661, 2032, 8135, 3501, 3010, 1743, 10925, 7, 3, 2592, 1117, 1040, 8854, 1505, 1065, 24828, 29572, 4], lora_request: None, prompt_adapter_request: None.
INFO: ::1:57734 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 10-03 17:08:20 mistral_tool_parser.py:53] Non-Mistral tokenizer detected when using a Mistral model...
INFO 10-03 17:08:24 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
Compiling FSM index for all state transitions: 100%|███████████████████████████████████████████| 55/55 [00:00<00:00, 91.64it/s]
INFO 10-03 17:08:26 engine.py:288] Added request chat-0f80f7a4a8054f9582b94130483e7cd4.
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 257, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 253, in wrap
await func()
File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
message = await receive()
^^^^^^^^^^^^^^^
File "/workspace/my-vllm/lib64/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
await self.message_event.wait()
File "/usr/lib64/python3.12/asyncio/locks.py", line 212, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f3492a6fe60
...
| File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/routing.py", line 74, in app
| await response(scope, receive, send)
| File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 250, in __call__
| async with anyio.create_task_group() as task_group:
| File "/workspace/my-vllm/lib64/python3.12/site-packages/anyio/_backends/_asyncio.py", line 736, in __aexit__
| raise BaseExceptionGroup(
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 253, in wrap
| await func()
| File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 242, in stream_response
| async for chunk in self.body_iterator:
| File "/workspace/my-vllm/lib64/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 520, in chat_completion_stream_generator
| tool_parser.prev_tool_call_arr[index].get(
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
| IndexError: list index out of range
+------------------------------------
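The failing line indexes `tool_parser.prev_tool_call_arr` without checking its length first. The snippet below is only an illustrative sketch of that failure pattern and of the kind of bounds check that avoids it; it is not the actual `vllm/entrypoints/openai/serving_chat.py` code, and the names are taken from the traceback.

```python
# Illustrative sketch only, not vLLM source. `prev_tool_call_arr` and `index`
# are named after the traceback above.
prev_tool_call_arr: list[dict] = []  # empty on the chunk that crashes
index = 0

# Unguarded access, as in the traceback -> IndexError: list index out of range
# args = prev_tool_call_arr[index].get("arguments", {})

# A length guard of this kind avoids the exception:
if index < len(prev_tool_call_arr):
    args = prev_tool_call_arr[index].get("arguments", {})
else:
    args = {}
```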
Before submitting a new issue...
- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.