
[Bug]: IndexError when sending a streaming request with tool use


Your current environment

The output of `python collect_env.py` was not provided.

Model Input Dumps

No response

🐛 Describe the bug

I run the server with a Mistral model and automatic tool choice enabled:

vllm serve mistralai/Mistral-7B-Instruct-v0.3 --enable-auto-tool-choice --tool-call-parser mistral --chat-template tool_chat_template_mistral.jinja

and send the following streaming request to /v1/chat/completions:

{
  "model": "mistralai/Mistral-7B-Instruct-v0.3",
  "tool_choice": {"type": "function", "function": {"name": "get_current_weather"}},
  "stream": true,
  "messages": [
    {
        "role": "user",
        "content": "What is the weather like in California?"
    }
  ],
  "tools": [
      {
          "type": "function",
          "function": {
              "name": "get_current_weather",
              "description": "Get the current weather in a given location",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "location": {
                          "description": "The city, e.g. San Francisco, CA",
                          "type": "string"
                      },
                      "unit": {
                          "enum": ["celsius", "fahrenheit"],
                          "type": "string"
                      }
                  },
                  "required": ["location"]
              }
          }
      }
  ]
}
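
For reference, a minimal client that reproduces this by streaming the raw SSE lines (so the missing `data: [DONE]` message is visible) might look like the sketch below. The base URL http://localhost:8000 is the `vllm serve` default and is an assumption here, as is the choice of the `requests` library.

import requests

# Same request body as above, with the tool definition inlined.
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "tool_choice": {"type": "function", "function": {"name": "get_current_weather"}},
    "stream": True,
    "messages": [
        {"role": "user", "content": "What is the weather like in California?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "description": "The city, e.g. San Francisco, CA",
                            "type": "string",
                        },
                        "unit": {"enum": ["celsius", "fahrenheit"], "type": "string"},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
}

# Stream the response and print every SSE line as it arrives. With this bug
# the stream ends without the final "data: [DONE]" line because the server's
# generator raises IndexError partway through.
with requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed default host/port
    json=payload,
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)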

The response stream does not contain the `data: [DONE]` message, and the logs show an exception being raised:

INFO 10-03 17:08:20 logger.py:36] Received request chat-0f80f7a4a8054f9582b94130483e7cd4: prompt: '<s>[AVAILABLE_TOOLS] [{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": {"type": "object", "properties": {"location": {"description": "The city, e.g. San Francisco, CA", "type": "string"}, "unit": {"enum": ["celsius", "fahrenheit"], "type": "string"}}, "required": ["location"]}}}][/AVAILABLE_TOOLS][INST] What is the weather like in Denver?[/INST]', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=32653, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), guided_decoding=GuidedDecodingParams(json={'type': 'object', 'properties': {'location': {'description': 'The city, e.g. San Francisco, CA', 'type': 'string'}, 'unit': {'enum': ['celsius', 'fahrenheit'], 'type': 'string'}}, 'required': ['location']}, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None), prompt_token_ids: [1, 6, 1501, 7567, 1891, 2032, 1113, 3396, 1316, 1113, 3396, 2032, 10598, 1629, 2032, 1113, 1295, 29498, 3790, 29498, 1537, 1991, 1316, 1113, 7286, 2032, 1113, 2226, 1040, 2636, 8854, 1065, 1032, 2846, 5491, 1316, 1113, 12206, 2032, 10598, 1891, 2032, 1113, 3582, 1316, 1113, 11491, 2032, 10598, 3501, 2032, 10598, 7286, 2032, 1113, 1782, 3758, 29493, 1085, 29491, 29489, 29491, 4420, 10454, 29493, 10229, 1316, 1113, 1891, 2032, 1113, 2195, 8474, 1113, 6074, 2032, 10598, 10825, 2032, 8135, 29485, 1958, 3938, 1316, 1113, 29490, 19425, 13075, 9651, 1113, 1891, 2032, 1113, 2195, 29507, 11549, 1113, 11661, 2032, 8135, 3501, 3010, 1743, 10925, 7, 3, 2592, 1117, 1040, 8854, 1505, 1065, 24828, 29572, 4], lora_request: None, prompt_adapter_request: None.
INFO:     ::1:57734 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 10-03 17:08:20 mistral_tool_parser.py:53] Non-Mistral tokenizer detected when using a Mistral model...
INFO 10-03 17:08:24 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
Compiling FSM index for all state transitions: 100%|███████████████████████████████████████████| 55/55 [00:00<00:00, 91.64it/s]
INFO 10-03 17:08:26 engine.py:288] Added request chat-0f80f7a4a8054f9582b94130483e7cd4.
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 257, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    await func()
  File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
    message = await receive()
              ^^^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
  File "/usr/lib64/python3.12/asyncio/locks.py", line 212, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f3492a6fe60
...
  |   File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/routing.py", line 74, in app
  |     await response(scope, receive, send)
  |   File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 250, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/workspace/my-vllm/lib64/python3.12/site-packages/anyio/_backends/_asyncio.py", line 736, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    |     await func()
    |   File "/workspace/my-vllm/lib64/python3.12/site-packages/starlette/responses.py", line 242, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/workspace/my-vllm/lib64/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 520, in chat_completion_stream_generator
    |     tool_parser.prev_tool_call_arr[index].get(
    |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
    | IndexError: list index out of range
    +------------------------------------
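
The stack trace points at an unguarded list index in the streaming tool-call path: `prev_tool_call_arr` has no entry at `index` when the generator calls `.get()` on it. The snippet below is a hypothetical illustration of that pattern and of a bounds-checked alternative; it is not the actual vLLM code.

# Hypothetical illustration of the failure mode, not the actual vLLM code.
prev_tool_call_arr: list[dict] = []  # parser has not recorded any tool call yet
index = 0

# Pattern from the traceback: indexing the list before checking its length.
try:
    expected_args = prev_tool_call_arr[index].get("arguments", {})
except IndexError as e:
    print(e)  # "list index out of range", as seen in the server log

# A guarded variant falls back to empty arguments instead of crashing.
expected_args = (
    prev_tool_call_arr[index].get("arguments", {})
    if index < len(prev_tool_call_arr)
    else {}
)

Guarding the lookup (or making sure `prev_tool_call_arr` is populated before it is read) would presumably let the generator finish and emit the final `data: [DONE]` line.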

Before submitting a new issue...

  • [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

tjohnson31415 · Oct 03 '24 17:10