chatglm3-6b cannot perform tool calls
Describe the bug
I deployed chatglm3-6b and qwen1.5-chat-7b separately with xinference and tested tool calling with the same code; chatglm3-6b fails.
To Reproduce
To help us to reproduce this bug, please provide information below:
- Your Python version: 3.10.0
- The version of xinference you use: 0.10.0
- Versions of crucial packages (client conda env): langchain 0.1.14, langchain-community 0.0.30, langchain-core 0.1.37, langchain-experimental 0.0.56, langchain-openai 0.1.1, langchain-text-splitters 0.0.1, langsmith 0.1.38, openai 1.14.3
- Full stack of the error:
Traceback (most recent call last):
File "/Users/shin/ideaProjects/langchain_learn/functioncall2.py", line 31, in <module>
completion = client.chat.completions.create(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 667, in create
return self._post(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 1208, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 897, in request
return self._request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
return self._retry_request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
return self._request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
return self._retry_request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
return self._request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 988, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'detail': '[address=0.0.0.0:43987, pid=28005] 0'}
- Minimized code to reproduce the error.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_lucky_lottery",
            "description": "生成双色球彩票号码",
            "parameters": {
                "type": "int",
                "properties": {"num": {"description": "生成的彩票号码组数"}},
                "required": ["num"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "lottery_draw",
            "description": "开奖双色球",
            "parameters": {
                "type": "object",
                "properties": {},
            },
        },
    },
]

import openai

client = openai.Client(api_key="not empty", base_url="http://10.9.123.456:9997/v1")
completion = client.chat.completions.create(
    model="chatglm3",  # qwen1.5-chat # chatglm3
    messages=[{"role": "user", "content": "帮我生成5组双色球号码"}],
    tools=tools,
)
print(f"Lottery numbers: {completion}")
Do you have the log from the server side?
INFO 04-02 10:50:10 async_llm_engine.py:508] Received request b8e07c4a-f09b-11ee-848d-d691e90c7356: prompt: '<|user|>\n 帮我生成5组双色球号码\n<|assistant|>', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|user|>', '<|observation|>'], stop_token_ids=[64795, 64797, 2], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-02 10:50:10 metrics.py:218] Avg prompt throughput: 0.5 tokens/s, Avg generation throughput: 3.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-02 10:50:15 async_llm_engine.py:120] Finished request b8e07c4a-f09b-11ee-848d-d691e90c7356.
2024-04-02 10:50:15,193 xinference.api.restful_api 27827 ERROR [address=0.0.0.0:43987, pid=28005] 0
Traceback (most recent call last):
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
return self._tool_calls_completion(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=0.0.0.0:43987, pid=28005] 0
INFO 04-02 10:50:16 async_llm_engine.py:508] Received request bc182656-f09b-11ee-848d-d691e90c7356: prompt: '<|user|>\n 帮我生成5组双色球号码\n<|assistant|>', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|user|>', '<|observation|>'], stop_token_ids=[64795, 64797, 2], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-02 10:50:16 metrics.py:218] Avg prompt throughput: 4.4 tokens/s, Avg generation throughput: 31.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-02 10:50:21 metrics.py:218] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 38.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-02 10:50:21 async_llm_engine.py:120] Finished request bc182656-f09b-11ee-848d-d691e90c7356.
2024-04-02 10:50:21,587 xinference.api.restful_api 27827 ERROR [address=0.0.0.0:43987, pid=28005] 0
Traceback (most recent call last):
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
return self._tool_calls_completion(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=0.0.0.0:43987, pid=28005] 0
INFO 04-02 10:50:23 async_llm_engine.py:508] Received request c094f240-f09b-11ee-848d-d691e90c7356: prompt: '<|user|>\n 帮我生成5组双色球号码\n<|assistant|>', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|user|>', '<|observation|>'], stop_token_ids=[64795, 64797, 2], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-02 10:50:26 metrics.py:218] Avg prompt throughput: 4.8 tokens/s, Avg generation throughput: 22.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-02 10:50:28 async_llm_engine.py:120] Finished request c094f240-f09b-11ee-848d-d691e90c7356.
2024-04-02 10:50:28,465 xinference.api.restful_api 27827 ERROR [address=0.0.0.0:43987, pid=28005] 0
Traceback (most recent call last):
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
return self._tool_calls_completion(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=0.0.0.0:43987, pid=28005] 0
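For context on the KeyError: 0 above: inside _eval_chatglm3_arguments, the expression c[0] is a dict key lookup whenever c is a dict rather than a string or list, and a dict with no key 0 raises exactly this error. A minimal illustration in plain Python follows (not xinference code; the dict shape is only an assumed example of what a parsed chatglm3 tool-call payload might look like):

c = {"name": "get_lucky_lottery", "parameters": {"num": 5}}  # assumed shape, for illustration only
try:
    # On a dict, c[0] looks up the key 0 (not positional indexing),
    # so it raises KeyError: 0 when that key is absent.
    isinstance(c[0], str)
except KeyError as err:
    print("KeyError:", err)  # -> KeyError: 0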
@lordk911 I tried to reproduce your issue, but I got a normal return:
Lottery numbers: ChatCompletion(id='chatcmpl-6926d221-a91a-48a8-b95f-7f4ca9849a4b', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_6926d221-a91a-48a8-b95f-7f4ca9849a4b', function=Function(arguments='{"num": 5}', name='get_lucky_lottery'), type='function')]))], created=1712644860, model='chatglm3', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=-1, prompt_tokens=-1, total_tokens=-1))
@mujin2 Did you use vllm backend?
I ran into the same problem; chatglm3 tool calls fail:
2024-04-28 08:32:57,390 xinference.api.restful_api 1 ERROR [address=0.0.0.0:32841, pid=271] 0
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1413, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 489, in async_chat
return self._tool_calls_completion(
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 663, in _tool_calls_completion
content, func, args = cls._eval_tool_arguments(model_family, c, tools)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 621, in _eval_tool_arguments
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 569, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=0.0.0.0:32841, pid=271] 0
xinference-local installation command:
docker run -p 9997:9997 -e XINFERENCE_MODEL_SRC=modelscope -e XINFERENCE_HOME=/xinference --gpus all xprobe/xinference:v0.10.3 xinference-local -H 0.0.0.0 --port 9997
xinference-local version: v0.10.3
These are our model parameters.
Request parameters:
{
    "model": "chatglm3",
    "messages": [
        {"role": "system", "content": "你是一个有用的助手。不要对要函数调用的值做出假设。"},
        {"role": "user", "content": "上海现在的天气怎么样?"}
    ],
    "temperature": 0.7,
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "获取当前天气",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "城市,例如北京"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "使用的温度单位。从所在的城市进行推断。"
                    }
                },
                "required": ["location", "format"]
            }
        }
    }]
}
@codingl2k1 Could you take a look at this issue?
Sure, looking into it.
After changing the tool parameters, the tool call works, but the token usage returned is all -1:
{
    "id": "chatcmpl-057940aa-4dfb-4f82-9165-a1477f15859d",
    "model": "chatglm3",
    "object": "chat.completion",
    "created": 1714295104,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_057940aa-4dfb-4f82-9165-a1477f15859d",
                        "type": "function",
                        "function": {
                            "name": "get_current_weather",
                            "arguments": "{\"location\": \"上海\", \"format\": \"celsius\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": -1,
        "completion_tokens": -1,
        "total_tokens": -1
    }
}
My test here also completes the tool call normally:
import json

import pytest
import requests


@pytest.mark.parametrize(
    "model_format, quantization", [("ggmlv3", "q4_0"), ("pytorch", None)]
)
# @pytest.mark.skip(reason="Cost too many resources.")
def test_restful_api_for_tool_calls(setup, model_format, quantization):
    model_name = "chatglm3"

    endpoint, _ = setup
    url = f"{endpoint}/v1/models"

    # list
    response = requests.get(url)
    response_data = response.json()
    assert len(response_data["data"]) == 0

    # launch
    payload = {
        "model_uid": "test_tool",
        "model_name": model_name,
        "model_size_in_billions": 6,
        "model_format": model_format,
        "quantization": quantization,
    }
    response = requests.post(url, json=payload)
    response_data = response.json()
    model_uid_res = response_data["model_uid"]
    assert model_uid_res == "test_tool"

    response = requests.get(url)
    response_data = response.json()
    assert len(response_data["data"]) == 1

    tools = [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "获取当前天气",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "城市,例如北京"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "使用的温度单位。从所在的城市进行推断。"
                    }
                },
                "required": ["location", "format"]
            }
        }
    }]

    url = f"{endpoint}/v1/chat/completions"
    payload = {
        "model": model_uid_res,
        "messages": [
            {"role": "system", "content": "你是一个有用的助手。不要对要函数调用的值做出假设。"},
            {"role": "user", "content": "上海现在的天气怎么样?"}
        ],
        "temperature": 0.7,
        "tools": tools,
        "stop": ["\n"],
    }
    response = requests.post(url, json=payload)
    completion = response.json()
    print(completion)
    assert (
        "get_current_weather"
        == completion["choices"][0]["message"]["tool_calls"][0]["function"]["name"]
    )
    arguments = completion["choices"][0]["message"]["tool_calls"][0]["function"][
        "arguments"
    ]
    arg = json.loads(arguments)
    assert arg == {'location': '上海', 'format': 'celsius'}
@codingl2k1 You are using ggmlv3; is the token usage count normal for you? With vLLM it errors out.
This configuration triggers the error.
With the same configuration, this test works fine for me:
import openai

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "获取当前天气",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "城市,例如北京"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "使用的温度单位。从所在的城市进行推断。"
                }
            },
            "required": ["location", "format"]
        }
    }
}]

client = openai.Client(api_key="not empty", base_url="http://127.0.0.1:9997/v1")
completion = client.chat.completions.create(
    model="chatglm3",  # qwen1.5-chat # chatglm3
    messages=[
        {"role": "system", "content": "你是一个有用的助手。不要对要函数调用的值做出假设。"},
        {"role": "user", "content": "上海现在的天气怎么样?"}
    ],
    tools=tools,
)
print(completion)
Output:
ChatCompletion(id='chatcmpl-ffa65b01-fe57-4209-888d-11000445e228', choices=[Choice(finish_reason='tool_calls', index=0, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_ffa65b01-fe57-4209-888d-11000445e228', function=Function(arguments='{"location": "\\u4e0a\\u6d77", "format": "celsius"}', name='get get_current_weather'), type='function')]))], created=1714307582, model='chatglm3', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=-1, prompt_tokens=-1, total_tokens=-1))
json.loads('{"location": "\\u4e0a\\u6d77", "format": "celsius"}')
Out[4]: {'location': '上海', 'format': 'celsius'}
That's odd; it just doesn't work for me. And why does the token usage come back as -1?
May I ask whether we installed it the same way?
docker run -p 9997:9997 -e XINFERENCE_MODEL_SRC=modelscope -e XINFERENCE_HOME=/xinference --gpus all xprobe/xinference:v0.10.3 xinference-local -H 0.0.0.0 --port 9997
My first tool call returns normally, but when I continue the conversation and call a tool again it errors. Does a second tool call work on your side?
r = await func(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 104, in _async_wrapper
return await fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 504, in async_chat
return self._tool_calls_completion(
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 672, in _tool_calls_completion
content, func, args = cls._eval_tool_arguments(model_family, c, tools)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 630, in _eval_tool_arguments
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 578, in _eval_chatglm3_arguments
if isinstance(c[0], str):
This looks like the model's return is not in tool-call format. Could you share a test case?
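For reference, a second-round tool-call request along the lines described above might look like the sketch below. This is an untested illustration of the reported scenario, not a confirmed reproducer: the tool definition, the tool_call id, the tool-result content and the follow-up question are all placeholder assumptions.

import openai

# Hypothetical sketch of a "second tool call in the same conversation";
# ids and message contents are placeholders, not values from the report.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "获取当前天气",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "城市,例如北京"},
                "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "format"],
        },
    },
}]

client = openai.Client(api_key="not empty", base_url="http://127.0.0.1:9997/v1")

messages = [
    {"role": "user", "content": "上海现在的天气怎么样?"},
    # Assistant turn as returned by the first, successful tool call.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_0",  # placeholder id
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": '{"location": "上海", "format": "celsius"}',
            },
        }],
    },
    # Tool result fed back to the model (placeholder content).
    {"role": "tool", "tool_call_id": "call_0", "content": '{"temperature": "25 celsius"}'},
    # Follow-up question that should trigger the second tool call.
    {"role": "user", "content": "那北京现在的天气怎么样?"},
]

completion = client.chat.completions.create(
    model="chatglm3", messages=messages, tools=tools
)
print(completion)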
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.