chatglm3-6b cannot perform tool calls
Describe the bug
I deployed chatglm3-6b and qwen1.5-chat-7b separately with xinference and tested tool calling with the same code; chatglm3-6b fails.
To Reproduce
To help us to reproduce this bug, please provide information below:
- Your Python version: 3.10.0
- The version of xinference you use: 0.10.0
- Versions of crucial packages (client conda env): langchain 0.1.14, langchain-community 0.0.30, langchain-core 0.1.37, langchain-experimental 0.0.56, langchain-openai 0.1.1, langchain-text-splitters 0.0.1, langsmith 0.1.38, openai 1.14.3
- Full stack of the error:
Traceback (most recent call last):
File "/Users/shin/ideaProjects/langchain_learn/functioncall2.py", line 31, in <module>
completion = client.chat.completions.create(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 667, in create
return self._post(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 1208, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 897, in request
return self._request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
return self._retry_request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
return self._request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
return self._retry_request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
return self._request(
File "/Users/shin/miniconda3/envs/langchain_official/lib/python3.10/site-packages/openai/_base_client.py", line 988, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'detail': '[address=0.0.0.0:43987, pid=28005] 0'}
- Minimized code to reproduce the error.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_lucky_lottery",
            "description": "生成双色球彩票号码",
            "parameters": {
                "type": "int",
                "properties": {"num": {"description": "生成的彩票号码组数"}},
                "required": ["num"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "lottery_draw",
            "description": "开奖双色球",
            "parameters": {
                "type": "object",
                "properties": {},
            },
        },
    },
]

import openai

client = openai.Client(api_key="not empty", base_url="http://10.9.123.456:9997/v1")
completion = client.chat.completions.create(
    model="chatglm3",  # qwen1.5-chat # chatglm3
    messages=[{"role": "user", "content": "帮我生成5组双色球号码"}],
    tools=tools,
)
print(f"Lottery numbers: {completion}")
Do you have the log from the server side?
INFO 04-02 10:50:10 async_llm_engine.py:508] Received request b8e07c4a-f09b-11ee-848d-d691e90c7356: prompt: '<|user|>\n 帮我生成5组双色球号码\n<|assistant|>', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|user|>', '<|observation|>'], stop_token_ids=[64795, 64797, 2], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-02 10:50:10 metrics.py:218] Avg prompt throughput: 0.5 tokens/s, Avg generation throughput: 3.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-02 10:50:15 async_llm_engine.py:120] Finished request b8e07c4a-f09b-11ee-848d-d691e90c7356.
2024-04-02 10:50:15,193 xinference.api.restful_api 27827 ERROR [address=0.0.0.0:43987, pid=28005] 0
Traceback (most recent call last):
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
return self._tool_calls_completion(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=0.0.0.0:43987, pid=28005] 0
INFO 04-02 10:50:16 async_llm_engine.py:508] Received request bc182656-f09b-11ee-848d-d691e90c7356: prompt: '<|user|>\n 帮我生成5组双色球号码\n<|assistant|>', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|user|>', '<|observation|>'], stop_token_ids=[64795, 64797, 2], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-02 10:50:16 metrics.py:218] Avg prompt throughput: 4.4 tokens/s, Avg generation throughput: 31.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-02 10:50:21 metrics.py:218] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 38.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-02 10:50:21 async_llm_engine.py:120] Finished request bc182656-f09b-11ee-848d-d691e90c7356.
2024-04-02 10:50:21,587 xinference.api.restful_api 27827 ERROR [address=0.0.0.0:43987, pid=28005] 0
Traceback (most recent call last):
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
return self._tool_calls_completion(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=0.0.0.0:43987, pid=28005] 0
INFO 04-02 10:50:23 async_llm_engine.py:508] Received request c094f240-f09b-11ee-848d-d691e90c7356: prompt: '<|user|>\n 帮我生成5组双色球号码\n<|assistant|>', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|user|>', '<|observation|>'], stop_token_ids=[64795, 64797, 2], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-02 10:50:26 metrics.py:218] Avg prompt throughput: 4.8 tokens/s, Avg generation throughput: 22.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-02 10:50:28 async_llm_engine.py:120] Finished request c094f240-f09b-11ee-848d-d691e90c7356.
2024-04-02 10:50:28,465 xinference.api.restful_api 27827 ERROR [address=0.0.0.0:43987, pid=28005] 0
Traceback (most recent call last):
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
return self._tool_calls_completion(
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/data/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=0.0.0.0:43987, pid=28005] 0
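For context on the KeyError: 0 above: inside _eval_chatglm3_arguments, the expression c[0] is a dict key lookup whenever c is a dict rather than a string or list, and a dict with no key 0 raises exactly this error. A minimal illustration in plain Python follows (not xinference code; the dict shape is only an assumed example of what a parsed chatglm3 tool-call payload might look like):

c = {"name": "get_lucky_lottery", "parameters": {"num": 5}}  # assumed shape, for illustration only
try:
    # On a dict, c[0] looks up the key 0 (not positional indexing),
    # so it raises KeyError: 0 when that key is absent.
    isinstance(c[0], str)
except KeyError as err:
    print("KeyError:", err)  # -> KeyError: 0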
@lordk911 I tried to reproduce your issue, but I got a normal return:
Lottery numbers: ChatCompletion(id='chatcmpl-6926d221-a91a-48a8-b95f-7f4ca9849a4b', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_6926d221-a91a-48a8-b95f-7f4ca9849a4b', function=Function(arguments='{"num": 5}', name='get_lucky_lottery'), type='function')]))], created=1712644860, model='chatglm3', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=-1, prompt_tokens=-1, total_tokens=-1))
@mujin2 Did you use vllm backend?
I ran into the same problem; chatglm3 tool calls fail:
2024-04-28 08:32:57,390 xinference.api.restful_api 1 ERROR [address=0.0.0.0:32841, pid=271] 0
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1413, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 489, in async_chat
return self._tool_calls_completion(
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 663, in _tool_calls_completion
content, func, args = cls._eval_tool_arguments(model_family, c, tools)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 621, in _eval_tool_arguments
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 569, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=0.0.0.0:32841, pid=271] 0
xinference-local installation command:
docker run -p 9997:9997 -e XINFERENCE_MODEL_SRC=modelscope -e XINFERENCE_HOME=/xinference --gpus all xprobe/xinference:v0.10.3 xinference-local -H 0.0.0.0 --port 9997
xinference-local version: v0.10.3
These are our model parameters.
Request parameters:
{
    "model": "chatglm3",
    "messages": [
        {"role": "system", "content": "你是一个有用的助手。不要对要函数调用的值做出假设。"},
        {"role": "user", "content": "上海现在的天气怎么样?"}
    ],
    "temperature": 0.7,
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "获取当前天气",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "城市,例如北京"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "使用的温度单位。从所在的城市进行推断。"
                    }
                },
                "required": ["location", "format"]
            }
        }
    }]
}
@codingl2k1 Could you take a look at this issue?
Sure, looking into it.
After changing the tool parameters, the tool call works, but the token usage returned is all -1:
{
    "id": "chatcmpl-057940aa-4dfb-4f82-9165-a1477f15859d",
    "model": "chatglm3",
    "object": "chat.completion",
    "created": 1714295104,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_057940aa-4dfb-4f82-9165-a1477f15859d",
                        "type": "function",
                        "function": {
                            "name": "get_current_weather",
                            "arguments": "{\"location\": \"上海\", \"format\": \"celsius\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": -1,
        "completion_tokens": -1,
        "total_tokens": -1
    }
}
My test here also completes the tool call normally:
import json

import pytest
import requests


@pytest.mark.parametrize(
    "model_format, quantization", [("ggmlv3", "q4_0"), ("pytorch", None)]
)
# @pytest.mark.skip(reason="Cost too many resources.")
def test_restful_api_for_tool_calls(setup, model_format, quantization):
    model_name = "chatglm3"

    endpoint, _ = setup
    url = f"{endpoint}/v1/models"

    # list
    response = requests.get(url)
    response_data = response.json()
    assert len(response_data["data"]) == 0

    # launch
    payload = {
        "model_uid": "test_tool",
        "model_name": model_name,
        "model_size_in_billions": 6,
        "model_format": model_format,
        "quantization": quantization,
    }
    response = requests.post(url, json=payload)
    response_data = response.json()
    model_uid_res = response_data["model_uid"]
    assert model_uid_res == "test_tool"

    response = requests.get(url)
    response_data = response.json()
    assert len(response_data["data"]) == 1

    tools = [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "获取当前天气",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "城市,例如北京"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "使用的温度单位。从所在的城市进行推断。"
                    }
                },
                "required": ["location", "format"]
            }
        }
    }]

    url = f"{endpoint}/v1/chat/completions"
    payload = {
        "model": model_uid_res,
        "messages": [
            {"role": "system", "content": "你是一个有用的助手。不要对要函数调用的值做出假设。"},
            {"role": "user", "content": "上海现在的天气怎么样?"}
        ],
        "temperature": 0.7,
        "tools": tools,
        "stop": ["\n"],
    }
    response = requests.post(url, json=payload)
    completion = response.json()
    print(completion)
    assert (
        "get_current_weather"
        == completion["choices"][0]["message"]["tool_calls"][0]["function"]["name"]
    )
    arguments = completion["choices"][0]["message"]["tool_calls"][0]["function"][
        "arguments"
    ]
    arg = json.loads(arguments)
    assert arg == {'location': '上海', 'format': 'celsius'}
@codingl2k1 You are using ggmlv3; is the token usage count normal for you? With vLLM it errors out.
This configuration triggers the error.
With the same configuration, this test works fine for me:
import openai

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "获取当前天气",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "城市,例如北京"
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "使用的温度单位。从所在的城市进行推断。"
                }
            },
            "required": ["location", "format"]
        }
    }
}]

client = openai.Client(api_key="not empty", base_url="http://127.0.0.1:9997/v1")
completion = client.chat.completions.create(
    model="chatglm3",  # qwen1.5-chat # chatglm3
    messages=[
        {"role": "system", "content": "你是一个有用的助手。不要对要函数调用的值做出假设。"},
        {"role": "user", "content": "上海现在的天气怎么样?"}
    ],
    tools=tools,
)
print(completion)
Output:
ChatCompletion(id='chatcmpl-ffa65b01-fe57-4209-888d-11000445e228', choices=[Choice(finish_reason='tool_calls', index=0, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_ffa65b01-fe57-4209-888d-11000445e228', function=Function(arguments='{"location": "\\u4e0a\\u6d77", "format": "celsius"}', name='get get_current_weather'), type='function')]))], created=1714307582, model='chatglm3', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=-1, prompt_tokens=-1, total_tokens=-1))
json.loads('{"location": "\\u4e0a\\u6d77", "format": "celsius"}')
Out[4]: {'location': '上海', 'format': 'celsius'}
That's odd; it just doesn't work for me. And why does the token usage come back as -1?
May I ask whether we installed it the same way?
docker run -p 9997:9997 -e XINFERENCE_MODEL_SRC=modelscope -e XINFERENCE_HOME=/xinference --gpus all xprobe/xinference:v0.10.3 xinference-local -H 0.0.0.0 --port 9997
My first tool call returns normally, but when I continue the conversation and call a tool again it errors. Does a second tool call work on your side?
r = await func(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 104, in _async_wrapper
return await fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 504, in async_chat
return self._tool_calls_completion(
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 672, in _tool_calls_completion
content, func, args = cls._eval_tool_arguments(model_family, c, tools)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 630, in _eval_tool_arguments
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 578, in _eval_chatglm3_arguments
if isinstance(c[0], str):
This looks like the model's return is not in tool-call format. Could you share a test case?
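For reference, a second-round tool-call request along the lines described above might look like the sketch below. This is an untested illustration of the reported scenario, not a confirmed reproducer: the tool definition, the tool_call id, the tool-result content and the follow-up question are all placeholder assumptions.

import openai

# Hypothetical sketch of a "second tool call in the same conversation";
# ids and message contents are placeholders, not values from the report.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "获取当前天气",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "城市,例如北京"},
                "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "format"],
        },
    },
}]

client = openai.Client(api_key="not empty", base_url="http://127.0.0.1:9997/v1")

messages = [
    {"role": "user", "content": "上海现在的天气怎么样?"},
    # Assistant turn as returned by the first, successful tool call.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_0",  # placeholder id
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": '{"location": "上海", "format": "celsius"}',
            },
        }],
    },
    # Tool result fed back to the model (placeholder content).
    {"role": "tool", "tool_call_id": "call_0", "content": '{"temperature": "25 celsius"}'},
    # Follow-up question that should trigger the second tool call.
    {"role": "user", "content": "那北京现在的天气怎么样?"},
]

completion = client.chat.completions.create(
    model="chatglm3", messages=messages, tools=tools
)
print(completion)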
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.