QUESTION: Connection reset by peer
I customized a model based on the template, but registering it with xinference register --model-type LLM --file model.json --persist reports an error.
Custom model definition:
Error message:
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
httplib_response = self._make_request(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
six.raise_from(e, None)
File "
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 798, in urlopen
retries = retries.increment(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
httplib_response = self._make_request(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
six.raise_from(e, None)
File "
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/bin/xinference", line 8, in
Hi, the error seems to be a network issue. Could you check if you are working with the correct endpoint?
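For example, something like the following should work; the port 9998 here is only taken from the startup log later in this thread, so replace it with whatever endpoint your xinference server actually prints:

    xinference register --model-type LLM --file model.json --persist --endpoint http://0.0.0.0:9998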
Thank you for your reply. I added --endpoint and was able to register the custom model successfully. However, a new problem occurred when I launched the model in the Web UI: it seems that chat is not supported. What could be the reason?
Successfully added custom model:
New problem:
2023-10-17 08:52:42,081 xinference 3322011 INFO Xinference successfully started. Endpoint: http://0.0.0.0:9998
2023-10-17 08:52:42,126 xinference.core.supervisor 3322011 INFO Worker 0.0.0.0:29352 has been added successfully
2023-10-17 08:52:42,128 xinference.deploy.worker 3322011 INFO Xinference worker successfully started.
2023-10-17 08:54:19,391 xinference.model.llm.llm_family 3322011 WARNING Remove the cache of user-defined model chinese-alpaca-2-7b. Cache directory: /home/test/.xinference/cache/chinese-alpaca-2-7b-pytorch-7b
2023-10-17 08:57:42,407 xinference.model.llm.llm_family 3322011 INFO Caching from URI: /ldata/llms/chinese-alpaca-2-7b
2023-10-17 08:57:47,566 torch.distributed.nn.jit.instantiator 3327401 INFO Created a temporary directory at /tmp/tmpeg8jav1g
2023-10-17 08:57:47,567 torch.distributed.nn.jit.instantiator 3327401 INFO Writing /tmp/tmpeg8jav1g/_remote_module_non_scriptable.py
[2023-10-17 08:57:47,632] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:08<00:00, 4.05s/it]
2023-10-17 08:58:20,897 xinference.core.restful_api 3322011 ERROR [address=0.0.0.0:43769, pid=3327401]
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/restful_api.py", line 645, in create_chat_completion
return await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/backends/pool.py", line 657, in send
result = await self._run_coro(message.message_id, coro)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
return await coro
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/api.py", line 306, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 558, in on_receive
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
result = await result
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/model.py", line 182, in chat
return await self._call_wrapper(_wrapper)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/model.py", line 136, in _call_wrapper
return await asyncio.to_thread(_wrapper)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/model.py", line 172, in _wrapper
getattr(self._model, "chat")(prompt, *args, **kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/model/llm/pytorch/core.py", line 476, in chat
assert self.model_family.prompt_style is not None
AssertionError: [address=0.0.0.0:43769, pid=3327401]
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/routes.py", line 534, in predict
output = await route_utils.call_process_api(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/blocks.py", line 1554, in process_api
result = await self.call_function(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/blocks.py", line 1190, in call_function
prediction = await fn(*processed_input)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/utils.py", line 634, in async_wrapper
response = await f(*args, **kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/chat_interface.py", line 403, in _submit_fn
response = await anyio.to_thread.run_sync(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/chat_interface.py", line 96, in generate_wrapper
output = model.chat(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/client.py", line 405, in chat
raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: [address=0.0.0.0:43769, pid=3327401]
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/routes.py", line 534, in predict
output = await route_utils.call_process_api(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/blocks.py", line 1554, in process_api
result = await self.call_function(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/blocks.py", line 1190, in call_function
prediction = await fn(*processed_input)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/utils.py", line 634, in async_wrapper
response = await f(*args, **kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/chat_interface.py", line 403, in _submit_fn
response = await anyio.to_thread.run_sync(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/chat_interface.py", line 96, in generate_wrapper
output = model.chat(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/client.py", line 405, in chat
raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: [{'loc': ['body', 'messages', 1, 'content'], 'msg': 'none is not an allowed value', 'type': 'type_error.none.not_allowed'}]
Since you are using a chat / instruction-following model, the prompt needs to be formatted according to the SFT data the model was trained on, and the format differs from model to model.
In xinference, this format is called the prompt style. For built-in models, the prompt styles are also built in; you may check utils.py if you're interested.
For a custom chat model, the prompt style is also required. You can add the following prompt style, which matches the "alpaca" format, to your JSON:
"prompt_style":{
"style_name":"ADD_COLON_SINGLE",
"system_prompt":"You are an AI assistant that follows instruction extremely well. Help as much as you can.",
"roles":[
"User",
"Response"
],
"intra_message_sep":"\n\n### "
}
And the full model definition should be:
{
"version":1,
"context_length":4096,
"model_name":"chinese-alpaca-2-7b",
"model_lang":[
"zh"
],
"model_ability":[
"chat"
],
"model_specs":[
{
"model_format":"pytorch",
"quantizations":[
"none"
],
"model_id":"yumic/chinese-alpaca-2-7b",
"model_uri":"/ldata/llms/chinese-alpaca-2-7b",
"model_size_in_billions": 7
}
],
"prompt_style":{
"style_name":"ADD_COLON_SINGLE",
"system_prompt":"You are an AI assistant that follows instruction extremely well. Help as much as you can.",
"roles":[
"User",
"Response"
],
"intra_message_sep":"\n\n### "
}
}
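Once the prompt_style is in place, one quick way to verify it end to end is through the Python client. The snippet below is only a minimal sketch: it assumes the RESTfulClient API of this xinference version (register_model, launch_model, get_model, chat), the endpoint from your startup log, and that model.json contains the definition above.

    from xinference.client import RESTfulClient

    # Endpoint taken from the startup log above; adjust to your deployment.
    client = RESTfulClient("http://0.0.0.0:9998")

    # Register the custom model definition (same effect as xinference register --persist).
    with open("model.json") as f:
        client.register_model(model_type="LLM", model=f.read(), persist=True)

    # Launch the model and send a chat request; the prompt_style is applied server-side.
    model_uid = client.launch_model(model_name="chinese-alpaca-2-7b", model_format="pytorch")
    model = client.get_model(model_uid)
    print(model.chat("Hello, please introduce yourself."))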
Thank you for your reply, the problem has been solved. However, I found that the model performs much better when tested in xinference's Launch Web UI than when tested on the Dify platform. Did you encounter this situation during testing?
@YinSonglin1997 A possible reason might be the generation configurations, such as temperature, top_p, and top_k. Could you initiate xinference with --log-level debug and replicate this issue? The generation configurations for each request will be displayed, allowing you to see if there is any difference.
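If it helps, here is a hedged sketch of pinning those parameters explicitly through the Python client, so the same generate_config can be compared against what Dify sends. The keys below are the usual sampling options and the values are placeholders only; check the --log-level debug output to see what each platform actually sends.

    from xinference.client import RESTfulClient

    client = RESTfulClient("http://0.0.0.0:9998")
    model_uid = "chinese-alpaca-2-7b"  # placeholder: use the UID returned by launch_model
    model = client.get_model(model_uid)

    # Explicit sampling parameters for an apples-to-apples comparison between platforms.
    print(model.chat(
        "Hello",
        generate_config={
            "temperature": 0.7,
            "top_p": 0.9,
            "top_k": 40,
            "max_tokens": 512,
        },
    ))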
@YinSonglin1997 Hello. Were you able to resolve this issue? Is there any assistance I can provide?
Thank you for remembering my problem. I tried --log-level debug but found nothing abnormal.
@YinSonglin1997 Have you encountered this problem? Is there any good solution? #817
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.