QUESTION: Connection reset by peer
I customized a model based on the template, but registering it with xinference register --model-type LLM --file model.json --persist reports an error.
Custom model definition:
Error message:
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
httplib_response = self._make_request(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
six.raise_from(e, None)
File "
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 798, in urlopen
retries = retries.increment(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
httplib_response = self._make_request(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
six.raise_from(e, None)
File "
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/bin/xinference", line 8, in
Hi, the error seems to be a network issue. Could you check if you are working with the correct endpoint?
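For example, something like the following should work; the port 9998 here is only taken from the startup log later in this thread, so replace it with whatever endpoint your xinference server actually prints:

    xinference register --model-type LLM --file model.json --persist --endpoint http://0.0.0.0:9998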
Thank you for your reply. I added --endpoint and was able to register the custom model successfully. However, a new problem occurred when I launched the model in the Web UI: it seems that chat is not supported. What could be the reason?
Successfully added custom model:
New problem:
2023-10-17 08:52:42,081 xinference 3322011 INFO Xinference successfully started. Endpoint: http://0.0.0.0:9998
2023-10-17 08:52:42,126 xinference.core.supervisor 3322011 INFO Worker 0.0.0.0:29352 has been added successfully
2023-10-17 08:52:42,128 xinference.deploy.worker 3322011 INFO Xinference worker successfully started.
2023-10-17 08:54:19,391 xinference.model.llm.llm_family 3322011 WARNING Remove the cache of user-defined model chinese-alpaca-2-7b. Cache directory: /home/test/.xinference/cache/chinese-alpaca-2-7b-pytorch-7b
2023-10-17 08:57:42,407 xinference.model.llm.llm_family 3322011 INFO Caching from URI: /ldata/llms/chinese-alpaca-2-7b
2023-10-17 08:57:47,566 torch.distributed.nn.jit.instantiator 3327401 INFO Created a temporary directory at /tmp/tmpeg8jav1g
2023-10-17 08:57:47,567 torch.distributed.nn.jit.instantiator 3327401 INFO Writing /tmp/tmpeg8jav1g/_remote_module_non_scriptable.py
[2023-10-17 08:57:47,632] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:08<00:00, 4.05s/it]
2023-10-17 08:58:20,897 xinference.core.restful_api 3322011 ERROR [address=0.0.0.0:43769, pid=3327401]
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/restful_api.py", line 645, in create_chat_completion
return await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/backends/pool.py", line 657, in send
result = await self._run_coro(message.message_id, coro)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/backends/pool.py", line 368, in _run_coro
return await coro
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xoscar/api.py", line 306, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 558, in on_receive
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
result = await result
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/model.py", line 182, in chat
return await self._call_wrapper(_wrapper)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/model.py", line 136, in _call_wrapper
return await asyncio.to_thread(_wrapper)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/model.py", line 172, in _wrapper
getattr(self._model, "chat")(prompt, *args, **kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/model/llm/pytorch/core.py", line 476, in chat
assert self.model_family.prompt_style is not None
AssertionError: [address=0.0.0.0:43769, pid=3327401]
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/routes.py", line 534, in predict
output = await route_utils.call_process_api(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/blocks.py", line 1554, in process_api
result = await self.call_function(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/blocks.py", line 1190, in call_function
prediction = await fn(*processed_input)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/utils.py", line 634, in async_wrapper
response = await f(*args, **kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/chat_interface.py", line 403, in _submit_fn
response = await anyio.to_thread.run_sync(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/chat_interface.py", line 96, in generate_wrapper
output = model.chat(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/client.py", line 405, in chat
raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: [address=0.0.0.0:43769, pid=3327401]
Traceback (most recent call last):
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/routes.py", line 534, in predict
output = await route_utils.call_process_api(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/blocks.py", line 1554, in process_api
result = await self.call_function(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/blocks.py", line 1190, in call_function
prediction = await fn(*processed_input)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/utils.py", line 634, in async_wrapper
response = await f(*args, **kwargs)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/gradio/chat_interface.py", line 403, in _submit_fn
response = await anyio.to_thread.run_sync(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/core/chat_interface.py", line 96, in generate_wrapper
output = model.chat(
File "/home/test/anaconda3/envs/llama2/lib/python3.10/site-packages/xinference/client.py", line 405, in chat
raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: [{'loc': ['body', 'messages', 1, 'content'], 'msg': 'none is not an allowed value', 'type': 'type_error.none.not_allowed'}]
Since you are using a chat / instruction-following model, the prompt needs to be formatted according to the SFT data the model was trained on, and the format differs from model to model.
In xinference, this format is called the prompt style. For built-in models, the prompt styles are also built in; you may check utils.py if you're interested.
For a custom chat model, the prompt style is also required. You can add the following prompt style, which matches the "alpaca" format, to your JSON:
"prompt_style":{
"style_name":"ADD_COLON_SINGLE",
"system_prompt":"You are an AI assistant that follows instruction extremely well. Help as much as you can.",
"roles":[
"User",
"Response"
],
"intra_message_sep":"\n\n### "
}
And the full model definition should be:
{
"version":1,
"context_length":4096,
"model_name":"chinese-alpaca-2-7b",
"model_lang":[
"zh"
],
"model_ability":[
"chat"
],
"model_specs":[
{
"model_format":"pytorch",
"quantizations":[
"none"
],
"model_id":"yumic/chinese-alpaca-2-7b",
"model_uri":"/ldata/llms/chinese-alpaca-2-7b",
"model_size_in_billions": 7
}
],
"prompt_style":{
"style_name":"ADD_COLON_SINGLE",
"system_prompt":"You are an AI assistant that follows instruction extremely well. Help as much as you can.",
"roles":[
"User",
"Response"
],
"intra_message_sep":"\n\n### "
}
}
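Once the prompt_style is in place, one quick way to verify it end to end is through the Python client. The snippet below is only a minimal sketch: it assumes the RESTfulClient API of this xinference version (register_model, launch_model, get_model, chat), the endpoint from your startup log, and that model.json contains the definition above.

    from xinference.client import RESTfulClient

    # Endpoint taken from the startup log above; adjust to your deployment.
    client = RESTfulClient("http://0.0.0.0:9998")

    # Register the custom model definition (same effect as xinference register --persist).
    with open("model.json") as f:
        client.register_model(model_type="LLM", model=f.read(), persist=True)

    # Launch the model and send a chat request; the prompt_style is applied server-side.
    model_uid = client.launch_model(model_name="chinese-alpaca-2-7b", model_format="pytorch")
    model = client.get_model(model_uid)
    print(model.chat("Hello, please introduce yourself."))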
Thank you for your reply, the problem has been solved. However, I found that the model performs much better when tested in xinference's Launch Web UI than when tested on the Dify platform. Did you encounter this situation during testing?
@YinSonglin1997 A possible reason might be the generation configurations, such as temperature, top_p, and top_k. Could you initiate xinference with --log-level debug and replicate this issue? The generation configurations for each request will be displayed, allowing you to see if there is any difference.
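If it helps, here is a hedged sketch of pinning those parameters explicitly through the Python client, so the same generate_config can be compared against what Dify sends. The keys below are the usual sampling options and the values are placeholders only; check the --log-level debug output to see what each platform actually sends.

    from xinference.client import RESTfulClient

    client = RESTfulClient("http://0.0.0.0:9998")
    model_uid = "chinese-alpaca-2-7b"  # placeholder: use the UID returned by launch_model
    model = client.get_model(model_uid)

    # Explicit sampling parameters for an apples-to-apples comparison between platforms.
    print(model.chat(
        "Hello",
        generate_config={
            "temperature": 0.7,
            "top_p": 0.9,
            "top_k": 40,
            "max_tokens": 512,
        },
    ))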
@YinSonglin1997 Hello. Were you able to resolve this issue? Is there any assistance I can provide?
Thank you for remembering my problem. I tried --log-level debug but found nothing abnormal.
@YinSonglin1997 Have you encountered this problem? Is there any good solution? #817
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.