When using qwen-14b and chatglm3-6b for multi-model debugging, an error occurred with chatglm3-6b.
Self Checks
- [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.5.10
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
qwen-14b and chatglm3-6b are custom large models managed by xinference.
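For context, the failing request can be reproduced directly against the xinference OpenAI-compatible endpoint with a tool-enabled chat call. This is only a minimal sketch: the model UID, port, and tool payload are assumptions inferred from the logs below, not part of the original report.

```python
# Hypothetical reproduction sketch: call the xinference OpenAI-compatible
# endpoint (port taken from the logs below) with a tool definition, which is
# the code path used by the Smart Assistant and the one that fails for chatglm3-6b.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:59997/v1",  # xinference endpoint seen in the logs
    api_key="not-needed",                  # xinference does not require a real key by default
)

resp = client.chat.completions.create(
    model="chatglm3-6b",  # assumed model UID as registered in xinference
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, mirroring what an Agent app would send
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(resp.choices[0].message)
```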
- Screenshots
- Error logs
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:openai._base_client:HTTP Request: POST http://127.0.0.1:59997/v1/chat/completions "500 Internal Server Error"
DEBUG:openai._base_client:Encountered httpx.HTTPStatusError
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in _request
response.raise_for_status()
File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 749, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://127.0.0.1:59997/v1/chat/completions'
For more information check: https://httpstatuses.com/500
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in _request
response.raise_for_status()
File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 749, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://127.0.0.1:59997/v1/chat/completions'
For more information check: https://httpstatuses.com/500
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in _request
response.raise_for_status()
File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 749, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://127.0.0.1:59997/v1/chat/completions'
For more information check: https://httpstatuses.com/500
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in _request
response.raise_for_status()
File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 749, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://127.0.0.1:59997/v1/chat/completions'
For more information check: https://httpstatuses.com/500
DEBUG:openai._base_client:Re-raising status error
ERROR:core.application_manager:Unknown Error when generating
Traceback (most recent call last):
File "/app/api/core/model_runtime/model_providers/__base/large_language_model.py", line 96, in invoke
result = self._invoke(model, credentials, prompt_messages, model_parameters, tools, stop, stream, user)
File "/app/api/core/model_runtime/model_providers/xinference/llm/llm.py", line 79, in _invoke
return self._generate(
File "/app/api/core/model_runtime/model_providers/xinference/llm/llm.py", line 413, in _generate
resp = client.chat.completions.create(
File "/usr/local/lib/python3.10/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 667, in create
return self._post(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1208, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 897, in request
return self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
return self._retry_request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
return self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
return self._retry_request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
return self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
return self._retry_request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
return self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 988, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'detail': '[address=127.0.0.1:34773, pid=24418] 0'}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/core/application_manager.py", line 173, in _generate_worker
runner.run(
File "/app/api/core/app_runner/assistant_app_runner.py", line 248, in run
self._handle_invoke_result(
File "/app/api/core/app_runner/app_runner.py", line 230, in _handle_invoke_result
self._handle_invoke_result_stream(
File "/app/api/core/app_runner/app_runner.py", line 263, in _handle_invoke_result_stream
for result in invoke_result:
File "/app/api/core/features/assistant_fc_runner.py", line 111, in run
chunks: Union[Generator[LLMResultChunk, None, None], LLMResult] = model_instance.invoke_llm(
File "/app/api/core/model_manager.py", line 70, in invoke_llm
return self.model_type_instance.invoke(
File "/app/api/core/model_runtime/model_providers/__base/large_language_model.py", line 111, in invoke
raise self._transform_invoke_error(e)
core.model_runtime.errors.invoke.InvokeServerUnavailableError: [xinference] Server Unavailable Error, Error code: 500 - {'detail': '[address=127.0.0.1:34773, pid=24418] 0'}
DEBUG:core.app_runner.generate_task_pipeline:error: [xinference] Server Unavailable Error, Error code: 500 - {'detail': '[address=127.0.0.1:34773, pid=24418] 0'}
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
Hey @andylzming, great to see you back! Running into a bit of a storm with those models, huh? Let's see if we can't debug this together.
The "500 Internal Server Error" you're encountering with chatglm3-6b in your Dify setup could be due to several factors related to the xinference model management or specific configurations. Here's a streamlined approach to troubleshoot and potentially resolve the issue:
- Verify Server URL: Ensure the `server_url` for your xinference server is correct and accessible (see the connectivity sketch after this list).
- Check Triton Server Health: Access `<server_url>/v2/health/ready` to confirm the Triton Inference Server's health.
- Review Configuration Parameters: Double-check the `model_credential_schema`, focusing on the `context_size`, `completion_type`, and `stream` fields.
- Database Connectivity: Ensure the database connection settings are correct.
- Logs and Debugging: Review detailed logs from both the Dify application and the Triton Inference Server.
- Update and Compatibility: Consider updating Dify to the latest version and ensure compatibility with the Triton Inference Server version.
- Network Configuration: Look into any potential network issues or misconfigurations.
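As a quick sanity check for the first two items, something like the following can confirm that the endpoints respond at all. The URLs are placeholders based on the logs in this issue; adjust them to your deployment.

```python
# Hypothetical connectivity probe for the servers mentioned above.
import httpx

xinference_url = "http://127.0.0.1:59997"  # OpenAI-compatible endpoint from the logs

# xinference: listing models should return HTTP 200 if the server is reachable
print("xinference /v1/models:", httpx.get(f"{xinference_url}/v1/models", timeout=10).status_code)

# Triton (only if you actually run one): readiness probe suggested above
# print(httpx.get("http://<triton-host>:8000/v2/health/ready", timeout=10).status_code)
```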
For more detailed troubleshooting, it would help to examine the specific error logs, the configuration details of the chatglm3-6b model setup, the version of the Triton Inference Server, and the steps taken before the error occurred.
Referencing similar issues might also provide additional insights:
- chatglm3 with Triton Inference Server bad request
- 【Internel Server Error】after v0.3.34updated to v0.4.1
- An error occurs when the Assistant uses Xinference to chat with ChatGLM3-6B
- dify cannot connect to xinference LLM
This should help you narrow down the cause of the error and find a resolution. If you need further assistance, please provide more specific details about your setup and the steps leading to the error.
Sources
- chatglm3 with Triton Inference Server bad request
- 【Internel Server Error】after v0.3.34updated to v0.4.1
- An error occurs when the Assistant uses Xinference to chat with ChatGLM3-6B
- dify cannot connect to xinference LLM
- api/controllers/web/completion.py
- api/core/model_runtime/model_providers/xinference/xinference.yaml
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
This issue is caused by xinference returning a Server Unavailable Error exception.
Using “Basic Assistant” allows for normal conversation, while “Smart Assistant” (which invokes tools via function calling, per `assistant_fc_runner` in the trace above) results in the following error.
- xinference error logs
INFO 04-10 16:30:35 llm_engine.py:653] Avg prompt throughput: 24.8 tokens/s, Avg generation throughput: 6.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-10 16:30:35 async_llm_engine.py:111] Finished request 2f3a1d40-f779-11ee-b1b4-80615f20f615.
2024-04-10 16:30:35,419 xinference.api.restful_api 27390 ERROR [address=127.0.0.1:34773, pid=24418] 0
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
data = await model.chat(prompt, system_prompt, chat_history, kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
response = await self._call_wrapper(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
return await fn(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
return self._tool_calls_completion(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
content, func, args = cls._eval_chatglm3_arguments(c, tools)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
if isinstance(c[0], str):
KeyError: [address=127.0.0.1:34773, pid=24418] 0
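For what it's worth, the `KeyError: 0` is consistent with `_eval_chatglm3_arguments` indexing a dict-shaped tool-call response. Below is a minimal, hypothetical illustration; the payload shape is an assumption based on the traceback, not code taken from xinference.

```python
# Hypothetical illustration of the failure mode in
# xinference/model/llm/utils.py::_eval_chatglm3_arguments above.
# If the ChatGLM3 output `c` is a dict (tool-call payload) rather than a
# str/list, `c[0]` becomes a key lookup and raises KeyError: 0, matching the log.
c = {"name": "get_weather", "parameters": {"city": "Beijing"}}  # assumed shape

try:
    if isinstance(c[0], str):  # same check as in _eval_chatglm3_arguments
        print("plain text content")
except KeyError as err:
    print("KeyError:", err)  # -> KeyError: 0
```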
I've already talked to the xorbits team; they will fix it soon.
Hi, @andylzming,
I'm helping the Dify team manage their backlog and am marking this issue as stale. From what I understand, you encountered a 500 Internal Server Error when using the chatglm3-6b model for multi-model debugging alongside qwen-14b. I provided troubleshooting steps, and takatost mentioned that the issue is caused by xinference returning a "Server Unavailable Error" exception and that the xorbits team would fix it soon; the issue has since been resolved by the xorbits team.
Could you please confirm if this issue is still relevant to the latest version of the Dify repository? If it is, please let the Dify team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you!
Please upgrade to the latest version to see if this issue persists. :)