
When using qwen-14b and chatglm3-6b for multi-model debugging, an error occurred with chatglm3-6b.

andylzming opened this issue 10 months ago • 5 comments

Self Checks

  • [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.5.10

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

qwen-14b and chatglm3-6b are custom large models managed by xinference.
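For reference, a quick way to confirm that both custom models are registered and running on the xinference side is to list them through the xinference Python client. This is only a sketch; the endpoint port is an assumption taken from the error log below and may differ in your deployment.

# Sketch: list the models currently served by xinference and print their UIDs.
# Assumes the xinference Python client is installed and the supervisor is
# reachable at the address shown in the Dify error log (127.0.0.1:59997).
from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:59997")

# list_models() returns a mapping of model UID -> model description.
for model_uid, spec in client.list_models().items():
    print(model_uid, spec)

The UIDs printed here are what the Dify provider settings for qwen-14b and chatglm3-6b should reference.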

  • Screenshot 112

  • Error logs

DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:openai._base_client:HTTP Request: POST http://127.0.0.1:59997/v1/chat/completions "500 Internal Server Error"
DEBUG:openai._base_client:Encountered httpx.HTTPStatusError
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in _request
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 749, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://127.0.0.1:59997/v1/chat/completions'
For more information check: https://httpstatuses.com/500

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in _request
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 749, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://127.0.0.1:59997/v1/chat/completions'
For more information check: https://httpstatuses.com/500

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in _request
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 749, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://127.0.0.1:59997/v1/chat/completions'
For more information check: https://httpstatuses.com/500

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in _request
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 749, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://127.0.0.1:59997/v1/chat/completions'
For more information check: https://httpstatuses.com/500
DEBUG:openai._base_client:Re-raising status error
ERROR:core.application_manager:Unknown Error when generating
Traceback (most recent call last):
  File "/app/api/core/model_runtime/model_providers/__base/large_language_model.py", line 96, in invoke
    result = self._invoke(model, credentials, prompt_messages, model_parameters, tools, stop, stream, user)
  File "/app/api/core/model_runtime/model_providers/xinference/llm/llm.py", line 79, in _invoke
    return self._generate(
  File "/app/api/core/model_runtime/model_providers/xinference/llm/llm.py", line 413, in _generate
    resp = client.chat.completions.create(
  File "/usr/local/lib/python3.10/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 667, in create
    return self._post(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1208, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 897, in request
    return self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
    return self._retry_request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
    return self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
    return self._retry_request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
    return self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 973, in _request
    return self._retry_request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1021, in _retry_request
    return self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 988, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'detail': '[address=127.0.0.1:34773, pid=24418] 0'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/api/core/application_manager.py", line 173, in _generate_worker
    runner.run(
  File "/app/api/core/app_runner/assistant_app_runner.py", line 248, in run
    self._handle_invoke_result(
  File "/app/api/core/app_runner/app_runner.py", line 230, in _handle_invoke_result
    self._handle_invoke_result_stream(
  File "/app/api/core/app_runner/app_runner.py", line 263, in _handle_invoke_result_stream
    for result in invoke_result:
  File "/app/api/core/features/assistant_fc_runner.py", line 111, in run
    chunks: Union[Generator[LLMResultChunk, None, None], LLMResult] = model_instance.invoke_llm(
  File "/app/api/core/model_manager.py", line 70, in invoke_llm
    return self.model_type_instance.invoke(
  File "/app/api/core/model_runtime/model_providers/__base/large_language_model.py", line 111, in invoke
    raise self._transform_invoke_error(e)
core.model_runtime.errors.invoke.InvokeServerUnavailableError: [xinference] Server Unavailable Error, Error code: 500 - {'detail': '[address=127.0.0.1:34773, pid=24418] 0'}
DEBUG:core.app_runner.generate_task_pipeline:error: [xinference] Server Unavailable Error, Error code: 500 - {'detail': '[address=127.0.0.1:34773, pid=24418] 0'}
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

andylzming · Apr 10 '24 02:04

Hey @andylzming, great to see you back! Running into a bit of a storm with those models, huh? Let's see if we can't debug this together.

The "500 Internal Server Error" you're encountering with chatglm3-6b in your Dify setup could be due to several factors related to the xinference model management or specific configurations. Here's a streamlined approach to troubleshoot and potentially resolve the issue:

  1. Verify Server URL: Ensure the server_url for your xinference server is correct and accessible (see the probe sketch after this list).
  2. Check Triton Server Health: Access <server_url>/v2/health/ready to confirm the Triton Inference Server's health.
  3. Review Configuration Parameters: Double-check the model_credential_schema, focusing on context_size, completion_type, and stream fields.
  4. Database Connectivity: Ensure the database connection settings are correct.
  5. Logs and Debugging: Review detailed logs from both the Dify application and the Triton Inference Server.
  6. Update and Compatibility: Consider updating Dify to the latest version and ensure compatibility with the Triton Inference Server version.
  7. Network Configuration: Look into any potential network issues or misconfigurations.
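
A minimal connectivity probe for steps 1 and 2, assuming the server_url and model UID from the log above (127.0.0.1:59997 and chatglm3-6b; substitute your own values), might look roughly like this. If the same request returns a 500 here, the problem is on the xinference side rather than in Dify.

# Sketch: reproduce the request Dify sends, directly against the
# OpenAI-compatible endpoint of xinference. The URL and model name are
# assumptions taken from the error log above.
import httpx

SERVER_URL = "http://127.0.0.1:59997"  # assumed server_url from the log
MODEL = "chatglm3-6b"                  # assumed model UID configured in Dify

resp = httpx.post(
    f"{SERVER_URL}/v1/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60,
)
print(resp.status_code)  # a 500 here reproduces the error outside Dify
print(resp.text)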

For more detailed troubleshooting, examining specific error logs and configuration details related to the chatglm3-6b model setup, the version of the Triton Inference Server, and steps taken before the error occurred would be beneficial.

Referencing similar issues might also provide additional insights.

This should help you narrow down the cause of the error and find a resolution. If you need further assistance, please provide more specific details about your setup and the steps leading to the error.


dosubot[bot] · Apr 10 '24 02:04

This issue is caused by xinference returning a Server Unavailable Error exception.

takatost · Apr 10 '24 06:04

> This issue is caused by xinference returning a Server Unavailable Error exception.

Using “Basic Assistant” allows for normal conversation, while “Smart Assistant” results in the following error.

  • xinference error logs
INFO 04-10 16:30:35 llm_engine.py:653] Avg prompt throughput: 24.8 tokens/s, Avg generation throughput: 6.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-10 16:30:35 async_llm_engine.py:111] Finished request 2f3a1d40-f779-11ee-b1b4-80615f20f615.
2024-04-10 16:30:35,419 xinference.api.restful_api 27390 ERROR    [address=127.0.0.1:34773, pid=24418] 0
Traceback (most recent call last):
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
    data = await model.chat(prompt, system_prompt, chat_history, kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
    ret = await fn(self, *args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
    r = await func(self, *args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
    response = await self._call_wrapper(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
    ret = await fn(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
    return self._tool_calls_completion(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
    content, func, args = cls._eval_chatglm3_arguments(c, tools)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
    if isinstance(c[0], str):
KeyError: [address=127.0.0.1:34773, pid=24418] 0
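
The KeyError: 0 raised at isinstance(c[0], str) suggests that c arrived as a dict rather than a string or list, so indexing it with 0 is a dict lookup that fails. A minimal illustration of that failure mode, assuming a dict-shaped tool-call payload (the exact payload is not shown in the logs):

# Sketch of the failure seen in _eval_chatglm3_arguments: if the chatglm3
# output is a dict (assumed shape below), c[0] is a dict lookup with key 0
# and raises KeyError: 0, which xinference then surfaces as the 500 above.
c = {"name": "get_weather", "parameters": {"city": "Beijing"}}  # assumed payload
try:
    if isinstance(c[0], str):
        print("plain text reply")
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints: KeyError: 0

This would also match the observation that only the "Smart Assistant" fails, presumably because only tool-enabled requests reach _tool_calls_completion and the chatglm3 argument-parsing path.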

andylzming · Apr 10 '24 07:04

I've already talked to the xorbits team, they will fix it soon.

takatost · Apr 15 '24 03:04

Hi, @andylzming,

I'm helping the Dify team manage their backlog and am marking this issue as stale. From what I understand, you encountered a 500 Internal Server Error when using the chatglm3-6b model for multi-model debugging with qwen-14b. I provided troubleshooting steps, and takatost mentioned that the issue is caused by xinference returning a "Server Unavailable Error" exception, which the xorbits team will fix soon. The issue has been resolved by the xorbits team.

Could you please confirm if this issue is still relevant to the latest version of the Dify repository? If it is, please let the Dify team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you!

dosubot[bot] · May 16 '24 16:05

Please upgrade to the latest version to see if this issue persists. :)

crazywoola · May 28 '24 07:05