The /v1/chat-messages interface is very slow
Self Checks
- [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.8.2
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
The slow response times from the /v1/chat-messages API endpoint began after upgrading to version 0.8.2, so the upgrade likely introduced changes affecting performance.
When debugging from the console page, responses are very fast.
When calling the /v1/chat-messages API endpoint, however, the response is slow, and the slowness also shows up in the console logs, which suggests the issue is not network-related (a timing sketch follows below).
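To quantify the slowdown, one approach is to time the endpoint directly and separate time-to-first-token from total time. A minimal sketch using Python's requests, where the base URL, API key, and query are placeholders for your own deployment:

```python
import time

import requests

# Placeholders: point these at your self-hosted Dify instance and app API key.
BASE_URL = "http://localhost:5001"
API_KEY = "app-your-api-key"

start = time.perf_counter()
resp = requests.post(
    f"{BASE_URL}/v1/chat-messages",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "inputs": {},
        "query": "Hello",
        "response_mode": "streaming",  # SSE, so the first chunk can be timed
        "user": "latency-test",
    },
    stream=True,
    timeout=360,
)
resp.raise_for_status()

first_chunk = None
for line in resp.iter_lines():
    if line and first_chunk is None:
        first_chunk = time.perf_counter() - start  # time to first streamed event
total = time.perf_counter() - start

print(f"first chunk: {first_chunk:.2f}s, total: {total:.2f}s")
```

A large gap between this measurement and a direct call to the model provider points at overhead inside Dify rather than at the LLM or the network.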
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
Hey @rechardwang! 👋 I'm here to help you with any bugs, questions, or contributions you have for the project. Let's work together to solve this issue!
I found some similar issues and discussions that might help you troubleshoot the slow response times with the /v1/chat-messages API endpoint:
- Discussion: When publishing Api, '/chat-messages' api responds slowly with data in streaming mode (Unsolved) [1].
- Issue: chat-messages API is taking more time compared to test run in dify (Closed) [2].
  - Suggested Solution: Implement streaming output in the `invoke_llm` method to improve responsiveness. Here is the suggested code modification:

    ```python
    llm_result: LLMResult = model_instance.invoke_llm(
        prompt_messages=prompt_messages,
        model_parameters=app_orchestration_config.model_config.parameters,
        tools=[],
        stop=app_orchestration_config.model_config.stop,
        stream=True,  # Enable streaming
        user=self.user_id,
        callbacks=[],
    )
    ```

    Additionally, consider increasing the timeout value in the frontend code and implementing a reconnect feature to handle disconnections (a client-side sketch follows this list).
- Issue: Self Hosted (Docker), api call /chat-messages with response_mode: streaming, the message_end response lasts up to 5 seconds (Closed) [3].
  - Suggested Solution: The delay could be due to server processing time, network latency, server load, the overhead of streaming, or server-side configuration settings. It's recommended to check the server-side implementation, server logs, and possibly adjust server or network configurations.
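To illustrate the timeout-and-reconnect suggestion above, here is a rough client-side sketch in Python. The helper name and retry policy are illustrative, not Dify code; a browser frontend would apply the same logic to its SSE reader:

```python
import time

import requests


def stream_chat_message(base_url: str, api_key: str, query: str, max_retries: int = 3):
    """Illustrative helper: stream /v1/chat-messages and reconnect with
    exponential backoff if the connection drops mid-stream."""
    for attempt in range(max_retries):
        try:
            resp = requests.post(
                f"{base_url}/v1/chat-messages",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "inputs": {},
                    "query": query,
                    "response_mode": "streaming",
                    "user": "reconnect-demo",
                },
                stream=True,
                timeout=360,  # generous read timeout, matching GUNICORN_TIMEOUT=360
            )
            resp.raise_for_status()
            for line in resp.iter_lines():
                if line:
                    yield line.decode("utf-8")  # raw SSE lines, e.g. "data: {...}"
            return  # stream ended normally
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off before reconnecting
```

Note that a naive retry re-sends the whole query; a real client would pass the conversation_id returned by the first response so the retried request continues the same conversation.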
For optimizing the performance of the /v1/chat-messages API endpoint, consider the following server and network configurations for running Dify in a self-hosted Docker environment:
- API Server Configuration:
  - DIFY_BIND_ADDRESS: Default is `0.0.0.0`.
  - DIFY_PORT: Default is `5001`.
  - SERVER_WORKER_AMOUNT: Number of API server workers, calculated as `number of CPU cores x 2 + 1`.
  - SERVER_WORKER_CLASS: Default is `gevent`.
  - GUNICORN_TIMEOUT: Default is `200`. Recommended value is `360` to support longer SSE (Server-Sent Events) connection times.
- Database Configuration:
  - DB_USERNAME, DB_PASSWORD, DB_HOST, DB_PORT (Default: `5432`), DB_DATABASE.
  - SQLALCHEMY_POOL_SIZE: Default is `30` connections.
  - SQLALCHEMY_POOL_RECYCLE: Default is `3600` seconds.
- Redis Configuration:
  - REDIS_HOST, REDIS_PORT (Default: `6379`), REDIS_DB (Default: `0`), REDIS_PASSWORD.
  - REDIS_USE_SSL: Default is `false`.
- Celery Configuration:
  - CELERY_WORKER_CLASS: Default is `gevent`.
  - CELERY_WORKER_AMOUNT: Default is `1`.
  - CELERY_BROKER_URL: Redis URL for the Celery broker.
- File Storage Configuration:
  - STORAGE_TYPE: Default is `local`.
  - STORAGE_LOCAL_PATH: Path for local storage.
- Network Configuration:
  - Ensure Docker containers can communicate with each other and with the database and Redis instances.
  - Configure appropriate firewall rules to allow traffic on necessary ports (e.g., `5001` for the API, `5432` for PostgreSQL, `6379` for Redis).
Example Docker Compose configuration snippet:
```yaml
version: '3.8'
services:
  api:
    image: dify/api:latest
    environment:
      - DIFY_BIND_ADDRESS=0.0.0.0
      - DIFY_PORT=5001
      - SERVER_WORKER_AMOUNT=5 # Adjust based on CPU cores
      - GUNICORN_TIMEOUT=360
      - DB_HOST=your_db_host
      - DB_PORT=5432
      - REDIS_HOST=your_redis_host
      - REDIS_PORT=6379
      - REDIS_PASSWORD=your_redis_password
      - STORAGE_TYPE=local
    ports:
      - "5001:5001"
    volumes:
      - ./storage:/app/api/storage
  db:
    image: postgres:latest
    environment:
      - POSTGRES_USER=your_db_user
      - POSTGRES_PASSWORD=your_db_password
      - POSTGRES_DB=your_db_name
    ports:
      - "5432:5432"
  redis:
    image: redis:latest
    environment:
      - REDIS_PASSWORD=your_redis_password
    ports:
      - "6379:6379"
```
This configuration ensures that the Dify API server, PostgreSQL database, and Redis instance are properly set up and can communicate with each other, providing a robust environment for optimal performance [4][5][6].
To continue talking to Dosu, mention @dosu.
So there is no way to speed up /v1/chat-messages?
No. The LLM side should generate quickly, but token counting slows things down, and there is probably no time to fix this for now.
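For context on the token-counting overhead mentioned above, a rough micro-benchmark sketch, assuming tokens are counted with tiktoken (as Dify does for OpenAI-style models); the text size and loop count are arbitrary:

```python
import time

import tiktoken  # assumed tokenizer; pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "hello world " * 5000  # synthetic long chat history

start = time.perf_counter()
for _ in range(100):  # token counting may run once per request or chunk
    n_tokens = len(enc.encode(text))
elapsed = time.perf_counter() - start

print(f"{n_tokens} tokens; 100 counts took {elapsed:.3f}s")
```

Counting is linear in the text length, so long conversation histories make this step noticeably more expensive.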
@crazywoola Was this issue ever fixed or not? Without a fix we can't even pass load testing. In our load tests, going through the Dify chat-messages API wrapper takes twice as long as calling the model directly.
Running into the same problem: accessing the LLM through Dify takes about twice as long as accessing it directly.
Any update on this? I'm still on the old version 0.15.3 and seem to be hitting the same situation, see #16602.
If anyone has solved this, please leave a comment. I'm running into this problem too.
Any response? Does anyone know how to fix this? After half a year of running 0.15.3, response time has gone from the original 10s to 60s.