To avoid being banned, set the maximum requests per second for SiliconFlow's embedding
Self Checks
- [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.7.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
It seems that the system does not impose any limit on the request rate. I found a large number of blocked requests in the backend logs.
✔️ Expected Behavior
Environment variables for setting the TPS limit for embedding and LLM requests.
❌ Actual Behavior
No response
To set the maximum requests per second (TPS) for embedding and LLM in Dify, you can configure the rate-limiting mechanism. The RateLimit class in rate_limit.py handles the rate-limiting logic. This class uses Redis to track and manage the number of active requests.
Here is a brief overview of how the RateLimit class enforces rate limits:
- Initialization: Sets up the maximum number of active requests (max_active_requests) and initializes Redis keys to track active requests.
- Entering a Request: Checks if the number of active requests exceeds the limit. If it does, it raises an AppInvokeQuotaExceededError. Otherwise, it logs the request in Redis.
- Exiting a Request: Removes the request from the active requests list in Redis.
- Flushing Cache: Periodically recalculates the number of active requests and removes any that have timed out.
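The closest configurable parameter in this class is max_active_requests, which you can set during the initialization of the RateLimit instance [1]. Note that, as described above, it caps the number of requests active at the same time rather than the number of requests per second.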
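The enter/exit/flush flow described above can be illustrated with a minimal in-memory sketch. This is not Dify's actual code: the class name ActiveRequestLimiter, the QuotaExceededError exception, the timeout_seconds default, and the use of a plain dict in place of Redis are all assumptions made for illustration only.

```python
import time
import uuid


class QuotaExceededError(Exception):
    """Hypothetical stand-in for AppInvokeQuotaExceededError."""


class ActiveRequestLimiter:
    """In-memory sketch of an active-request cap.

    A dict replaces the Redis keys mentioned above, so this version
    only works within a single process.
    """

    def __init__(self, max_active_requests, timeout_seconds=600.0):
        self.max_active_requests = max_active_requests
        self.timeout_seconds = timeout_seconds  # assumed value, not from Dify
        self._active = {}  # request_id -> start time

    def enter(self, now=None):
        """Register a request, or raise if the cap is already reached."""
        now = time.monotonic() if now is None else now
        self.flush(now)  # drop timed-out entries before counting
        if len(self._active) >= self.max_active_requests:
            raise QuotaExceededError("too many active requests")
        request_id = str(uuid.uuid4())
        self._active[request_id] = now
        return request_id

    def exit(self, request_id):
        """Remove a finished request from the active set."""
        self._active.pop(request_id, None)

    def flush(self, now=None):
        """Remove requests that have been active longer than the timeout."""
        now = time.monotonic() if now is None else now
        expired = [
            rid for rid, started in self._active.items()
            if now - started > self.timeout_seconds
        ]
        for rid in expired:
            del self._active[rid]


if __name__ == "__main__":
    limiter = ActiveRequestLimiter(max_active_requests=2)
    first = limiter.enter()
    limiter.enter()
    try:
        limiter.enter()
    except QuotaExceededError:
        print("third concurrent request rejected")
    limiter.exit(first)
    limiter.enter()  # succeeds again once a slot is free
```

A cap like this limits concurrency, so many short requests can still exceed a provider's per-second quota; a per-second throttle would need a separate mechanism.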
For Docker setups, you can configure environment variables in the .env file. The docker-compose.md file provides detailed instructions for setting up Docker, including copying and editing the .env file for environment variable configurations [2][3][4][5].
To summarize, you need to:
- Set the max_active_requests parameter in the RateLimit class.
- Configure the relevant environment variables in the .env file for your Docker setup.
This should help you set and enforce the TPS limits for embedding and LLM requests in your self-hosted Dify instance.