To avoid being banned, set the maximum requests per second for SiliconFlow's embedding

Open · ukhack opened this issue 1 year ago · 1 comment

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.7.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

It seems that the system does not impose any rate restrictions on outgoing requests. I found a large number of blocked requests in the backend logs.

✔️ Expected Behavior

An environment variable for setting the TPS limit for embedding and LLM requests.

❌ Actual Behavior

No response

ukhack · Sep 09 '24 05:09

To limit the request rate (TPS) for embedding and LLM calls in Dify, you can configure its rate-limiting mechanism. The rate-limiting logic lives in the RateLimit class in rate_limit.py, which uses Redis to track and manage the number of active requests.

Here is a brief overview of how the RateLimit class enforces rate limits:

  1. Initialization: Sets up the maximum number of active requests (max_active_requests) and initializes Redis keys to track active requests.
  2. Entering a Request: Checks whether the number of active requests exceeds the limit. If it does, it raises an AppInvokeQuotaExceededError. Otherwise, it records the request in Redis.
  3. Exiting a Request: Removes the request from the active requests list in Redis.
  4. Flushing Cache: Periodically recalculates the number of active requests and removes any that have timed out.
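
The four steps above can be sketched roughly as follows. This is a minimal illustration, not Dify's actual code: a plain dict stands in for Redis, the names ActiveRequestLimiter, QuotaExceededError, and max_alive_seconds are hypothetical, and stale entries are flushed on every enter() call rather than periodically.

```python
import time
import uuid


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the AppInvokeQuotaExceededError mentioned above."""


class ActiveRequestLimiter:
    """Caps concurrently active requests; a dict stands in for Redis."""

    def __init__(self, max_active_requests, max_alive_seconds=600.0):
        self.max_active_requests = max_active_requests
        self.max_alive_seconds = max_alive_seconds  # assumed timeout value
        self.active = {}  # request_id -> start timestamp

    def flush(self, now=None):
        """Drop requests that have been active longer than the timeout."""
        now = time.time() if now is None else now
        expired = [rid for rid, started in self.active.items()
                   if now - started > self.max_alive_seconds]
        for rid in expired:
            del self.active[rid]

    def enter(self, now=None):
        """Register a request, or raise if the cap is already reached."""
        now = time.time() if now is None else now
        self.flush(now)
        if len(self.active) >= self.max_active_requests:
            raise QuotaExceededError("too many active requests")
        request_id = str(uuid.uuid4())
        self.active[request_id] = now
        return request_id

    def exit(self, request_id):
        """Remove a finished request from the active set."""
        self.active.pop(request_id, None)
```

A caller would wrap each request in enter() and exit(), typically with try/finally so that the slot is released even when the request fails.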

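The configurable parameter within this class is max_active_requests, which you can set when the RateLimit instance is initialized [1]. Because it caps the number of concurrently active requests, it bounds throughput only indirectly and is not a strict per-second limit.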

For Docker setups, you can configure environment variables in the .env file. The docker-compose.md file provides detailed instructions for setting up Docker, including copying and editing the .env file for environment variable configurations [2][3][4][5].

To summarize, you need to:

  1. Set the max_active_requests parameter in the RateLimit class.
  2. Configure the relevant environment variables in the .env file for your Docker setup.

This should help you set and enforce the TPS limits for embedding and LLM requests in your self-hosted Dify instance.
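
Since max_active_requests is described above as a cap on active requests, a strict per-second limit toward an upstream provider would need a separate throttle. The sketch below shows one simple way to space out calls; it is an illustration only, and both the MinIntervalThrottle class and the EMBEDDING_MAX_RPS variable are hypothetical names, not Dify settings.

```python
import os
import time


class MinIntervalThrottle:
    """Spaces calls so that at most `max_per_second` start per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# EMBEDDING_MAX_RPS is a hypothetical variable name used only for this example.
throttle = MinIntervalThrottle(float(os.environ.get("EMBEDDING_MAX_RPS", "5")))
```

Calling throttle.wait() immediately before each embedding request would keep a single process under the configured rate. With several workers, the shared state would have to live somewhere common, such as Redis.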

dosubot[bot] · Sep 09 '24 05:09