
[Feature Request]: Can we add configuration items for customizing the API request rate and token quantity?

Open · kostya-sec opened this issue 10 months ago • 1 comment

Is there an existing issue for the same feature request?

  • [x] I have checked the existing issues.

Is your feature request related to a problem?

Recently, while using the SiliconAPI, I hit an RPM (requests-per-minute) error during document parsing, which caused parsing to fail. As a workaround, I modified the Dockerfile to install the ratelimit and tiktoken packages during the build, and added a modified class to the llm directory so that rate-limit errors no longer occur when requesting the chat model, embedding model, rerank model, etc.
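For illustration, a wrapper along these lines, using the ratelimit package for the RPM cap and tiktoken for token counting, might look like the sketch below; the class name, limit values, and the wrapped client are placeholders, not the actual patch:

```python
# Sketch only: class name, limit values, and the wrapped client are placeholders.
import tiktoken
from ratelimit import limits, sleep_and_retry

ONE_MINUTE = 60  # seconds

class RateLimitedChat:
    def __init__(self, client, model_name: str):
        self.client = client
        self.model_name = model_name
        self.encoder = tiktoken.get_encoding("cl100k_base")

    def count_tokens(self, text: str) -> int:
        # Count tokens locally so a request can be checked against a TPM quota.
        return len(self.encoder.encode(text))

    @sleep_and_retry                      # sleep until the rate window allows another call
    @limits(calls=60, period=ONE_MINUTE)  # at most 60 requests per minute
    def chat(self, messages):
        return self.client.chat(model=self.model_name, messages=messages)
```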

Describe the feature you'd like

Add configuration items for customizing the API request rate (RPM) and token quantity, so that provider rate limits can be respected out of the box, without rebuilding the image or hand-patching classes in the llm directory as described above.

Describe implementation you've considered

No response

Documentation, adoption, use case

No response
Additional information

No response

kostya-sec · Mar 07 '25

export MAX_CONCURRENT_CHATS=10

KevinHuSh · Mar 10 '25
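For anyone reading along: a cap like MAX_CONCURRENT_CHATS is typically enforced with a semaphore. The sketch below only illustrates the general technique and is not RAGFlow's actual implementation:

```python
# Illustration of the general technique only, not RAGFlow's actual code:
# limit in-flight chat requests to the value of MAX_CONCURRENT_CHATS.
import os
import threading

max_chats = int(os.environ.get("MAX_CONCURRENT_CHATS", "10"))
chat_slots = threading.BoundedSemaphore(max_chats)

def chat_with_cap(call_llm, messages):
    # Blocks while max_chats requests are already in flight.
    with chat_slots:
        return call_llm(messages)
```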

@carcoonzyk @kostya-sec LLM chat already supports rate limiting and retries: https://github.com/infiniflow/ragflow/blob/94181a990b957ed302952b4de17583d2b44f3099/rag/llm/chat_model.py#L178

You can do something similar for embedding models.

yuzhichang · May 15 '25
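For embedding calls, one possible shape for such a retry is exponential backoff with jitter; in this sketch the `embed` callable and the broad exception handling are assumptions rather than the actual ragflow API:

```python
# Sketch: retry with exponential backoff and jitter for an embedding call.
# `embed` and the exception type caught here are assumptions.
import random
import time

def embed_with_retry(embed, texts, max_attempts=5, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return embed(texts)
        except Exception:  # in practice, catch the provider's rate-limit error
            if attempt == max_attempts - 1:
                raise
            # Back off exponentially, with jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```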