
[Question]: Why does the /api/v1/chats/{chat_id}/completions endpoint respond slowly to concurrent requests?

Open xyk0930 opened this issue 10 months ago • 3 comments

Describe your problem

  1. The response time is about 50 s when there is only one request.
  2. With 10 concurrent requests, the last response takes 3 min 40 s.
  3. Is this caused by the ragflow service itself, or because the LLM handles concurrent requests poorly?

xyk0930 avatar Feb 20 '25 07:02 xyk0930
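For reference, the concurrency measurement described above can be reproduced with a small script. This is a minimal sketch: `BASE_URL`, `API_KEY`, `CHAT_ID`, and the request payload fields are assumed placeholders, not values taken from this thread; check the ragflow HTTP API reference for the exact request shape.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholders -- substitute your own deployment details before running.
BASE_URL = "http://localhost:9380"
API_KEY = "YOUR_API_KEY"
CHAT_ID = "YOUR_CHAT_ID"

def call_completions():
    """One blocking request to the completions endpoint (payload shape assumed)."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/chats/{CHAT_ID}/completions",
        data=json.dumps({"question": "ping", "stream": False}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    urllib.request.urlopen(req).read()

def time_concurrent(fn, n):
    """Run fn() n times concurrently; return each call's elapsed seconds."""
    def timed():
        start = time.monotonic()
        fn()
        return time.monotonic() - start
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(timed) for _ in range(n)]
        return [f.result() for f in futures]

# Against a live server:
#   for t in time_concurrent(call_completions, 10):
#       print(f"{t:.1f}s")
```

If the slowest of the 10 latencies is roughly 10× the single-request latency, the requests are being served serially somewhere in the stack (ragflow or the LLM backend) rather than in parallel.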

You could click the little lamp icon in the UI to check the time elapsed at each stage.

KevinHuSh avatar Feb 21 '25 03:02 KevinHuSh

I checked. Most of the time is spent generating the answer, so it is definitely an LLM problem. I used Ollama to run the deepseek-r1:70b model on 8×4090 (24 GB) GPUs, and each GPU's utilization was below 20%. I searched the Ollama community and saw people reporting similar problems, but there seems to be no good solution. Do you have any ideas on how to increase utilization across multiple GPUs? @KevinHuSh

xyk0930 avatar Feb 21 '25 07:02 xyk0930
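[Editor's note] One avenue worth checking, hedged as it is not confirmed as the fix for this issue: Ollama limits how many requests a loaded model serves in parallel via the `OLLAMA_NUM_PARALLEL` environment variable, and `OLLAMA_SCHED_SPREAD` asks the scheduler to spread a model across all visible GPUs. A sketch of the tuning, with illustrative values; verify both variables against the Ollama FAQ for your version:

```shell
# Illustrative values only -- consult the Ollama FAQ for your version.
export OLLAMA_NUM_PARALLEL=10   # concurrent requests served per loaded model
export OLLAMA_SCHED_SPREAD=1    # spread the model across all visible GPUs
ollama serve
```

Low per-GPU utilization on a tensor-split 70B model can also simply reflect that generation is memory-bandwidth-bound, so parallel request batching (rather than more GPUs per request) is usually what raises throughput.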

No clue yet.

KevinHuSh avatar Feb 21 '25 11:02 KevinHuSh