[Bug]: EXTREME HIGH CPU while using OLLAMA after upgrading ragflow to 0.19
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
no
RAGFlow image version
0.19.0
Other environment information
Actual behavior
EXTREME HIGH CPU while using OLLAMA after upgrading RAGFlow to 0.19. It seems that Ollama is using the CPU instead of the GPU when called from RAGFlow. There was no such problem with RAGFlow version 0.17.0.
Expected behavior
No response
Steps to reproduce
Use Ollama in RAGFlow 0.19.0.
Additional information
No response
High GPU usage with Ollama in RAGFlow 0.19.0 is a known issue and has been discussed in several threads. The main causes are often related to how RAGFlow configures Ollama model options like context size, batch size, and memory mapping, which can significantly increase VRAM/GPU usage compared to previous versions. Features such as reranker, RAPTOR retrieval, auto-keyword, and auto-question can also add to GPU load.

Disabling these features, lowering the context size, or adjusting the batch size can help reduce GPU usage. For some users, switching from Ollama to vLLM has provided better concurrency and lower GPU overhead, especially for large models or high request volumes. If you want to further limit GPU usage, you can also set the CUDA_VISIBLE_DEVICES environment variable, or run RAGFlow on CPU if GPU memory is a constraint.

There are no direct code changes in 0.19.0 that force higher GPU usage, but the default or dynamic settings may be more aggressive than in 0.17.0. For more details and configuration examples, see the discussions in issue #7981 and issue #7535.
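If the context size is the culprit, one way to test it is to rebuild the Ollama model with a smaller context window and, optionally, pin Ollama to one GPU. This is only a sketch, not RAGFlow's own mechanism; the base model tag, the `num_ctx` value, and the GPU index are assumptions to adapt to your setup:

```sh
# Sketch: rebuild the model with a smaller context window so it fits in VRAM.
# The base model tag and the 4096 value are assumptions; adjust to your setup.
cat > Modelfile <<'EOF'
FROM deepseek-r1:14b
PARAMETER num_ctx 4096
EOF
ollama create deepseek-r1-smallctx -f Modelfile

# Optionally restrict Ollama to a single GPU (device index 0 is an assumption).
CUDA_VISIBLE_DEVICES=0 ollama serve
```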
@KevinHuSh It's quite urgent. I want to use the features of 0.19, but I cannot while this problem exists.
```sh
OLLAMA_KEEP_ALIVE=-1 ollama serve
```

OR:

```sh
ollama run llama3.1:70b --keepalive=-1m
```

OR, revert the change from https://github.com/infiniflow/ragflow/commit/5b626870d05167026049c6ccf122494971459ee5
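To check whether the keep-alive workaround took effect, `ollama ps` lists the currently loaded models; with keep_alive set to -1 the UNTIL column should read something like "Forever" (exact wording may vary by Ollama version):

```sh
# List loaded models; the UNTIL column shows when each will be unloaded.
ollama ps
```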
Thanks. I will give it a try.
@KevinHuSh I have tried all three ways you suggested, and none of them works. The problem still exists. I wonder why this problem occurs at all? Version 0.17.0 was completely OK.
@KevinHuSh Is there any other plan to fix this problem? Sorry for taking up so much of your time. It's really urgent for me.
Try the latest code. There's a PR about this issue.
@KevinHuSh I have updated to the latest code. For some agents using a single LLM from Ollama it is OK. However, for this agent, which I use most frequently, the problem still exists.
As you can see, in this agent I use three different LLMs from Ollama. When the agent starts, GPU and CPU are completely fine during the first LLM (No. 1 in picture 1). However, when the agent reaches the last LLM (No. 2 in picture 1), the GPU has a problem and the CPU suddenly goes extremely high (picture 2).
The original problem I had been facing since updating RAGFlow to 0.19 was that when the agent reached the last LLM, GPU usage dropped to 0.9 GB. Now the situation has changed, but only to the one shown in picture 2. Whether I update the code or not, when it comes to the last LLM the CPU is always extremely high. I changed the last LLM in my agent from deepseek32B-max2 (for which I extended the ctx to 16384) to the original deepseek32B, and then to gemma3; whichever LLM I use, the problem always happens.
mistral-small uses approximately 15 GB, and deepseek32B uses about 21 GB. I think the problem is that while deepseek32B is running, mistral-small is not released and keeps running.
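If both models stay resident, their combined footprint (roughly 36 GB) would exceed a typical 24 GB GPU, forcing Ollama to offload layers to the CPU, which would be consistent with the symptoms described. Assuming that is what is happening, one way to test it is to explicitly unload the previous model before the next one runs. Per Ollama's documentation, a generate request with keep_alive set to 0 releases the model immediately (the model name below is the one from this thread):

```sh
# Unload mistral-small before the next model runs: a generate request with
# keep_alive set to 0 asks Ollama to release the model immediately.
curl http://localhost:11434/api/generate \
  -d '{"model": "mistral-small", "keep_alive": 0}'
```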
@KevinHuSh any other plan to solve this problem ? Thanks.
@KevinHuSh @RNGMARTIN I think part of this problem is that Ollama, for whatever reason, doesn't release the connection. I am starting to think this is why GraphRAG fails on parsing: it will hold the connection for many hours. I have heard that vLLM and other options might solve this, but I like the workflow of Ollama!
Is RAGFlow closing the connection when it is finished? There could be race conditions that prevent this from happening in some situations.
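A quick way to probe the held-connection theory from the client side is to force the TCP connection closed after each request and see whether the behavior changes. This is only a diagnostic sketch; the model name is the one mentioned earlier in the thread:

```sh
# Diagnostic only: disable HTTP keep-alive so the client closes the TCP
# connection after the request, to test whether held-open connections
# are what keeps the model pinned.
curl --no-keepalive \
  http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:70b", "prompt": "ping", "stream": false}'
```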
This problem only happens after updating RAGFlow to version 0.19. I am now using 0.17, which has no such problem. I don't know what changed that caused this problem.
I have the same issue even though I have updated to 0.19.1. What I did is simply:
- create an assistant
- add a Knowledge Base to that assistant
- select an open-source model from Ollama, e.g. deepseek-r1:14b, qwen3:14b, etc.
- start a new chat

Initially the GPU usage spikes, which is expected; then, soon after, GPU usage drops and CPU usage spikes, with all CPU cores fully occupied for a very long time until the chat is completed.
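One way to watch that handoff while reproducing the chat is to poll `ollama ps`, whose PROCESSOR column shows the CPU/GPU split for each loaded model, alongside nvidia-smi. A diagnostic sketch; the two-second interval is arbitrary:

```sh
# Poll the CPU/GPU split and VRAM usage every 2 seconds while chatting.
watch -n 2 'ollama ps; nvidia-smi --query-gpu=memory.used,memory.total --format=csv'
```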
@RNGMARTIN wondering if you figured out a workaround?