[Question]: There are issues using Ollama's local model for chat.
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
There are issues using Ollama's local model for chat. The local Ollama model worked without problems before, but the issue has started occurring recently.
However, there are no problems when using other online API models. What could be the reason?
There have been several recent changes in RAGFlow that could impact Ollama's local model integration. The Ollama dependency was updated from version 0.2.1 to >=0.5.0 to resolve a conflict with the httpx library, which may affect compatibility or functionality if your local Ollama version or environment hasn't been updated accordingly. This update was necessary because Ollama 0.2.1 required httpx <0.28.0, while other dependencies required newer versions of httpx. The new Ollama versions (>=0.5.0) removed this upper bound, but this change could introduce new compatibility issues if your setup is out of sync with these requirements (details).
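If you want to confirm which client versions your environment actually has, here is a minimal sketch, assuming the standard `ollama` and `httpx` PyPI packages are the ones installed in the RAGFlow environment:

```python
# Minimal sketch: print the installed ollama and httpx client versions so you
# can compare them against RAGFlow's current requirements (ollama >= 0.5.0).
from importlib.metadata import version, PackageNotFoundError

for pkg in ("ollama", "httpx"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```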
There were also recent fixes to how Ollama models are listed and registered in RAGFlow, including changes to the model list API and factory registration logic, which could affect how local models are recognized or selected (details). Other recent fixes addressed prompt handling and image upload defaults for Ollama, which could impact chat functionality if your workflow depends on these features (details), (details).
If online API models work but the local Ollama model does not, possible reasons include: Ollama service not running or not accessible on the expected port, version mismatches between RAGFlow and Ollama, or issues with how models are listed or registered after recent updates.
Try checking that your Ollama service is running and accessible, confirm your Ollama version is >=0.5.0, and review any logs for errors related to model listing or prompt handling. If you recently updated RAGFlow, make sure your local environment and dependencies are consistent with the latest requirements.
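As a quick sanity check, the sketch below asks the Ollama server for its version and the list of locally pulled models, independent of RAGFlow. It assumes the default host and port and that the `requests` package is available; adjust the base URL to your setup:

```python
# Minimal sketch: verify the Ollama server responds and list the models it serves.
# BASE_URL is an assumption -- point it at wherever your Ollama instance listens.
import requests

BASE_URL = "http://localhost:11434"

print(requests.get(f"{BASE_URL}/api/version", timeout=5).json())  # e.g. {"version": "0.12.10"}
tags = requests.get(f"{BASE_URL}/api/tags", timeout=5).json()     # locally pulled models
for model in tags.get("models", []):
    print(model["name"])
```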
My Ollama version is >0.5.0, and I just upgraded Ollama to 0.12.10, but it still doesn't work.
Since Ollama 0.12.10 is installed and online API models work, the most common causes for local Ollama model failures in RAGFlow are network/configuration issues or a mismatch in model naming. Here are the key things to check:
- If RAGFlow is running in Docker and Ollama is running on your host, make sure you use http://host.docker.internal:11434 (not localhost or 127.0.0.1) as the base URL for Ollama in RAGFlow. On Linux, you may need to add extra_hosts in your Docker config to enable this mapping. Ollama must be started with OLLAMA_HOST=0.0.0.0:11434 so it listens on all interfaces, not just localhost. You can verify connectivity by running curl http://host.docker.internal:11434/ from inside the RAGFlow container: it should return "Ollama is running" (see the connectivity sketch after this list) reference.
- The model name in RAGFlow must exactly match what ollama list shows, including the tag (e.g., qwen3:32b or qwen3:latest, not just qwen3).
- If you see connection errors, try entering any value (even a dummy string) in the API key field in RAGFlow, as some versions require this even for local models reference.
- Make sure your firewall allows traffic on port 11434 and that Docker network policies are not blocking it.
- If you are running an older RAGFlow version, upgrade to at least v0.20.5 or the latest nightly, as recent versions fixed major Ollama integration bugs reference.
- For resource issues (e.g., model hangs or high GPU usage), set OLLAMA_NUM_GPU=1 and OLLAMA_KEEP_ALIVE=-1 in your environment reference.
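The connectivity and model-name checks above can be combined into one small script. This is a sketch only, assuming RAGFlow runs in Docker with Ollama on the host and that `requests` is available inside the container; the model name is a placeholder:

```python
# Run inside the RAGFlow container. Checks (1) that the base URL resolves and the
# server answers, and (2) that the configured model name matches `ollama list` exactly.
import requests

BASE_URL = "http://host.docker.internal:11434"  # same base URL as configured in RAGFlow
CONFIGURED_MODEL = "qwen3:32b"                  # hypothetical; replace with your model name

print(requests.get(f"{BASE_URL}/", timeout=5).text)  # expect "Ollama is running"

names = [m["name"] for m in requests.get(f"{BASE_URL}/api/tags", timeout=5).json()["models"]]
print("available models:", names)
print("exact match for configured model:", CONFIGURED_MODEL in names)
```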
If you’re still stuck, please share the exact error message or log you see in RAGFlow when trying to use the local Ollama model, your deployment setup (Docker/native for both), and your current RAGFlow Ollama config (host, port, model name, API key field). This will help pinpoint the issue.
@Magicbook1108 please help me.
Furthermore, parsing with the embedding model deployed locally in Ollama works fine, so why does only the chat model have problems?
This usually happens because RAGFlow uses different code paths and configuration for chat models versus embedding or model listing. Chat models are more sensitive to configuration mismatches—like the exact model name (must match ollama list), the presence of an API key (even a dummy value), and the formatting of the base URL (no trailing slash, and must use http://host.docker.internal:11434 if RAGFlow is in Docker and Ollama is on the host). There was also a regression in RAGFlow v0.20.4 that broke chat model connections while embedding still worked; this was fixed in later versions and the nightly build. If embedding works but chat fails, double-check these settings and try upgrading RAGFlow if you’re not on the latest version. More details and fixes are discussed here and here.
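To separate an Ollama problem from a RAGFlow configuration problem, you can exercise the chat endpoint directly with the same base URL and model name you configured in RAGFlow. A hedged sketch, using placeholder values and assuming `requests` is available:

```python
# Minimal sketch: call Ollama's /api/chat directly. If this returns a sensible answer
# but chat still fails inside RAGFlow, the issue is on the RAGFlow configuration side.
import requests

BASE_URL = "http://host.docker.internal:11434"  # no trailing slash, same value as in RAGFlow
MODEL = "qwen3:32b"                             # hypothetical; must match `ollama list`

resp = requests.post(
    f"{BASE_URL}/api/chat",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```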
@Magicbook1108 I had no problems using version 0.20.0, and I was using the same Ollama local model.
@Magicbook1108 But the new version still has problems.
@lll-Dragon Oh, I thought you were referring to 0.22.0. I will take a look at this.
@Magicbook1108 I just downloaded the qwen3 local model from ollama again, and it works fine, but the model I downloaded before still has problems.
@Magicbook1108 As shown in the image, qwen3 is what I just downloaded, while qwq and deepseek are the ones I downloaded previously.
@lll-Dragon
- The issue: the “Searching” state never ends. This likely happens because the selected model (qwen-32b) is too large. It consumes a huge amount of CPU, causing the system to get stuck. You can verify this by opening a CLI and running top to check CPU usage during the chat.
- The repeated content issue is most likely not caused by RagFlow. It usually comes from your KB data and how the model generates text. This can happen if your KB contains repeated structures (like tables), or if the model falls into a generation loop with similar text. RagFlow only handles retrieval and context injection, so the repetition mainly comes from the KB or the model itself.
You can try using smaller models such as 8B or 4B to verify that RagFlow is functioning correctly.
I have the same issue. When I run a local model, it gives nonsensical answers. What could be the reason? The online models all answer accurately.
Could someone please help?
Smaller models don't work for me either.
This behavior isn’t really RagFlow-specific. RagFlow just formats the prompt into messages and forwards them to the model, and we don’t treat Ollama models differently from other backends. In my experience, Ollama models are more prone to repetitive outputs – e.g., a table cell like 71: p element can lead to an answer that repeats “p element” 71 times.
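To illustrate (this is not RAGFlow's actual code, just the general shape of what any RAG frontend forwards): retrieved chunks are injected into the prompt verbatim, so repetitive structures in the KB reach the model unchanged and can trigger looping output. The model name and URL below are placeholders:

```python
# Illustrative sketch: an OpenAI-style message list with KB chunks injected into the
# system prompt, sent to Ollama's chat endpoint. Repetitive KB text is passed as-is.
import requests

BASE_URL = "http://localhost:11434"
MODEL = "qwen3:8b"  # hypothetical small model, useful for isolating model-side looping

retrieved_chunks = [
    "Table row: 71: p element",          # the kind of repetitive cell mentioned above
    "Section 2.1: background of the paper ...",
]
messages = [
    {"role": "system",
     "content": "Answer using only the following context:\n" + "\n".join(retrieved_chunks)},
    {"role": "user", "content": "What does the table describe?"},
]

resp = requests.post(
    f"{BASE_URL}/api/chat",
    json={"model": MODEL, "messages": messages, "stream": False},
    timeout=300,
)
print(resp.json()["message"]["content"])
```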
This is a PDF in the format of an academic paper.
But when I run the model locally with ollama run, it works fine.
It gives random answers, not repetitive ones. The answers it provides are incorrect; it simply keeps repeating the question.
Can you answer?
@lll-Dragon Did the author resolve the issue?
@StupidWyh The problem remains unresolved; we can only use online models now.
@lll-Dragon Please try updating your Ollama version. A similar issue in #11301 was fixed after upgrading Ollama. It’s probably not a RagFlow problem, since my colleagues and I rarely encounter this when using Ollama with RagFlow.
I upgraded Ollama to version 0.13.0, but the problem persists.
@Magicbook1108 I tried three LLM models: one was the online SILICONFLOW-provided DeepSeek-R1-Distill-Qwen-32B, which functioned normally. Another was the locally deployed Deepseek-R1:32B using sglang, which only showed the start-of-thinking marker but not the end-of-thinking marker. The third was a locally deployed DeepSeek-V3-0324-INT4 (which inherently lacks a thinking process), and the response time for receiving answers was particularly long, although its response speed was fast when used independently. Do you have any ideas on how to resolve this issue?
@theRainight We’ll need to look into the code path to understand why this behavior occurs.