[Bug]: Locally deployed models in RAGFlow cannot be used
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly ( 非英文标题的提交将会被直接关闭 ) (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
buqingchu
RAGFlow image version
V0.21.0
Other environment information
My local model is qwen2:7b.
The RAGFlow version is infiniflow/ragflow v0.21.0.
The Ollama version is 0.12.11.
Running the model with ollama run works fine!
Actual behavior
The model gives incorrect answers, repeats the questions, or keeps repeating responses.
Expected behavior
No response
Steps to reproduce
Ask questions using the model pulled by Ollama; the model gives incorrect answers.
Additional information
No response
This issue is likely related to how RAGFlow formats prompts and parses responses when integrating with Ollama-based local models like qwen2:7b. The integration uses a shared pipeline that expects OpenAI-style message schemas and may not always handle model-specific quirks, which can result in repeated or incorrect outputs if the response isn't parsed as expected.
Double-check that your Ollama base URL and model name in RAGFlow exactly match your local Ollama setup. For qwen2:7b, if you're using IPEX-LLM acceleration, make sure all environment variables and service initialization steps are followed as described in the RAGFlow local LLM deployment guide. Also, verify that RAGFlow can reach your Ollama instance from its environment (especially if running in Docker, use http://host.docker.internal:11434 as the base URL) and that the model type is set to "chat" in the provider settings. See the detailed deployment and troubleshooting steps in the official guide for more context and configuration checks here.
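If you want a quick way to confirm that reachability and the model name before digging further, a small check against Ollama's `/api/tags` endpoint can help. This is only a sketch: it assumes the `requests` package is available and uses the Docker-style base URL mentioned above; swap in whatever base URL you configured in RAGFlow, and ideally run it from inside the RAGFlow container so it tests the same network path RAGFlow uses.

```python
# Sanity check: can this environment reach Ollama, and does the model name
# configured in RAGFlow actually exist there?
# Assumptions: the `requests` package is installed, and the base URL below
# matches what you entered in RAGFlow's Ollama provider settings.
import requests

OLLAMA_BASE_URL = "http://host.docker.internal:11434"  # adjust to your setup
EXPECTED_MODEL = "qwen2:7b"  # must match `ollama list` exactly, including the tag

resp = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=10)
resp.raise_for_status()

available = [m["name"] for m in resp.json().get("models", [])]
print("Models Ollama reports:", available)
if EXPECTED_MODEL not in available:
    print(f"'{EXPECTED_MODEL}' is not in the list; fix the model name in RAGFlow or pull it first.")
```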
If everything is configured correctly and the issue persists, it may be due to how RAGFlow's prompt/response handling interacts with the qwen2:7b model via Ollama. This can sometimes cause the model to echo prompts or generate repetitive text if the message array or response parsing doesn't align with the model's expectations (see code reference).
If you have logs or can share your exact RAGFlow Ollama provider configuration, that could help pinpoint the root cause. Otherwise, reviewing the guide and ensuring strict alignment between your Ollama and RAGFlow settings is the best next step.
This is exactly the issue: when I ask a question, it gives random answers or doesn't answer at all. When I use online models, there are no problems, but as soon as I use a model deployed with Ollama, this happens.
These random and repeatedly wrong answers occur even though the test models I'm using are all very small.
What are the possible solutions?
This usually happens because RAGFlow and Ollama integration is sensitive to configuration details and model compatibility. Online models work because their APIs and prompt/response formats are tightly controlled, but Ollama-hosted models (like qwen2:7b or llama3.2:latest) can behave unpredictably if there’s any mismatch in setup or expectations.
Here are the most common causes and solutions:
- Make sure the model name in RAGFlow matches exactly what `ollama list` shows (including tags like `:latest`) reference.
- If RAGFlow runs in Docker and Ollama is on your host, use `http://host.docker.internal:11434` as the base URL (not `localhost` or `127.0.0.1`). On Linux, you may need to add `extra_hosts` in your Docker config reference.
- Enter any value (even a dummy string) in the API key field in RAGFlow for Ollama models, as some versions require this even for local models reference.
- Make sure Ollama is listening on all interfaces (`0.0.0.0:11434`). If not, set `OLLAMA_HOST=0.0.0.0:11434` and restart Ollama reference.
- If you see repeated or empty answers, it can be due to how the model interprets the prompt or how RAGFlow parses the response. This is more common with Ollama models and not usually a RAGFlow bug reference (a direct test against the Ollama API, sketched after this list, can help isolate this).
- If you're using IPEX-LLM acceleration (for qwen2), make sure all environment variables and initialization steps are set as described in the deployment guide.
- Try updating to the latest RAGFlow version or the nightly Docker image, as recent versions fixed several Ollama integration bugs reference.
- For debugging, set `LOG_LEVEL=DEBUG` to get more detailed logs.
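To tell whether the repetition comes from the model itself or from the RAGFlow integration, you can also talk to Ollama's `/api/chat` endpoint directly and compare the output with what you see in RAGFlow. A minimal sketch, assuming the `requests` package and the base URL/model name from your setup:

```python
# Send a question straight to Ollama, bypassing RAGFlow entirely.
# If the answer is already repetitive or garbled here, the problem lies with
# the model or Ollama; if it looks fine here, the RAGFlow integration or its
# prompt/knowledge-base setup is the more likely culprit.
import requests

OLLAMA_BASE_URL = "http://host.docker.internal:11434"  # adjust to your setup
payload = {
    "model": "qwen2:7b",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself in one sentence."}],
    "stream": False,
}

resp = requests.post(f"{OLLAMA_BASE_URL}/api/chat", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```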
If you’ve checked all of the above and still see issues, it’s likely due to the model’s own behavior with the prompt format RAGFlow uses. Ollama models are known to sometimes repeat or ignore prompts, especially with certain knowledge base structures or when the prompt format isn’t exactly what the model expects reference.
Let me know if you want more detailed steps for any of these checks.
I can't even get a response from the model when I send a simple "Hello".
Why does calling the locally deployed large model Qwen3 32B (using Ollama) result in very slow responses or no response at all? And when it does respond, it often repeatedly outputs the same content, making the response unusable.
Can you provide a reliable solution? I have already connected to Ollama using the URL http://10.81.30.XX:11434, but the model basically gives random answers. I don't understand why.
Here’s a reliable checklist to resolve random, repeated, or no responses from Ollama models in RAGFlow:
- Make sure Ollama is listening on all interfaces. Set `OLLAMA_HOST=0.0.0.0` in your Ollama service config and restart Ollama so it's accessible from other machines or containers (details).
- In RAGFlow, the model name must exactly match what `ollama list` shows (including tags like `:latest`). The API key field should not be empty; enter any value if needed (details, details).
- If RAGFlow runs in Docker and Ollama is on your host, use `http://host.docker.internal:11434` as the base URL (not `localhost` or `127.0.0.1`). On Linux, you may need to add `extra_hosts` in your Docker config (details).
- For large models like Qwen3 32B, set `OLLAMA_NUM_GPU=1` and `OLLAMA_KEEP_ALIVE=-1` in your environment to reduce VRAM usage and avoid high CPU load (details, details).
- Update Ollama to the latest version (at least 0.12.10) and RAGFlow to v0.20.5 or newer, as recent releases fixed major Ollama integration bugs (details, details); a quick way to confirm your Ollama version and whether the model stays loaded is sketched after this list.
- If you see slow or truncated output, try the 'nightly' RAGFlow Docker image instead of 'latest' (details).
- Avoid sending documents with images to large models unless needed, as this can trigger repeated API calls and unusable output (details).
- If you still get unusable responses, consider hosting Qwen3 with vLLM instead of Ollama—users report much faster and more stable results (details, details).
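For the version and keep-alive points above, Ollama also exposes `/api/version` and `/api/ps` (currently loaded models), so you can verify them without digging through service configs. Again just a sketch, assuming `requests` and your own base URL:

```python
# Rough check for the items above: Ollama version, and whether the model is
# actually resident in memory between requests (keep-alive / VRAM usage).
# Assumes a recent Ollama build that provides /api/version and /api/ps.
import requests

OLLAMA_BASE_URL = "http://host.docker.internal:11434"  # adjust to your setup

version = requests.get(f"{OLLAMA_BASE_URL}/api/version", timeout=10).json()
print("Ollama version:", version.get("version"))  # the advice above asks for >= 0.12.10

running = requests.get(f"{OLLAMA_BASE_URL}/api/ps", timeout=10).json().get("models", [])
if not running:
    print("No model is currently loaded; the first request will pay the full load time.")
for m in running:
    # size_vram close to size suggests the model fits in GPU memory;
    # a far-future expires_at suggests OLLAMA_KEEP_ALIVE is keeping it loaded.
    print(m.get("name"), "size:", m.get("size"),
          "size_vram:", m.get("size_vram"), "expires_at:", m.get("expires_at"))
```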
If the problem persists after these steps, check your hardware specs (especially GPU memory and CPU cores), as Qwen3 32B is extremely resource-intensive. Setting `LOG_LEVEL=DEBUG` in RAGFlow can also help diagnose what’s happening under the hood.
It either gets stuck or repeatedly outputs useless content.
The repeated “I don’t know” responses mean RAGFlow isn’t retrieving any relevant content from your knowledge base for the query “nanozyme是什么” (“what is a nanozyme”). This usually happens for a few reasons:
- The knowledge base (“hj222”) might not contain any information about “nanozyme”, or the documents haven’t been parsed or embedded correctly. Check that your knowledge base shows a nonzero chunk count and parsing is complete.
- The embedding model used for the knowledge base must match the one set for retrieval/chat. If they differ, retrieval will silently fail.
- If the similarity threshold (0.2) is too high, relevant results might be filtered out. Try lowering it or increasing Top N.
- There’s a known bug where new document chunks can be missing an “available” flag, making them invisible to retrieval. Re-parse your documents or upgrade to the latest nightly Docker image and re-parse to fix this.
- Check backend logs for errors during indexing or retrieval—sometimes failures aren’t shown in the UI.
If you confirm the knowledge base contains relevant info, embeddings are generated, and the models match, but you still get fallback answers, it’s almost always a retrieval/configuration issue, not a model bug. See more troubleshooting steps and explanations here: source, source.
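To make the similarity-threshold point above concrete: retrieval drops every chunk whose score falls below the cutoff, so if an embedding mismatch depresses all scores, nothing survives a 0.2 threshold and the chat falls back to “I don’t know”. The toy example below uses made-up vectors and plain cosine similarity, not RAGFlow's actual scoring (which also mixes in keyword similarity), just to show the filtering effect:

```python
# Toy illustration of threshold filtering with made-up embedding vectors.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1, 0.0])
chunks = {
    "chunk embedded with the same model":    np.array([0.8, 0.2, 0.1]),
    "chunk embedded with a different model": np.array([0.0, 0.3, 0.9]),
}

THRESHOLD = 0.2  # the similarity threshold discussed above
for name, vec in chunks.items():
    score = cosine(query, vec)
    verdict = "kept" if score >= THRESHOLD else "filtered out"
    print(f"{name}: score={score:.2f} -> {verdict}")
```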
The questions I asked can be answered correctly using the online model, but when using the locally deployed Ollama model, it either fails to answer or gives random, incorrect responses—even with the exact same questions.
22:44:36 - LiteLLM:INFO: utils.py:3296 -
LiteLLM completion() model= qwen-max; provider = dashscope
2025-11-21 22:44:36,791 INFO 21
LiteLLM completion() model= qwen-max; provider = dashscope
2025-11-21 22:44:41,525 INFO 31 task_executor_2b3912361ed5_0 reported heartbeat: {"name": "task_executor_2b3912361ed5_0", "now": "2025-11-21T22:44:41.524+08:00", "boot_at": "2025-11-21T15:47:42.727+08:00", "pending": 2, "lag": 0, "done": 1, "failed": 0, "current": {}}
2025-11-21 22:44:42,789 INFO 21 172.18.0.6 - - [21/Nov/2025 22:44:42] "POST /v1/conversation/completion HTTP/1.1" 200 -
2025-11-21 22:44:45,535 INFO 21 172.18.0.6 - - [21/Nov/2025 22:44:45] "GET /v1/document/image/745d0a20c6af11f0a3e43ea0449eda9e-5b18b2db57131be5 HTTP/1.1" 200 -
2025-11-21 22:44:45,537 INFO 21 172.18.0.6 - - [21/Nov/2025 22:44:45] "GET /v1/document/image/745d0a20c6af11f0a3e43ea0449eda9e-43fcabeb9c7f87d9 HTTP/1.1" 200 -
2025-11-21 22:44:45,538 INFO 21 172.18.0.6 - - [21/Nov/2025 22:44:45] "GET /v1/document/image/745d0a20c6af11f0a3e43ea0449eda9e-95412d8f47115400 HTTP/1.1" 200 -
2025-11-21 22:44:45,731 INFO 21 172.18.0.6 - - [21/Nov/2025 22:44:45] "GET /v1/document/thumbnails?doc_ids=7f6e34acc6af11f085353ea0449eda9e HTTP/1.1" 200 -
2025-11-21 22:45:09,253 INFO 31 task_executor_2b3912361ed5_0 reported heartbeat: {"name": "task_executor_2b3912361ed5_0", "now": "2025-11-21T22:45:09.253+08:00", "boot_at": "2025-11-21T15:47:42.727+08:00", "pending": 2, "lag": 0, "done": 1, "failed": 0, "current": {}}
2025-11-21 22:45:36,938 INFO 31 task_executor_2b3912361ed5_0 reported heartbeat: {"name": "task_executor_2b3912361ed5_0", "now": "2025-11-21T22:45:36.938+08:00", "boot_at": "2025-11-21T15:47:42.727+08:00", "pending": 2, "lag": 0, "done": 1, "failed": 0, "current": {}}
2025-11-21 22:46:04,664 INFO 31 task_executor_2b3912361ed5_0 reported heartbeat: {"name": "task_executor_2b3912361ed5_0", "now": "2025-11-21T22:46:04.664+08:00", "boot_at": "2025-11-21T15:47:42.727+08:00", "pending": 2, "lag": 0, "done": 1, "failed": 0, "current": {}}
2025-11-21 22:46:32,441 INFO 31 task_executor_2b3912361ed5_0 reported heartbeat: {"name": "task_executor_2b3912361ed5_0", "now": "2025-11-21T22:46:32.440+08:00", "boot_at": "2025-11-21T15:47:42.727+08:00", "pending": 2, "lag": 0, "done": 1, "failed": 0, "current": {}}
2025-11-21 22:46:34,873 INFO 21 172.18.0.6 - - [21/Nov/2025 22:46:34] "GET /v1/user/info HTTP/1.1" 200 -
2025-11-21 22:46:34,875 INFO 21 172.18.0.6 - - [21/Nov/2025 22:46:34] "GET /v1/document/thumbnails?doc_ids=7f6e34acc6af11f085353ea0449eda9e HTTP/1.1" 200 -
2025-11-21 22:46:34,884 INFO 21 172.18.0.6 - - [21/Nov/2025 22:46:34] "POST /v1/kb/list HTTP/1.1" 200 -
2025-11-21 22:46:34,919 INFO 21 172.18.0.6 - - [21/Nov/2025 22:46:34] "GET /v1/llm/list HTTP/1.1" 200 -
2025-11-21 22:47:00,175 INFO 31 task_executor_2b3912361ed5_0 reported heartbeat: {"name": "task_executor_2b3912361ed5_0", "now": "2025-11-21T22:47:00.174+08:00", "boot_at": "2025-11-21T15:47:42.727+08:00", "pending": 2, "lag": 0, "done": 1, "failed": 0, "current": {}}
2025-11-21 22:47:27,896 INFO 31 task_executor_2b3912361ed5_0 reported heartbeat: {"name": "task_executor_2b3912361ed5_0", "now": "2025-11-21T22:47:27.894+08:00", "boot_at": "2025-11-21T15:47:42.727+08:00", "pending": 2, "lag": 0, "done": 1, "failed": 0, "current": {}}
2025-11-21 22:47:55,595 INFO 31 task_executor_2b3912361ed5_0 reported heartbeat: {"name": "task_executor_2b3912361ed5_0", "now": "2025-11-21T22:47:55.594+08:00", "boot_at": "2025-11-21T15:47:42.727+08:00", "pending": 2, "lag": 0, "done": 1, "failed": 0, "current": {}}
This is the log; please take a look.
The log you provided shows successful requests and no explicit errors or failures, which means RAGFlow and Ollama are communicating at the network/API level. However, issues like random, repeated, or empty responses from local Ollama models are usually caused by model configuration, resource constraints, or prompt formatting—not backend errors, so they don’t show up in logs as failures.
RAGFlow’s error handling focuses on network and server errors, retrying and logging only when exceptions occur. If the model returns an empty or repeated answer, it’s treated as a valid response and not logged as an error (source). This is why your logs look clean even when the output is unusable.
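If you want to see this symptom in your own data rather than in the server logs, one option is to run a crude degeneracy check over saved answers. This is not part of RAGFlow, just a hypothetical diagnostic sketch:

```python
# Crude diagnostic: flag answers that look degenerate (empty, or dominated by
# one repeated phrase). Such answers come back as "valid" responses, so they
# never appear as errors in logs like the ones above.
def looks_degenerate(text: str, min_unique_ratio: float = 0.3) -> bool:
    words = text.split()
    if not words:
        return True  # empty answer
    unique_ratio = len(set(words)) / len(words)
    return unique_ratio < min_unique_ratio  # few distinct words => heavy repetition

samples = [
    "",
    "I don't know. I don't know. I don't know. I don't know.",
    "A nanozyme is a nanomaterial with enzyme-like catalytic activity.",
]
for s in samples:
    print(looks_degenerate(s), repr(s))
```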
To troubleshoot and improve output quality, double-check these points:
- Make sure the model name in RAGFlow matches exactly what `ollama list` shows (including tags like `:latest`).
- Use a non-empty API key in RAGFlow for Ollama models, even if it's a dummy string.
- If running RAGFlow in Docker, use `http://host.docker.internal:11434` as the base URL and ensure Ollama is listening on all interfaces (`0.0.0.0:11434`).
- For large models, set `OLLAMA_NUM_GPU=1` and `OLLAMA_KEEP_ALIVE=-1` to reduce resource usage.
- Update to the latest RAGFlow and Ollama versions, or try the nightly RAGFlow Docker image for recent bug fixes (source).
- Set `LOG_LEVEL=DEBUG` for more detailed logs if you want to catch hidden errors.
If you still get unusable responses, consider switching to vLLM for hosting Qwen models—users report much better results (source).
That didn't help.
We are working on it.
How can I upgrade the RAGFlow version?
Since v0.21, there have been issues with the local Ollama model.
@StupidWyh If you want to switch to a stable release, you can use `git checkout`, e.g. `git checkout v0.19.0`.
Then delete the previous Docker images and run docker compose again.
You mentioned that version v0.19.0 is very stable. Do you have more detailed steps? Since the versions keep updating, will other code change as well? It's not just about replacing the RAGFlow image in the .env file, right?
Please follow the instructions in the README.
Simply check out the version you want to use.
Also, when I suggested deleting the Docker image, I actually meant stopping and removing the running containers in Docker.