[Bug]: Retrieval module truncates input prompts to 2048 tokens when using Ollama as Inference Provider
Description: When using Ollama as the inference service provider (tested with deepseek-r1:14b) in RAGFlow v0.17.2, I've observed unexpected truncation of input prompts to only the last 2048 tokens, despite the following configurations (a rough token-count check is sketched after this list):
- Verified model capacity: The same model, run directly through the Ollama CLI, accepts prompts longer than 2k tokens
- Provider settings: Configured "Maximum Tokens Limit" to 16384 in Model Provider settings
- Hardware confirmation: Adequate system resources available
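To confirm the assembled prompt really exceeds 2048 tokens while staying under the configured 16384 limit, here is a minimal counting sketch. It assumes the full prompt (system prompt plus retrieved fragments) has been dumped to a hypothetical `prompt.txt`, and uses tiktoken's `cl100k_base` encoding only as a rough proxy for the deepseek-r1 tokenizer:

```python
# Rough token count of the assembled prompt (system prompt + retrieved
# knowledge base fragments).  cl100k_base is only a proxy for the
# deepseek-r1 tokenizer, but close enough to show whether the prompt is
# well past 2048 tokens while still under the configured 16384 limit.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

with open("prompt.txt", encoding="utf-8") as f:  # hypothetical dump of the full prompt
    prompt = f.read()

n_tokens = len(enc.encode(prompt))
print(f"approx. prompt length: {n_tokens} tokens")
print("exceeds 2048:", n_tokens > 2048, "| within 16384:", n_tokens <= 16384)
```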
Affected Components:
- Retrieval Module
- Ollama Model Provider integration
Steps to Reproduce:
- Configure Ollama as model provider with any 4k+ context model (e.g. deepseek-r1:14b)
- Set "Maximum Tokens Limit" to 16384 in provider settings
- Start a chat configured to use the LLM deepseek-r1:14b@Ollama
- Observe a wrong answer suggesting the input prompt is truncated unexpectedly (as if the LLM only received the last 2048 tokens of the prompt, without the system prompt and the highest-ranked knowledge base fragments)
- Verify the model's input handling by copying the same content (system prompt and highest-ranked knowledge base fragments) directly into the Ollama CLI; the model returns the correct answer (see the API sketch after these steps)
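The direct check in the last step can also be run against Ollama's HTTP API instead of the CLI. This is a sketch under the assumption that Ollama is listening on its default local port and that the same system prompt and fragments have been saved to the hypothetical files `system_prompt.txt` and `kb_fragments.txt`; `num_ctx` is raised explicitly so the direct call is not limited by Ollama's own default context window:

```python
# Send the same system prompt and retrieved fragments straight to Ollama,
# bypassing RAGFlow, to check that the model itself handles the full context.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

# hypothetical dumps of what RAGFlow would send
system_prompt = open("system_prompt.txt", encoding="utf-8").read()
fragments = open("kb_fragments.txt", encoding="utf-8").read()
question = "..."  # the same question that was asked in the RAGFlow chat

resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "deepseek-r1:14b",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": fragments + "\n\n" + question},
        ],
        "stream": False,
        # raise the context window explicitly so this direct call is not
        # limited by Ollama's own default
        "options": {"num_ctx": 16384},
    },
    timeout=600,
)
print(resp.json()["message"]["content"])
```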
Expected Result: The full system prompt and knowledge base fragments are passed to the model, which then returns the correct answer, as it does when using the SiliconFlow API.
Actual Result: The LLM gives a wrong answer, suggesting that it did not receive the whole prompt.
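A simple way to demonstrate that only the tail of the prompt reaches the model is a canary probe: put a unique marker at the very start of a prompt that is well over 2048 tokens long and ask for it back. The sketch below only builds such a probe (the marker string and output file name are arbitrary); paste the result into both the RAGFlow chat and the Ollama CLI, and if only the direct Ollama run can report the marker, the head of the prompt is being dropped before it reaches the model:

```python
# Build a probe prompt with a unique marker at the very beginning, followed by
# filler that pushes the total length well past 2048 tokens.  Paste probe.txt
# into both the RAGFlow chat box and the Ollama CLI: if only the direct Ollama
# run can report the marker, the head of the prompt is being dropped inside
# RAGFlow before it ever reaches the model.
MARKER = "CANARY-7f3a9c"  # arbitrary, easy-to-spot string
FILLER = "This sentence is padding used only to inflate the prompt length. "

probe = (
    f"The secret marker is {MARKER}.\n\n"
    + FILLER * 400  # several thousand tokens of padding
    + "\n\nQuestion: What is the secret marker stated at the very beginning?"
)

with open("probe.txt", "w", encoding="utf-8") as f:
    f.write(probe)

print(f"probe written to probe.txt ({len(probe.split())} words)")
```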
Additional Observations:
- Third-party providers such as SiliconFlow (硅基流动) work as expected with the same settings
- Issue persists across different model types (llama, qwen, deepseek-qwen-distilled)
Environment:
- RAGFlow Version: 0.17.2
- Ollama Version: 0.1.27
- Models Tested: llama, qwen2.5, deepseek-qwen-distilled
- OS: Windows 11
- Hardware: i7-13700KF, 32 GB RAM, RTX 4080