[Bug]: Retrieval module truncates input prompts to 2048 tokens when using Ollama as Inference Provider
Description: When using Ollama as the inference service provider (tested with deepseek-r1:14b) in RAGFlow v0.17.2, I've observed unexpected truncation of input prompts to only the last 2048 tokens, despite the following configurations (a rough token-count check is sketched after this list):
- Verified model capacity: The same model, run directly through the Ollama CLI, accepts prompts longer than 2k tokens
- Provider settings: Configured "Maximum Tokens Limit" to 16384 in Model Provider settings
- Hardware confirmation: Adequate system resources available
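To confirm the assembled prompt really exceeds 2048 tokens while staying under the configured 16384 limit, here is a minimal counting sketch. It assumes the full prompt (system prompt plus retrieved fragments) has been dumped to a hypothetical `prompt.txt`, and uses tiktoken's `cl100k_base` encoding only as a rough proxy for the deepseek-r1 tokenizer:

```python
# Rough token count of the assembled prompt (system prompt + retrieved
# knowledge base fragments).  cl100k_base is only a proxy for the
# deepseek-r1 tokenizer, but close enough to show whether the prompt is
# well past 2048 tokens while still under the configured 16384 limit.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

with open("prompt.txt", encoding="utf-8") as f:  # hypothetical dump of the full prompt
    prompt = f.read()

n_tokens = len(enc.encode(prompt))
print(f"approx. prompt length: {n_tokens} tokens")
print("exceeds 2048:", n_tokens > 2048, "| within 16384:", n_tokens <= 16384)
```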
Affected Components:
- Retrieval Module
- Ollama Model Provider integration
Steps to Reproduce:
- Configure Ollama as model provider with any 4k+ context model (e.g. deepseek-r1:14b)
- Set "Maximum Tokens Limit" to 16384 in provider settings
- Start a chat configured to use the LLM deepseek-r1:14b@Ollama
- Observe a wrong answer suggesting the input prompt is truncated unexpectedly (as if the LLM only received the last 2048 tokens of the prompt, without the system prompt and the highest-ranked knowledge base fragments)
- Verify the model's input handling by copying the same content (system prompt and highest-ranked knowledge base fragments) directly into the Ollama CLI; the model returns the correct answer (see the API sketch after these steps)
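The direct check in the last step can also be run against Ollama's HTTP API instead of the CLI. This is a sketch under the assumption that Ollama is listening on its default local port and that the same system prompt and fragments have been saved to the hypothetical files `system_prompt.txt` and `kb_fragments.txt`; `num_ctx` is raised explicitly so the direct call is not limited by Ollama's own default context window:

```python
# Send the same system prompt and retrieved fragments straight to Ollama,
# bypassing RAGFlow, to check that the model itself handles the full context.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

# hypothetical dumps of what RAGFlow would send
system_prompt = open("system_prompt.txt", encoding="utf-8").read()
fragments = open("kb_fragments.txt", encoding="utf-8").read()
question = "..."  # the same question that was asked in the RAGFlow chat

resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "deepseek-r1:14b",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": fragments + "\n\n" + question},
        ],
        "stream": False,
        # raise the context window explicitly so this direct call is not
        # limited by Ollama's own default
        "options": {"num_ctx": 16384},
    },
    timeout=600,
)
print(resp.json()["message"]["content"])
```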
Expected Result: The full system prompt and knowledge base fragments are passed to the model, which then returns the correct answer, as it does when using the SiliconFlow API.
Actual Result: The LLM gives a wrong answer, suggesting that it did not receive the whole prompt.
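A simple way to demonstrate that only the tail of the prompt reaches the model is a canary probe: put a unique marker at the very start of a prompt that is well over 2048 tokens long and ask for it back. The sketch below only builds such a probe (the marker string and output file name are arbitrary); paste the result into both the RAGFlow chat and the Ollama CLI, and if only the direct Ollama run can report the marker, the head of the prompt is being dropped before it reaches the model:

```python
# Build a probe prompt with a unique marker at the very beginning, followed by
# filler that pushes the total length well past 2048 tokens.  Paste probe.txt
# into both the RAGFlow chat box and the Ollama CLI: if only the direct Ollama
# run can report the marker, the head of the prompt is being dropped inside
# RAGFlow before it ever reaches the model.
MARKER = "CANARY-7f3a9c"  # arbitrary, easy-to-spot string
FILLER = "This sentence is padding used only to inflate the prompt length. "

probe = (
    f"The secret marker is {MARKER}.\n\n"
    + FILLER * 400  # several thousand tokens of padding
    + "\n\nQuestion: What is the secret marker stated at the very beginning?"
)

with open("probe.txt", "w", encoding="utf-8") as f:
    f.write(probe)

print(f"probe written to probe.txt ({len(probe.split())} words)")
```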
Additional Observations:
- Third-party providers such as SiliconFlow (硅基流动) work as expected with the same settings
- Issue persists across different model types (llama, qwen, deepseek-qwen-distilled)
Environment:
- RAGFlow Version: 0.17.2
- Ollama Version: 0.1.27
- Models Tested: llama, qwen2.5, deepseek-qwen-distilled
- OS: Windows 11
- Hardware: i7-13700KF, 32 GB RAM, RTX 4080