[Bug]: Intermittent Output Delays and Premature Truncation in Local Knowledge Base Setup
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
RAGFlow workspace code commit ID
main
RAGFlow image version
latest
Other environment information
model: qwq-32b q4
Ollama + RAGFlow ==> local assistant.
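For context, a minimal sketch of the Ollama side of this setup (the exact model tag is an assumption; it matches the one mentioned later in the thread):

```sh
ollama pull qwq:latest   # Q4-quantized QwQ-32B from the Ollama library
ollama serve             # expose the HTTP API RAGFlow connects to (port 11434 by default)
```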
Actual behavior
1. Erratic Token Output with Frequent Pauses: After setting up the knowledge base locally, the model's output exhibits inconsistent latency. During generation there are frequent pauses of 3–4 seconds between token bursts, after which large chunks (e.g., 20–50 tokens) are released abruptly. This creates a "stuttering" effect in responses, disrupting real-time readability and user interaction.
2. Premature Truncation at Response Endings: When the model approaches the end of its response (roughly the last 1–2 sentences), the output is often cut off mid-sentence, leaving incomplete phrases or grammatically broken statements (e.g., "The conclusion would be..." with no further completion). This truncation appears unrelated to token limits and occurs even when configured for a longer maximum output.
These issues do not occur when using Ollama directly via `ollama run qwq:latest`, nor when using CherryStudio.
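To isolate where the stutter originates, here is a minimal sketch of timing Ollama's raw token stream (the endpoint, port, and prompt are assumptions based on a default local Ollama install):

```sh
# Hypothetical check: timestamp each chunk Ollama streams back, to see
# whether the pauses come from the model itself or from RAGFlow's relaying.
curl -sN http://localhost:11434/api/generate \
  -d '{"model": "qwq:latest", "prompt": "Explain RAG in one paragraph.", "stream": true}' \
  | while read -r line; do
      printf '%s %s\n' "$(date +%T.%3N)" "$line"
    done
```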
Expected behavior
No response
Steps to reproduce
See the environment setup and actual behavior above.
Additional information
No response
I've also encountered this problem
Which version did you use? Do you set up a knowledge base for an assistant?
I didn't choose a knowledge base
It's been fixed. Pull the nightly version of the Docker image.
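For anyone else hitting this, a minimal sketch of switching to the nightly build (the `infiniflow/ragflow:nightly` tag is an assumption; confirm the exact tag on Docker Hub before pulling):

```sh
# Assumed image name/tag; verify against RAGFlow's Docker Hub page.
docker pull infiniflow/ragflow:nightly
# Restart the stack from RAGFlow's docker directory so the new image is used:
docker compose down
docker compose up -d
```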
OK, thanks.