[Bug]: Intermittent Output Delays and Premature Truncation in Local Knowledge Base Setup
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
RAGFlow workspace code commit ID
main
RAGFlow image version
latest
Other environment information
model: qwq-32b q4
Ollama + RAGFlow ==> local assistant.
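For context, a minimal sketch of the Ollama side of this setup (the exact model tag is an assumption; it matches the one mentioned later in the thread):

```sh
ollama pull qwq:latest   # Q4-quantized QwQ-32B from the Ollama library
ollama serve             # expose the HTTP API RAGFlow connects to (port 11434 by default)
```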
Actual behavior
1. Erratic Token Output with Frequent Pauses: After setting up the knowledge base locally, the model's output exhibits inconsistent latency. During generation there are frequent pauses of 3–4 seconds between token bursts, after which large chunks (e.g., 20–50 tokens) are released abruptly. This creates a "stuttering" effect in responses, disrupting real-time readability and user interaction.
2. Premature Truncation at Response Endings: When the model approaches the end of its response (roughly the last 1–2 sentences), the output is often cut off mid-sentence, leaving incomplete phrases or grammatically broken statements (e.g., "The conclusion would be..." with no further completion). This truncation appears unrelated to token limits and occurs even when configured for a longer maximum output.
These issues do not occur when using Ollama directly via `ollama run qwq:latest`, nor when using CherryStudio.
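To isolate where the stutter originates, here is a minimal sketch of timing Ollama's raw token stream (the endpoint, port, and prompt are assumptions based on a default local Ollama install):

```sh
# Hypothetical check: timestamp each chunk Ollama streams back, to see
# whether the pauses come from the model itself or from RAGFlow's relaying.
curl -sN http://localhost:11434/api/generate \
  -d '{"model": "qwq:latest", "prompt": "Explain RAG in one paragraph.", "stream": true}' \
  | while read -r line; do
      printf '%s %s\n' "$(date +%T.%3N)" "$line"
    done
```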
Expected behavior
No response
Steps to reproduce
See the environment setup and actual behavior above.
Additional information
No response
I've also encountered this problem
Which version did you use? Do you set up a knowledge base for an assistant?
I didn't choose a knowledge base
It's been fixed. Pull the nightly version of the Docker image.
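For anyone else hitting this, a minimal sketch of switching to the nightly build (the `infiniflow/ragflow:nightly` tag is an assumption; confirm the exact tag on Docker Hub before pulling):

```sh
# Assumed image name/tag; verify against RAGFlow's Docker Hub page.
docker pull infiniflow/ragflow:nightly
# Restart the stack from RAGFlow's docker directory so the new image is used:
docker compose down
docker compose up -d
```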
OK, thanks.