
Knowledge base retrieval may unexpectedly take a significant amount of time.

Open liujia0422 opened this issue 1 week ago • 3 comments

Self Checks

  • [x] I have read the Contributing Guide and Language Policy.
  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report, otherwise it will be closed.
  • [x] [Chinese & Non-English users] Please submit in English, otherwise the issue will be closed :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.11.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

During knowledge base retrieval, if the request sent to the vector model fails, the system will wait for several hundred seconds.

✔️ Expected Behavior

If the request fails, it should terminate immediately.

❌ Actual Behavior

Image Image

liujia0422 avatar Dec 26 '25 22:12 liujia0422

Hi @liujia0422! I'm Dosu and I’m helping the dify team.

The long delay you’re seeing when a vector model request fails during knowledge base retrieval is due to how Dify handles retrieval tasks internally. The retrieval logic uses thread pools and waits for all retrieval threads to finish or hit a fixed timeout (typically 300 seconds per retrieval batch) before raising any errors. If a vector model request fails quickly (like a 502 error), the system still waits for the full timeout period before reporting the failure, which matches the ~302 second delays in your logs. There’s currently no mechanism to terminate all retrieval tasks immediately on the first failure—errors are only raised after all threads complete or timeout, causing the hang you observed. This behavior is set in the retrieval service code and is not configurable via environment variables or settings right now.

If you want to reduce the impact, you can try lowering the relevant timeout values in your deployment (for example, QDRANT_CLIENT_TIMEOUT for Qdrant requests), but this won’t affect the higher-level thread pool wait. The best workaround is to ensure your vector database and embedding provider are correctly configured and healthy, as misconfigurations can cause these silent failures and long waits. Improving error propagation and early cancellation would require a code change in the retrieval service logic.
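As a sketch of that env-based workaround, assuming a Docker Compose deployment where Dify reads its `.env` file (the value below is illustrative; it only bounds individual Qdrant client requests, not the 300-second thread-pool wait):

```ini
# docker/.env (illustrative value, in seconds)
QDRANT_CLIENT_TIMEOUT=20
```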


dosubot[bot] avatar Dec 26 '25 22:12 dosubot[bot]

There will always be request failures caused by network fluctuations or other unexpected incidents, and the time cost of a single failure is currently too high. We hope to optimize this.

liujia0422 avatar Dec 26 '25 22:12 liujia0422

The long delays you’re seeing are caused by how Dify’s retrieval logic waits for all retrieval threads to finish or hit a fixed timeout (usually 300 seconds) before surfacing any errors—even if a vector model request fails immediately. This means a single network glitch or backend failure can block the whole retrieval for several minutes, as there’s no mechanism to cancel or terminate other tasks early on error.

To optimize this, you’d need to change the retrieval service and DatasetRetrieval logic to propagate exceptions as soon as any thread fails. For example, you could use concurrent.futures.as_completed to inspect each future as it finishes, and if any raises an exception, cancel the remaining futures or signal other threads to stop. Introducing a shared cancellation flag or event, and checking it in each worker, would allow the system to abort quickly on the first failure. This would significantly reduce the time cost of a single failure and make the system more resilient to network issues and backend errors.
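The approach described above can be sketched as follows. This is a minimal illustration, not Dify's actual retrieval code; the function names and simulated workers are hypothetical:

```python
import concurrent.futures
import threading
import time

def run_retrievals(tasks, timeout=300):
    """Run retrieval callables concurrently; abort everything on the first failure.

    Each task receives a shared cancel_event that it should poll, so
    in-flight work can stop early once any sibling task fails.
    """
    cancel_event = threading.Event()
    results = []
    first_error = None
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, cancel_event) for task in tasks]
        # as_completed yields each future the moment it finishes,
        # so a fast failure surfaces immediately.
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            try:
                results.append(fut.result())
            except Exception as exc:
                first_error = exc
                cancel_event.set()   # signal running workers to stop
                for f in futures:
                    f.cancel()       # drop tasks that have not started yet
                break
    if first_error is not None:
        raise first_error
    return results

# Illustrative workers: one fails immediately, the other polls the event.
def failing_search(cancel_event):
    raise RuntimeError("502 from embedding provider")

def slow_search(cancel_event):
    for _ in range(100):             # ~10 s if never cancelled
        if cancel_event.is_set():
            return "aborted"
        time.sleep(0.1)
    return "done"
```

With this sketch, `run_retrievals([failing_search, slow_search])` raises the `RuntimeError` within a fraction of a second instead of waiting out the slow worker.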

Currently, neither the retrieval service nor DatasetRetrieval implements this kind of early termination or cancellation logic—they only collect exceptions and raise them after all threads complete or timeout, which is why you see the long waits. Improving this would require code changes in both classes to support early error propagation and task cancellation.
You can review the relevant code in RetrievalService and DatasetRetrieval.

If you want to contribute or patch this, focus on surfacing exceptions as soon as they occur and cancelling outstanding tasks in the thread pools. This will help Dify handle transient failures much more efficiently.


dosubot[bot] avatar Dec 26 '25 22:12 dosubot[bot]

Hi @liujia0422, did you use the HYBRID_SEARCH mode? How is the performance of full_text_index_search? The HYBRID_SEARCH mode waits for both embedding_search and full_text_index_search to finish, or times out after 300 seconds. Is the performance of full_text_index_search very slow?

Chenyl-Sai avatar Dec 27 '25 15:12 Chenyl-Sai

Hi @liujia0422, did you use the HYBRID_SEARCH mode? How is the performance of full_text_index_search? The HYBRID_SEARCH mode waits for both embedding_search and full_text_index_search to finish, or times out after 300 seconds. Is the performance of full_text_index_search very slow?

I'm using the General chunking mode.

liujia0422 avatar Dec 27 '25 15:12 liujia0422


Image

Not the Chunk Structure; I mean the Retrieval Setting. Did you use Hybrid Search?

I think the search strategy could be changed to FIRST_EXCEPTION, so that the search can return early when any error occurs. Currently, it uses ALL_COMPLETED:

concurrent.futures.wait(futures, timeout=300, return_when=concurrent.futures.ALL_COMPLETED)
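A minimal sketch of the difference between the two strategies (the task names, timings, and 300-second timeout here are illustrative, not Dify's actual retrieval code):

```python
import concurrent.futures
import time

def fast_failure():
    """Simulates an embedding request that gets an immediate 502."""
    raise RuntimeError("embedding request failed")

def slow_success():
    """Simulates a full-text search that takes a while."""
    time.sleep(2)
    return "full-text results"

pool = concurrent.futures.ThreadPoolExecutor()
futures = [pool.submit(fast_failure), pool.submit(slow_success)]

start = time.monotonic()
# FIRST_EXCEPTION returns as soon as any future finishes by raising,
# instead of blocking until every future completes or the timeout expires.
done, not_done = concurrent.futures.wait(
    futures, timeout=300, return_when=concurrent.futures.FIRST_EXCEPTION
)
elapsed = time.monotonic() - start
pool.shutdown(wait=True)
```

Here `wait` returns almost immediately with the failed future in `done` and the still-running search in `not_done`; with `ALL_COMPLETED` it would block until the slow search finished too.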

Chenyl-Sai avatar Dec 28 '25 01:12 Chenyl-Sai

Image Image These are the retrieval settings in my knowledge base and in my workflow.

liujia0422 avatar Dec 28 '25 01:12 liujia0422

@Chenyl-Sai My retrieval settings are the same as the ones in your screenshot.

liujia0422 avatar Dec 28 '25 01:12 liujia0422

If you use Hybrid Search, there are several steps. First, embedding_search is used for semantic search, and full_text_index_search is used for full-text search. Then the system waits for both threads to finish.

I’d like to understand which step corresponds to the second picture you mentioned. Also, have you observed any performance issues with the other call?

Chenyl-Sai avatar Dec 28 '25 02:12 Chenyl-Sai

If you use Hybrid Search, there are several steps. First, embedding_search is used for semantic search, and full_text_index_search is used for full-text search. Then the system waits for both threads to finish.

I’d like to understand which step corresponds to the second picture you mentioned. Also, have you observed any performance issues with the other call?

The second picture shows the recall settings configured on the knowledge retrieval node in my workflow, after connecting that node to the knowledge base shown in the first picture. I have two workflows, and both have run into the problem of waiting a long time for knowledge base recall.

liujia0422 avatar Dec 28 '25 02:12 liujia0422

Sorry, I mean this picture

Image

Chenyl-Sai avatar Dec 28 '25 02:12 Chenyl-Sai

Sorry, I mean this picture

Image

This picture is from my model provider's console and shows a failed vector model request. From my investigation, once a vector model request fails, knowledge base retrieval waits about 300 seconds before it finishes.

liujia0422 avatar Dec 28 '25 02:12 liujia0422

@fatelei Why not try requesting a retry? I think retrying once is better than exiting directly.
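A single bounded retry with backoff, as suggested, might look roughly like this (all names are hypothetical; Dify's retrieval code does not currently do this):

```python
import time

def with_retries(fn, attempts=2, backoff=0.5):
    """Call fn, retrying on failure with exponential backoff.

    One fast automatic retry often rides out transient network
    errors without adding latency to the happy path.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * (2 ** attempt))

# Hypothetical embedding call: fails once, then succeeds.
calls = {"n": 0}
def flaky_embed():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient 502")
    return [0.1, 0.2, 0.3]
```

With these definitions, `with_retries(flaky_embed)` absorbs the first transient failure and returns the embedding on the second attempt.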

liujia0422 avatar Dec 28 '25 19:12 liujia0422

@fatelei Why not try requesting a retry? I think retrying once is better than exiting directly.

Currently, a retry depends on the user retrying manually.

fatelei avatar Dec 29 '25 02:12 fatelei