
Knowledge base retrieval may unexpectedly take a significant amount of time.

Open liujia0422 opened this issue 1 week ago • 3 comments

Self Checks

  • [x] I have read the Contributing Guide and Language Policy.
  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report, otherwise it will be closed.
  • [x] [Chinese & Non-English users] Please submit in English, otherwise the issue will be closed :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.11.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

During knowledge base retrieval, if the request sent to the vector model fails, the system will wait for several hundred seconds.

✔️ Expected Behavior

If the request fails, it should terminate immediately.

❌ Actual Behavior

Image Image

liujia0422 avatar Dec 26 '25 22:12 liujia0422

Hi @liujia0422! I'm Dosu and I’m helping the dify team.

The long delay you’re seeing when a vector model request fails during knowledge base retrieval is due to how Dify handles retrieval tasks internally. The retrieval logic uses thread pools and waits for all retrieval threads to finish or hit a fixed timeout (typically 300 seconds per retrieval batch) before raising any errors. If a vector model request fails quickly (like a 502 error), the system still waits for the full timeout period before reporting the failure, which matches the ~302 second delays in your logs. There’s currently no mechanism to terminate all retrieval tasks immediately on the first failure—errors are only raised after all threads complete or timeout, causing the hang you observed. This behavior is set in the retrieval service code and is not configurable via environment variables or settings right now.

If you want to reduce the impact, you can try lowering the relevant timeout values in your deployment (for example, QDRANT_CLIENT_TIMEOUT for Qdrant requests), but this won’t affect the higher-level thread pool wait. The best workaround is to ensure your vector database and embedding provider are correctly configured and healthy, as misconfigurations can cause these silent failures and long waits. Improving error propagation and early cancellation would require a code change in the retrieval service logic.
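As a sketch of that env-based workaround, assuming a Docker Compose deployment where Dify reads its `.env` file (the value below is illustrative; it only bounds individual Qdrant client requests, not the 300-second thread-pool wait):

```ini
# docker/.env (illustrative value, in seconds)
QDRANT_CLIENT_TIMEOUT=20
```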


dosubot[bot] avatar Dec 26 '25 22:12 dosubot[bot]

There will always be request failures caused by network fluctuations or other unexpected incidents, and the time cost of a single failure is currently too high. We hope to optimize this.

liujia0422 avatar Dec 26 '25 22:12 liujia0422

The long delays you’re seeing are caused by how Dify’s retrieval logic waits for all retrieval threads to finish or hit a fixed timeout (usually 300 seconds) before surfacing any errors—even if a vector model request fails immediately. This means a single network glitch or backend failure can block the whole retrieval for several minutes, as there’s no mechanism to cancel or terminate other tasks early on error.

To optimize this, you’d need to change the retrieval service and DatasetRetrieval logic to propagate exceptions as soon as any thread fails. For example, you could use concurrent.futures.as_completed to inspect each future as it finishes, and if any raises an exception, cancel the remaining futures or signal other threads to stop. Introducing a shared cancellation flag or event, and checking it in each worker, would allow the system to abort quickly on the first failure. This would significantly reduce the time cost of a single failure and make the system more resilient to network issues and backend errors.
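The approach described above can be sketched as follows. This is a minimal illustration, not Dify's actual retrieval code; the function names and simulated workers are hypothetical:

```python
import concurrent.futures
import threading
import time

def run_retrievals(tasks, timeout=300):
    """Run retrieval callables concurrently; abort everything on the first failure.

    Each task receives a shared cancel_event that it should poll, so
    in-flight work can stop early once any sibling task fails.
    """
    cancel_event = threading.Event()
    results = []
    first_error = None
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, cancel_event) for task in tasks]
        # as_completed yields each future the moment it finishes,
        # so a fast failure surfaces immediately.
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            try:
                results.append(fut.result())
            except Exception as exc:
                first_error = exc
                cancel_event.set()   # signal running workers to stop
                for f in futures:
                    f.cancel()       # drop tasks that have not started yet
                break
    if first_error is not None:
        raise first_error
    return results

# Illustrative workers: one fails immediately, the other polls the event.
def failing_search(cancel_event):
    raise RuntimeError("502 from embedding provider")

def slow_search(cancel_event):
    for _ in range(100):             # ~10 s if never cancelled
        if cancel_event.is_set():
            return "aborted"
        time.sleep(0.1)
    return "done"
```

With this sketch, `run_retrievals([failing_search, slow_search])` raises the `RuntimeError` within a fraction of a second instead of waiting out the slow worker.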

Currently, neither the retrieval service nor DatasetRetrieval implements this kind of early termination or cancellation logic—they only collect exceptions and raise them after all threads complete or timeout, which is why you see the long waits. Improving this would require code changes in both classes to support early error propagation and task cancellation.
You can review the relevant code in RetrievalService and DatasetRetrieval.

If you want to contribute or patch this, focus on surfacing exceptions as soon as they occur and cancelling outstanding tasks in the thread pools. This will help Dify handle transient failures much more efficiently.


dosubot[bot] avatar Dec 26 '25 22:12 dosubot[bot]

Hi @liujia0422, did you use the HYBRID_SEARCH mode? How is the performance of full_text_index_search? The HYBRID_SEARCH mode waits for both embedding_search and full_text_index_search to finish, or times out after 300 seconds. Is the performance of full_text_index_search very slow?

Chenyl-Sai avatar Dec 27 '25 15:12 Chenyl-Sai

Hi @liujia0422, did you use the HYBRID_SEARCH mode? How is the performance of full_text_index_search? The HYBRID_SEARCH mode waits for both embedding_search and full_text_index_search to finish, or times out after 300 seconds. Is the performance of full_text_index_search very slow?

I'm using the General chunking mode.

liujia0422 avatar Dec 27 '25 15:12 liujia0422


Image

Not the Chunk Structure; I mean the Retrieval Setting. Did you use Hybrid Search?

I think the search strategy could be changed to FIRST_EXCEPTION, so that the search can return early when any error occurs. Currently, it uses ALL_COMPLETED:

concurrent.futures.wait(futures, timeout=300, return_when=concurrent.futures.ALL_COMPLETED)
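A minimal sketch of the difference between the two strategies (the task names, timings, and 300-second timeout here are illustrative, not Dify's actual retrieval code):

```python
import concurrent.futures
import time

def fast_failure():
    """Simulates an embedding request that gets an immediate 502."""
    raise RuntimeError("embedding request failed")

def slow_success():
    """Simulates a full-text search that takes a while."""
    time.sleep(2)
    return "full-text results"

pool = concurrent.futures.ThreadPoolExecutor()
futures = [pool.submit(fast_failure), pool.submit(slow_success)]

start = time.monotonic()
# FIRST_EXCEPTION returns as soon as any future finishes by raising,
# instead of blocking until every future completes or the timeout expires.
done, not_done = concurrent.futures.wait(
    futures, timeout=300, return_when=concurrent.futures.FIRST_EXCEPTION
)
elapsed = time.monotonic() - start
pool.shutdown(wait=True)
```

Here `wait` returns almost immediately with the failed future in `done` and the still-running search in `not_done`; with `ALL_COMPLETED` it would block until the slow search finished too.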

Chenyl-Sai avatar Dec 28 '25 01:12 Chenyl-Sai

Image Image These are the retrieval settings in my knowledge base and in my workflow.

liujia0422 avatar Dec 28 '25 01:12 liujia0422

@Chenyl-Sai My retrieval settings are the same as the ones in your screenshot.

liujia0422 avatar Dec 28 '25 01:12 liujia0422

If you use Hybrid Search, there are several steps. First, embedding_search is used for semantic search, and full_text_index_search is used for full-text search. Then the system waits for both threads to finish.

I’d like to understand which step corresponds to the second picture you mentioned. Also, have you observed any performance issues with the other call?

Chenyl-Sai avatar Dec 28 '25 02:12 Chenyl-Sai

If you use Hybrid Search, there are several steps. First, embedding_search is used for semantic search, and full_text_index_search is used for full-text search. Then the system waits for both threads to finish.

I’d like to understand which step corresponds to the second picture you mentioned. Also, have you observed any performance issues with the other call?

The second picture shows the recall settings configured on the knowledge retrieval node in my workflow, after connecting that node to the knowledge base shown in the first picture. I have two workflows, and both have run into the problem of waiting a long time for knowledge base recall.

liujia0422 avatar Dec 28 '25 02:12 liujia0422

Sorry, I mean this picture

Image

Chenyl-Sai avatar Dec 28 '25 02:12 Chenyl-Sai

Sorry, I mean this picture

Image

This picture is from my model provider's console and shows a failed vector model request. From my investigation, once a vector model request fails, knowledge base retrieval waits about 300 seconds before it finishes.

liujia0422 avatar Dec 28 '25 02:12 liujia0422

@fatelei Why not try requesting a retry? I think retrying once is better than exiting directly.
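A single bounded retry with backoff, as suggested, might look roughly like this (all names are hypothetical; Dify's retrieval code does not currently do this):

```python
import time

def with_retries(fn, attempts=2, backoff=0.5):
    """Call fn, retrying on failure with exponential backoff.

    One fast automatic retry often rides out transient network
    errors without adding latency to the happy path.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * (2 ** attempt))

# Hypothetical embedding call: fails once, then succeeds.
calls = {"n": 0}
def flaky_embed():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient 502")
    return [0.1, 0.2, 0.3]
```

With these definitions, `with_retries(flaky_embed)` absorbs the first transient failure and returns the embedding on the second attempt.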

liujia0422 avatar Dec 28 '25 19:12 liujia0422

@fatelei Why not try requesting a retry? I think retrying once is better than exiting directly.

Currently, a retry depends on the user retrying manually.

fatelei avatar Dec 29 '25 02:12 fatelei