Knowledge base retrieval may unexpectedly take a significant amount of time.
Self Checks
- [x] I have read the Contributing Guide and Language Policy.
- [x] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report, otherwise it will be closed.
- [x] [Chinese & non-English users] Please submit in English, otherwise the report will be closed :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.11.2
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
During knowledge base retrieval, if the request sent to the vector model fails, the system will wait for several hundred seconds.
✔️ Expected Behavior
If the request fails, the retrieval should terminate immediately instead of waiting for the timeout.
❌ Actual Behavior
Hi @liujia0422! I'm Dosu and I’m helping the dify team.
The long delay you’re seeing when a vector model request fails during knowledge base retrieval is due to how Dify handles retrieval tasks internally. The retrieval logic uses thread pools and waits for all retrieval threads to finish or hit a fixed timeout (typically 300 seconds per retrieval batch) before raising any errors. If a vector model request fails quickly (like a 502 error), the system still waits for the full timeout period before reporting the failure, which matches the ~302 second delays in your logs. There’s currently no mechanism to terminate all retrieval tasks immediately on the first failure—errors are only raised after all threads complete or timeout, causing the hang you observed. This behavior is set in the retrieval service code and is not configurable via environment variables or settings right now.
If you want to reduce the impact, you can try lowering the relevant timeout values in your deployment (for example, QDRANT_CLIENT_TIMEOUT for Qdrant requests), but this won’t affect the higher-level thread pool wait. The best workaround is to ensure your vector database and embedding provider are correctly configured and healthy, as misconfigurations can cause these silent failures and long waits. Improving error propagation and early cancellation would require a code change in the retrieval service logic.
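To make the described behavior concrete, here is a minimal, self-contained sketch of the wait-for-all pattern (illustrative only, not Dify's actual RetrievalService code; the timings are shortened so it runs quickly):

```python
import concurrent.futures
import time


def embedding_search():
    # Simulates the vector model request failing immediately (e.g. a 502).
    raise RuntimeError("502 from embedding provider")


def full_text_index_search():
    # Simulates the other retrieval thread that keeps running.
    time.sleep(5)
    return ["chunk-1", "chunk-2"]


with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(embedding_search),
               executor.submit(full_text_index_search)]
    start = time.monotonic()
    # ALL_COMPLETED: even though the first future failed instantly, wait()
    # only returns once every future has finished or the timeout expires.
    concurrent.futures.wait(futures, timeout=10,
                            return_when=concurrent.futures.ALL_COMPLETED)
    print(f"waited {time.monotonic() - start:.1f}s before inspecting the error")
    for future in futures:
        if future.done() and future.exception() is not None:
            print("failure:", future.exception())
```

With the real 300-second timeout, the same pattern turns an instant 502 into a multi-minute hang.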
References:
- RetrievalService thread pool and timeout logic
- Qdrant vector store timeout config
- Docker config and troubleshooting tips
There will always be request failures caused by network fluctuations or other unexpected incidents, and the time cost of a single failure is currently too high. We hope to optimize this.
The long delays you’re seeing are caused by how Dify’s retrieval logic waits for all retrieval threads to finish or hit a fixed timeout (usually 300 seconds) before surfacing any errors—even if a vector model request fails immediately. This means a single network glitch or backend failure can block the whole retrieval for several minutes, as there’s no mechanism to cancel or terminate other tasks early on error.
To optimize this, you’d need to change the retrieval service and DatasetRetrieval logic to propagate exceptions as soon as any thread fails. For example, you could use concurrent.futures.as_completed to inspect each future as it finishes, and if any raises an exception, cancel the remaining futures or signal other threads to stop. Introducing a shared cancellation flag or event, and checking it in each worker, would allow the system to abort quickly on the first failure. This would significantly reduce the time cost of a single failure and make the system more resilient to network issues and backend errors.
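A rough sketch of that approach (purely illustrative; the worker names and the shared cancellation event are assumptions, not existing Dify code):

```python
import concurrent.futures
import threading


def retrieve_all(task_fns, timeout=300):
    cancel_event = threading.Event()  # shared flag workers poll between steps
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(fn, cancel_event) for fn in task_fns]
        results = []
        try:
            # as_completed yields each future as soon as it finishes, so the
            # first failure surfaces immediately instead of after all threads.
            for future in concurrent.futures.as_completed(futures, timeout=timeout):
                results.append(future.result())  # re-raises the worker's exception
        except Exception:
            cancel_event.set()   # ask still-running workers to stop early
            for f in futures:
                f.cancel()       # drops tasks that have not started yet
            raise
        return results
```

Each worker would still need to check cancel_event.is_set() between its own steps, since a running Python thread cannot be interrupted from outside.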
Currently, neither the retrieval service nor DatasetRetrieval implements this kind of early termination or cancellation logic—they only collect exceptions and raise them after all threads complete or timeout, which is why you see the long waits. Improving this would require code changes in both classes to support early error propagation and task cancellation.
You can review the relevant code in RetrievalService and DatasetRetrieval.
If you want to contribute or patch this, focus on surfacing exceptions as soon as they occur and cancelling outstanding tasks in the thread pools. This will help Dify handle transient failures much more efficiently.
Hi @liujia0422, did you use the HYBRID_SEARCH mode?
How is the performance of full_text_index_search?
The HYBRID_SEARCH mode waits for both embedding_search and full_text_index_search to finish, or times out after 300 seconds.
Is the performance of full_text_index_search very slow?
I'm using the General chunk structure.
Not the Chunk Structure; I mean the Retrieval Setting. Did you use Hybrid Search?
I think the search strategy could be changed to FIRST_EXCEPTION, so that the search can return early when any error occurs.
Currently, it uses ALL_COMPLETED:
concurrent.futures.wait(futures, timeout=300, return_when=concurrent.futures.ALL_COMPLETED)
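For reference, a hedged sketch of what that change could look like (not a tested patch against the actual RetrievalService code):

```python
# Return as soon as any future raises, instead of waiting for all of them.
done, not_done = concurrent.futures.wait(
    futures, timeout=300, return_when=concurrent.futures.FIRST_EXCEPTION
)
for future in not_done:
    future.cancel()      # only stops tasks that have not started running yet
for future in done:
    future.result()      # re-raises the first failure immediately
```

This alone would surface the error as soon as one search thread fails; threads that are already running would still need a cooperative cancellation signal to stop early.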
@Chenyl-Sai My retrieval settings are the same as the settings in your screenshot.
If you use Hybrid Search, there are several steps. First, embedding_search is used for semantic search, and full_text_index_search is used for full-text search. Then the system waits for both threads to finish.
I’d like to understand which step corresponds to the second picture you mentioned. Also, have you observed any performance issues with the other call?
The second picture shows the retrieval settings configured on the knowledge retrieval node in a workflow, after connecting that node to the knowledge base from the first picture. I have two workflows, and both have run into the problem of the workflow waiting a long time for knowledge base retrieval.
Sorry, I mean this picture
This picture is a failed vector model request that I found in my model provider's console. My investigation shows that once a vector model request does not succeed, the knowledge base retrieval waits about 300 seconds before it finishes.
@fatelei Why not retry the request? I think retrying once is better than exiting immediately.
Right now, a retry depends on the user manually retrying.