Knowledge base's top K should be set per app

Open hxt365 opened this issue 7 months ago • 2 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

Each knowledge base currently has a single top K parameter. When the same knowledge base is used across multiple apps, we may want to configure different top K values to retrieve a varying number of knowledge chunks based on each app's specific needs. However, since top K is defined at the knowledge base level, adjusting it for one app inadvertently impacts all other apps using that knowledge base.

2. Additional context or comments

No response

3. Can you help us with this feature?

  • [ ] I am interested in contributing to this feature.

hxt365 avatar May 19 '25 12:05 hxt365

I think you can set a fairly large value on the knowledge base itself, and then use a different value in the knowledge retrieval node to suit your needs.

crazywoola avatar May 20 '25 02:05 crazywoola

@crazywoola Unfortunately it does not work like that: the Retrieval setting in the node has no effect.

hxt365 avatar May 20 '25 03:05 hxt365

@crazywoola It is not the same thing to tell the KB to extract a TOPK of 10 and then, in the node, ask for the top 3 of those 10 documents.

  1. it consumes more tokens, because you extract far too many documents that then have to be reranked
  2. you consume extra tokens again, because you rerank 10 documents that were already reranked
  3. extracting 3 documents from the KB is different from extracting 3 documents from the top 10 extracted, especially when using a parent-child approach, where evaluating with a TOPK of 10 means evaluating more chunks that are bound to the same parent

You can try it yourself:

  • create a parent-child knowledge base with the full-doc indexing method
  • execute a query using the test tool with a TOPK of 3 and then with a TOPK of 10, and observe the scores given to the extracted documents; they may even change order:
    TOPK 3 -> A, B, C
    TOPK 10 -> A, C, D, B, E, F, G, H, I, J
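The reordering described above can be reproduced with a toy model. The sketch below is not Dify's retrieval code: it assumes, purely for illustration, that the top K child chunks are selected first and that each parent is then scored by the sum of its selected children. The function `rank_parents`, the `CHUNKS` data, and the sum aggregation are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (parent document id, child chunk similarity score); made-up numbers
Chunk = Tuple[str, float]


def rank_parents(chunks: List[Chunk], top_k: int) -> List[str]:
    """Keep the top_k child chunks, then rank parents by summed child score.

    Assumption: sum aggregation stands in for whatever scoring or
    reranking the real system applies to parent documents.
    """
    top_children = sorted(chunks, key=lambda c: c[1], reverse=True)[:top_k]
    totals: Dict[str, float] = defaultdict(float)
    for parent, score in top_children:
        totals[parent] += score
    return sorted(totals, key=lambda p: totals[p], reverse=True)


CHUNKS: List[Chunk] = [
    ("A", 0.90), ("B", 0.80), ("C", 0.70), ("C", 0.65), ("A", 0.60),
    ("D", 0.55), ("D", 0.45), ("B", 0.10), ("E", 0.05), ("E", 0.02),
]

direct_top3 = rank_parents(CHUNKS, top_k=3)          # retrieve 3 directly
truncated_top3 = rank_parents(CHUNKS, top_k=10)[:3]  # retrieve 10, keep 3
print(direct_top3, truncated_top3)
```

Under these assumptions the two lists differ, because with a larger K more chunks of the same parent contribute to its score. The larger K also means more candidates are passed to any reranking step.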

The search parameters should be even more specific, i.e. per KB and per node, because I might want to use a Hybrid Search with a TOPK of 10 on KB X, a Vector Search with a TOPK of 3 on KB Y, and then run a semantic search on the results of those.

Later in the same workflow I might want to do something different. Hence the Retrieval config should be per KB, per node, and stored inside the workflow DSL.

I suppose this is why we have a search configuration in a popup window. It does not make sense that the popup overwrites the configuration of the KB for everyone and every other app.
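A minimal sketch of what such a per-KB, per-node configuration could look like, assuming the node stores optional overrides and falls back to the knowledge base defaults. Every name here (`RetrievalConfig`, `KnowledgeRetrievalNode`, `effective_config`, the `"hybrid"` and `"vector"` labels) is hypothetical and does not reflect Dify's actual DSL schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class RetrievalConfig:
    search_method: str  # illustrative label, e.g. "hybrid" or "vector"
    top_k: int


@dataclass
class KnowledgeRetrievalNode:
    node_id: str
    # Per-KB overrides that would be serialized with the workflow,
    # keyed by a (hypothetical) knowledge base id.
    overrides: Dict[str, RetrievalConfig] = field(default_factory=dict)


def effective_config(
    node: KnowledgeRetrievalNode,
    kb_id: str,
    kb_defaults: Dict[str, RetrievalConfig],
) -> RetrievalConfig:
    """Use the node's override for this KB if present, else the KB default.

    The shared kb_defaults mapping is only read, never modified, so other
    apps using the same knowledge base are unaffected.
    """
    return node.overrides.get(kb_id, kb_defaults[kb_id])


KB_DEFAULTS = {
    "kb_x": RetrievalConfig("vector", 5),
    "kb_y": RetrievalConfig("vector", 3),
}

node = KnowledgeRetrievalNode(
    node_id="retrieval_1",
    overrides={"kb_x": RetrievalConfig("hybrid", 10)},
)

print(effective_config(node, "kb_x", KB_DEFAULTS))  # node-level override
print(effective_config(node, "kb_y", KB_DEFAULTS))  # falls back to KB default
```

Keeping the overrides on the node rather than on the knowledge base is the design choice being argued for here: a second node, or a second app, can carry different values for the same KB without touching the shared defaults.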

DavideDelbianco avatar May 22 '25 22:05 DavideDelbianco

What about creating a separate knowledge base for each app from the same documents? Though it may lead to too many knowledge bases.

EndlessSeeker avatar Jun 04 '25 09:06 EndlessSeeker

@EndlessSeeker That is prohibitively expensive, while this issue can be resolved in a simple way.

hxt365 avatar Jun 04 '25 09:06 hxt365

Hi, @hxt365. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.

Issue Summary:

  • You requested the ability to set the "top K" retrieval parameter individually per app instead of globally at the knowledge base level.
  • Current global setting causes inefficiencies and inconsistent results when multiple apps share the same knowledge base.
  • Other users suggested workarounds like separate knowledge bases per app, but you noted this is costly and not ideal.
  • The core need is more granular retrieval configuration to support diverse app requirements sharing a single knowledge base.
  • The issue remains unresolved with no implemented solution yet.

Next Steps:

  • Please let me know if this feature request is still relevant to your use case with the latest version of Dify by commenting on this issue.
  • If I don’t hear back within 15 days, I will automatically close this issue to keep the backlog manageable.

Thanks for your understanding and contribution!

dosubot[bot] avatar Aug 29 '25 16:08 dosubot[bot]