In the parent-child RAG model, can the top-k be used to truncate based on the number of recalled parent segments?
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
1. Is this request related to a challenge you're experiencing? Tell me about your story.
In the parent-child RAG model, could Top K be applied to the number of parent chunks instead of the number of child chunks?
The reason is that when child chunks are matched and most of the hits fall within the same parent chunk, only that one parent chunk is returned in the end. As a result, other relevant parent chunks are not included.
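To illustrate the difference, here is a minimal sketch in plain Python. It is not Dify's actual retrieval code: `ChildHit`, `parents_via_child_top_k`, and `parents_via_parent_top_k` are hypothetical names, and the scores are made up. It only contrasts cutting at Top K child hits with cutting at Top K distinct parent chunks.

```python
from typing import List, NamedTuple


class ChildHit(NamedTuple):
    """A matched child chunk (hypothetical structure for illustration)."""
    child_id: str
    parent_id: str
    score: float


def parents_via_child_top_k(hits: List[ChildHit], k: int) -> List[str]:
    """Keep the k best child hits, then collapse them to their parents.

    This mirrors the behaviour described in this issue: if the k best
    children share a parent, only that parent is returned.
    """
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)[:k]
    parents: List[str] = []
    for hit in ranked:
        if hit.parent_id not in parents:
            parents.append(hit.parent_id)
    return parents


def parents_via_parent_top_k(hits: List[ChildHit], k: int) -> List[str]:
    """Walk the ranked child hits until k distinct parents are collected.

    This is the requested behaviour: Top K limits parent chunks, not
    child chunks.
    """
    parents: List[str] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.parent_id not in parents:
            parents.append(hit.parent_id)
            if len(parents) == k:
                break
    return parents


# Made-up example: the three best child hits all belong to parent P1.
hits = [
    ChildHit("c1", "P1", 0.95),
    ChildHit("c2", "P1", 0.93),
    ChildHit("c3", "P1", 0.90),
    ChildHit("c4", "P2", 0.88),
    ChildHit("c5", "P3", 0.80),
]

print(parents_via_child_top_k(hits, 3))   # ['P1']
print(parents_via_parent_top_k(hits, 3))  # ['P1', 'P2', 'P3']
```

Note that the second approach assumes the retriever can supply more than K child candidates (for example by over-fetching from the vector store), otherwise there is nothing left to fill the remaining parent slots.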
2. Additional context or comments
No response
3. Can you help us with this feature?
- [ ] I am interested in contributing to this feature.
Hi, @rainsoft. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- The issue concerns the Top-K method in the parent-child RAG model.
- The main concern is the potential exclusion of relevant parent chunks if child chunk hits are concentrated within a single parent chunk.
- You mentioned not being interested in contributing to this feature's development.
- No further comments or developments have been made on this issue.
Next Steps:
- Please let us know if this issue is still relevant to the latest version of the Dify repository by commenting here.
- If there are no updates, this issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!
Thank you for the reminder. This issue remains relevant to the RAG model's optimization in Dify. While I'm not currently contributing to its development, the core concern about parent-child chunk exclusion in Top-K methods persists in scenarios where child chunks cluster within a single parent. This could impact retrieval accuracy in hierarchical document structures.
The Dify team has been actively improving retrieval configurations (e.g., hybrid search weight adjustments) and dataset segmentation logic, but this specific edge case hasn’t been fully addressed. I recommend keeping it open for community discussion or future roadmap consideration.
Please maintain this issue’s status to allow further technical exploration.
@crazywoola, the user @rainsoft has indicated that the issue regarding the Top-K method in the parent-child RAG model remains relevant, particularly concerning the exclusion of parent chunks impacting retrieval accuracy. Please assist them with this issue to facilitate further technical exploration.