[Feature Request]: Hierarchical Retrieval Architecture for Production-Grade RAG
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Is your feature request related to a problem?
Describe the feature you'd like
Problem Statement: Bridging the "Demo-to-Production" Gap
RAGFlow currently performs well in proof-of-concept (PoC) scenarios. However, when deployed in production environments with diverse knowledge bases and large document collections (tens of thousands of documents), the existing single-layer retrieval architecture, which flattens all document chunks into a single vector search space, shows significant limitations in both accuracy and efficiency.
Key Challenges:
- Chunk Fragmentation Issues
  - Context Fragmentation: Improper segmentation disrupts natural semantic units, resulting in incomplete information within individual chunks and degraded semantic representation.
  - Information Dilution: Critical information ("gold nuggets") is often split across multiple chunks, making comprehensive retrieval challenging and reducing answer quality.
- Embedding Model Limitations
  - Theoretical Constraints: As established in research such as "On the Theoretical Limitations of Embedding-Based Retrieval," the dimensionality of embedding vectors fundamentally limits the number of document-query relevance relationships that can be represented exactly.
  - Practical Bottlenecks: Commonly deployed private embedding models (e.g., qwen3-embedding-0.6B, jina-embeddings-v3 with 1024 dimensions) may lack sufficient capacity to encode complex semantic relationships at scale. Higher-dimensional models (4096/8192 dimensions) exist, but they impose prohibitive hardware requirements and computational costs for private deployments.
  - Retrieval Precision Degradation: Direct vector search across millions of chunks becomes computationally expensive and prone to vector space "crowding" and "confusion," causing relevant chunks to rank lower.
- Underutilized Metadata
  - Valuable document metadata (department, author, date, document type, etc.) remains largely untapped as systematic pre-retrieval filters, wasting crucial structured information.
Proposed Solution: Three-Tier Retrieval Architecture
Inspired by search engine hierarchical principles, we propose a Knowledge Base → Document → Chunk three-tier retrieval architecture to progressively narrow the search scope and enhance both precision and efficiency.
Tier 1: Knowledge Base Routing
- Function: Automatically routes user queries to the most relevant knowledge base based on intent.
- Implementation:
  - Support independent retrieval parameters per knowledge base (vector/keyword weights, recall thresholds).
  - Enable dynamic routing via rule-based or LLM-based approaches to ensure domain-specific processing.
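As a rough illustration of the rule-based variant, the sketch below routes a query by keyword overlap and carries per-knowledge-base retrieval parameters alongside each knowledge base. Every name here (`KnowledgeBaseConfig`, `route_query`, `hybrid_score`, and the default weights) is hypothetical and chosen for illustration; none of it is RAGFlow's existing API.

```python
import re
from dataclasses import dataclass


@dataclass
class KnowledgeBaseConfig:
    """Hypothetical per-knowledge-base settings (not RAGFlow's API)."""
    name: str
    keywords: set[str]             # routing rule: terms that signal this KB
    vector_weight: float = 0.7     # weight of vector similarity in the hybrid score
    keyword_weight: float = 0.3    # weight of the keyword-match score
    recall_threshold: float = 0.2  # minimum hybrid score for a chunk to be kept


def route_query(query: str, kbs: list[KnowledgeBaseConfig], top_n: int = 1) -> list[KnowledgeBaseConfig]:
    """Rule-based routing: rank knowledge bases by keyword overlap with the query."""
    tokens = set(re.findall(r"\w+", query.lower()))
    matched = [(len(tokens & kb.keywords), kb) for kb in kbs]
    matched = [(score, kb) for score, kb in matched if score > 0]
    if not matched:
        return kbs  # no rule fired: fall back to searching every knowledge base
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [kb for _, kb in matched[:top_n]]


def hybrid_score(vector_sim: float, keyword_score: float, kb: KnowledgeBaseConfig) -> float:
    """Combine both signals using the weights configured for this knowledge base."""
    return kb.vector_weight * vector_sim + kb.keyword_weight * keyword_score
```

An LLM-based router could replace `route_query` behind the same signature; the fallback to all knowledge bases is one possible design choice for queries that match no rule.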
Tier 2: Document Filtering
- Function: Applies document-level metadata filtering within selected knowledge bases to identify relevant document subsets.
- Enhancements:
  - Intelligent Metadata Filtering: In Auto mode, let users specify the key metadata fields (e.g., document type, department) over which the LLM generates filter conditions, so that high-cardinality metadata does not interfere.
  - Metadata Similarity Matching: Introduce similarity operators for text-based metadata (document names, summaries) to support fuzzy matching.
  - Enhanced Metadata Generation: Strengthen Data Pipeline capabilities for full-text metadata and summary generation to enrich document filtering context.
  - Efficient Metadata Management: Provide batch CRUD operations on metadata and a metadata management UI.
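To make the document-level filter concrete, here is a minimal sketch that combines exact-match conditions with a fuzzy similarity operator on text metadata. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever similarity measure would actually be chosen; the `filter_documents` function, the document layout, and the `min_ratio` threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in for a real operator)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_documents(docs, exact=None, fuzzy=None, min_ratio=0.6):
    """Return ids of documents whose metadata satisfies every condition.

    exact: field -> required value (equality test)
    fuzzy: field -> target text (similarity must reach min_ratio)
    """
    exact = exact or {}
    fuzzy = fuzzy or {}
    selected = []
    for doc in docs:
        meta = doc["metadata"]
        if any(meta.get(key) != value for key, value in exact.items()):
            continue
        if any(similar(str(meta.get(key, "")), text) < min_ratio for key, text in fuzzy.items()):
            continue
        selected.append(doc["id"])
    return selected


# Hypothetical documents, for illustration only.
docs = [
    {"id": "d1", "metadata": {"department": "finance", "name": "Q3 budget report"}},
    {"id": "d2", "metadata": {"department": "finance", "name": "Travel expense policy"}},
    {"id": "d3", "metadata": {"department": "hr", "name": "Onboarding checklist"}},
]
finance_travel = filter_documents(
    docs, exact={"department": "finance"}, fuzzy={"name": "travel expense policy"}
)
```

Only the chunks of the surviving documents would then be passed to chunk-level retrieval. In Auto mode, the `exact` and `fuzzy` conditions would come from the LLM rather than being written by hand.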
Tier 3: Chunk Refinement
- Function: Performs precise vector retrieval at the chunk level within the filtered document set.
- Enhancements:
  - Parent-Child Chunking with Summary Mapping: Enable creation of parent-level summaries for contextually related chunks. Retrieval first matches macro-themes via summary vectors, then maps to the original chunks for details, combining semantic robustness with granular information access.
  - Customizable Prompts: Allow users to configure custom prompts for chunk keyword extraction and question generation tasks to better align with domain-specific semantics.
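The parent-child idea can be sketched as a two-stage lookup: rank parent summaries first, then rank only the chunks that belong to the best parents. The code below is a toy version using plain Python lists as vectors; the `retrieve` function, the record fields (`summary_vec`, `parent_id`, `vec`), and the two-dimensional vectors are assumptions for illustration, not an existing RAGFlow interface.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, parents, children, top_parents=1, top_chunks=2):
    """Stage 1: pick the best parent summaries. Stage 2: rank only their chunks."""
    ranked_parents = sorted(
        parents, key=lambda p: cosine(query_vec, p["summary_vec"]), reverse=True
    )
    allowed = {p["id"] for p in ranked_parents[:top_parents]}
    candidates = [c for c in children if c["parent_id"] in allowed]
    candidates.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["id"] for c in candidates[:top_chunks]]


# Toy 2-dimensional vectors; real embeddings would come from the embedding model.
parents = [
    {"id": "p1", "summary_vec": [1.0, 0.0]},
    {"id": "p2", "summary_vec": [0.0, 1.0]},
]
children = [
    {"id": "c1", "parent_id": "p1", "vec": [0.9, 0.1]},
    {"id": "c2", "parent_id": "p1", "vec": [0.6, 0.4]},
    {"id": "c3", "parent_id": "p2", "vec": [1.0, 0.0]},  # similar vector, wrong parent
]
hits = retrieve([1.0, 0.2], parents, children)
```

Chunk `c3` sits close to the query in vector space but is never scored, because its parent summary was not selected in the first stage. This is the kind of "crowding" the summary layer is meant to reduce.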
Complementary Data Pipeline Enhancements
- The Data Pipeline should be able to complement the built-in methods, not only replace them.
- Focus on strengthening full-text metadata generation and document-level summarization capabilities to provide robust data foundation for hierarchical retrieval.
Expected Benefits
Implementing this hierarchical retrieval architecture will enable RAGFlow's critical transition from "feasible" to "production-ready":
- Improved Recall Precision: Layered filtering focuses the search on relevant regions, reducing interference from irrelevant chunks and mitigating the embedding model limitations described above.
- Optimized System Performance: Significantly reduces vector search candidate sets, lowering computational overhead and improving response latency.
- Enhanced System Intelligence & Flexibility: Knowledge base routing and intelligent metadata filtering enable better understanding of user intent and adaptation to complex production environments.
- Reduced Operational Costs: Template-based, batch-enabled metadata management tools minimize maintenance overhead.
Implementation Priority
High - This architecture addresses fundamental scalability and precision limitations critical for production deployments.
Describe implementation you've considered
No response
Documentation, adoption, use case
Additional information
No response
@ZhenhangTung Please assign the issue to me.
Hi, thanks for your suggestions! Regarding the challenges mentioned above, we do have a series of plans to address them, and tree-based retrieval is a MUST for the enhancement. However, the detailed implementation might not be done this way. It is not really a kind of routing but a collaboration between indexing and retrieval. We will propose a draft design soon and let you know ASAP.
Okay, thanks for your reply. Can you check my other PRs and my Discord message?
On Thu, Dec 11, 2025 at 10:40 PM Yingfeng @.***> wrote:
yingfeng left a comment (infiniflow/ragflow#11610) https://github.com/infiniflow/ragflow/issues/11610#issuecomment-3644773455