ragflow [Feature Request]: Hierarchical Retrieval Architecture for Production-Grade RAG

Self Checks

[x] I have searched for existing issues, including closed ones.
[x] I confirm that I am using English to submit this report (Language Policy).
[x] Non-English title submissions will be closed directly (Language Policy).
[x] Please do not modify this template :) and fill in all the required fields.

Is your feature request related to a problem?

Describe the feature you'd like

Problem Statement: Bridging the "Demo-to-Production" Gap

RAGflow currently demonstrates strong performance in proof-of-concept (PoC) scenarios. However, when deployed in production environments with diverse knowledge bases and large-scale document collections (tens of thousands of documents), the existing "single-layer retrieval" architecture—which flattens all document chunks into a single vector search space—reveals significant limitations in both accuracy and efficiency.

Key Challenges:

Chunk Fragmentation Issues
- Context Fragmentation: Improper segmentation disrupts natural semantic units, resulting in incomplete information within individual chunks and degraded semantic representation.
- Information Dilution: Critical information ("gold nuggets") is often split across multiple chunks, making comprehensive retrieval challenging and reducing answer quality.
Embedding Model Limitations
- Theoretical Constraints: As established in research papers such as "On the Theoretical Limitations of Embedding-Based Retrieval", the dimensionality of embedding vectors fundamentally limits the number of "document-query" relevance relationships that can be perfectly represented.
- Practical Bottlenecks: Commonly deployed private embedding models (e.g., qwen3-embedding-0.6B, jina-embeddings-v3 with 1024 dimensions) may lack sufficient capacity to encode complex semantic relationships at scale. While higher-dimensional models (4096 or 8192 dimensions) exist, they impose prohibitive hardware requirements and computational costs for private deployments.
- Retrieval Precision Degradation: Direct vector search across millions of chunks becomes computationally expensive and prone to vector space "crowding" and "confusion," causing relevant chunks to rank lower.
Underutilized Metadata
- Valuable document metadata (department, author, date, document type, etc.) remains largely untapped as systematic pre-retrieval filters, wasting crucial structured information.

Proposed Solution: Three-Tier Retrieval Architecture

Inspired by search engine hierarchical principles, we propose a Knowledge Base → Document → Chunk three-tier retrieval architecture to progressively narrow the search scope and enhance both precision and efficiency.

Tier 1: Knowledge Base Routing

Function: Automatically routes user queries to the most relevant knowledge base based on intent.
Implementation:
- Support independent retrieval parameters per knowledge base (vector/keyword weights, recall thresholds).
- Enable dynamic routing via rule-based or LLM-based approaches to ensure domain-specific processing.
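
As a rough illustration of the rule-based variant, here is a minimal Python sketch. It is not RAGFlow's API: `KnowledgeBaseConfig`, `route_query`, the keyword rules, and the example weights are all hypothetical names and values chosen for illustration. An LLM-based router could replace the keyword-overlap score while keeping the same interface.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeBaseConfig:
    """Hypothetical per-knowledge-base settings (not an existing RAGFlow type)."""
    name: str
    keywords: set[str]             # routing rule: trigger terms for this knowledge base
    vector_weight: float = 0.7     # weight of vector similarity in the hybrid score
    keyword_weight: float = 0.3    # weight of keyword similarity in the hybrid score
    recall_threshold: float = 0.2  # minimum score for a chunk to be recalled


def route_query(query: str, kbs: list[KnowledgeBaseConfig],
                top_n: int = 1) -> list[KnowledgeBaseConfig]:
    """Pick the knowledge bases whose trigger keywords overlap the query most."""
    tokens = set(query.lower().split())
    matched = [(len(tokens & kb.keywords), kb) for kb in kbs]
    matched = [(score, kb) for score, kb in matched if score > 0]
    if not matched:
        # No rule fired: fall back to searching every knowledge base.
        return list(kbs)
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [kb for _, kb in matched[:top_n]]


# Illustrative configuration; names and numbers are made up.
KBS = [
    KnowledgeBaseConfig("hr_policies", {"leave", "vacation", "payroll"}, 0.5, 0.5, 0.3),
    KnowledgeBaseConfig("engineering_docs", {"api", "deploy", "latency"}),
]

selected = route_query("how many vacation days do I get", KBS)
print([kb.name for kb in selected])  # ['hr_policies']
```

Each selected knowledge base carries its own weights and threshold, so the later retrieval stages can use domain-specific parameters without global configuration.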

Tier 2: Document Filtering

Function: Applies document-level metadata filtering within selected knowledge bases to identify relevant document subsets.
Enhancements:
- Intelligent Metadata Filtering: In Auto mode, allow users to specify key metadata fields (e.g., document type, department) with LLM-generated filter conditions to avoid high-cardinality metadata interference.
- Metadata Similarity Matching: Introduce similarity operators for text-based metadata (document names, summaries) to support fuzzy matching.
- Enhanced Metadata Generation: Strengthen Data Pipeline capabilities for full-text metadata and summary generation to enrich document filtering context.
- Efficient Metadata Management: Provide batch CRUD operations on metadata and a metadata management UI.
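
To make the filtering step concrete, here is a minimal sketch that combines exact metadata conditions with a fuzzy match on document names. It assumes the filter conditions have already been produced (by a user or by an LLM in Auto mode). `Document`, `filter_documents`, and the `min_similarity` threshold are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever similarity operator would actually be used.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Document:
    """Hypothetical document record with free-form metadata."""
    doc_id: str
    name: str
    metadata: dict[str, str]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_documents(docs, exact=None, name_like=None, min_similarity=0.6):
    """Keep documents that satisfy every exact condition and, if requested,
    whose name is similar enough to `name_like`."""
    exact = exact or {}
    selected = []
    for doc in docs:
        if any(doc.metadata.get(key) != value for key, value in exact.items()):
            continue  # an exact metadata condition failed
        if name_like is not None and similarity(doc.name, name_like) < min_similarity:
            continue  # name is not close enough to the requested one
        selected.append(doc)
    return selected


# Illustrative data only.
DOCS = [
    Document("d1", "Travel Expense Policy 2025", {"department": "finance", "type": "policy"}),
    Document("d2", "Travel Expense Policy 2024", {"department": "finance", "type": "policy"}),
    Document("d3", "API Gateway Runbook", {"department": "engineering", "type": "runbook"}),
]

hits = filter_documents(DOCS, exact={"department": "finance"},
                        name_like="travel expence policy")  # typo is tolerated
print([doc.doc_id for doc in hits])  # ['d1', 'd2']
```

Restricting the exact conditions to a few low-cardinality fields (department, document type) keeps the filter predictable, while the similarity operator handles free-text fields such as names and summaries.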

Tier 3: Chunk Refinement

Function: Performs precise vector retrieval at the chunk level within the filtered document set.
Enhancements:
- Parent-Child Chunking with Summary Mapping: Enable creation of parent-level summaries for contextually related chunks. Retrieval first matches macro-themes via summary vectors, then maps to original chunks for details—combining semantic robustness with granular information access.
- Customizable Prompts: Allow users to configure custom prompts for chunk keyword extraction and question generation tasks to better align with domain-specific semantics.
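
The summary-mapping idea can be sketched as a two-step lookup: rank parent summaries first, then rank only the child chunks of the best parents. The sketch below is an assumption-laden toy: `embed` is a bag-of-words stand-in for a real embedding model, and `PARENTS` and `retrieve` are hypothetical names with made-up data.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Parent summary -> original child chunks (illustrative data only).
PARENTS = {
    "p1": {
        "summary": "annual leave and vacation policy",
        "chunks": ["Employees accrue 1.5 vacation days per month.",
                   "Unused leave carries over for one year."],
    },
    "p2": {
        "summary": "api gateway deployment and rollback",
        "chunks": ["Deploy the gateway with a blue-green rollout.",
                   "Rollback restores the previous release."],
    },
}


def retrieve(query, parents, top_parents=1, top_chunks=1):
    """Match the query against parent summaries, then rank only their chunks."""
    q = embed(query)
    ranked = sorted(parents,
                    key=lambda pid: cosine(q, embed(parents[pid]["summary"])),
                    reverse=True)
    candidates = [chunk for pid in ranked[:top_parents]
                  for chunk in parents[pid]["chunks"]]
    candidates.sort(key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return candidates[:top_chunks]


print(retrieve("vacation policy carry over leave", PARENTS))
# ['Unused leave carries over for one year.']
```

Because chunk-level scoring only runs over the children of the selected parents, the candidate set stays small even when the total number of chunks is large.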

Complementary Data Pipeline Enhancements

Data Pipeline can serve as a complement to the Built-in Methods, not only as a replacement for them.
Focus on strengthening full-text metadata generation and document-level summarization capabilities to provide robust data foundation for hierarchical retrieval.

Expected Benefits

Implementing this hierarchical retrieval architecture will enable RAGflow's critical transition from "feasible" to "production-ready":

- Improved Recall Precision: Layered filtering effectively focuses on relevant regions, reducing interference from irrelevant chunks and fundamentally addressing embedding model limitations.
- Optimized System Performance: Significantly reduces vector search candidate sets, lowering computational overhead and improving response latency.
- Enhanced System Intelligence & Flexibility: Knowledge base routing and intelligent metadata filtering enable better understanding of user intent and adaptation to complex production environments.
- Reduced Operational Costs: Template-based, batch-enabled metadata management tools minimize maintenance overhead.

Implementation Priority

High - This architecture addresses fundamental scalability and precision limitations critical for production deployments.

Describe implementation you've considered

No response

Documentation, adoption, use case

Additional information

No response

Nov 30 '25 08:11 TeslaZY

@ZhenhangTung Please assign the issue to me.

Dec 03 '25 10:12 hsparks-codes

Hi, thanks for your suggestions! Regarding the challenges mentioned above, we do have a series of plans to address them, and tree-based retrieval is a MUST for the enhancement. However, the detailed implementation might not work this way: it is not really a kind of routing but a collaboration between indexing and retrieval. We will propose a draft design soon and let you know ASAP.

Dec 12 '25 03:12 yingfeng

Okay, thanks for your reply. Can you check my other PRs and check my discord message?

Dec 12 '25 03:12 hsparks-codes