[Bug]: Persist layers hierarchy in RAPTOR
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
94181a990b957ed302952b4de17583d2b44f3099
RAGFlow image version
v0.18.0
Other environment information
Problem Statement
* In rag/raptor.py, the code initializes and appends to a local layers list to record each abstraction layer’s start/end indices.
* However, this layers list is never returned, stored on the instance, or exposed to downstream processes.
* RAGFlow instead flattens all summaries and original chunks into a single list, discarding any notion of tree structure.
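To make the discarded information concrete, here is a minimal sketch of what such a layers list encodes. It assumes each entry is a half-open (start, end) range into the flattened chunks list, with layer 0 covering the original chunks. The helper name split_by_layer and the toy data are hypothetical and not part of rag/raptor.py.

```python
from typing import List, Sequence, Tuple


def split_by_layer(
    chunks: Sequence[str], layers: Sequence[Tuple[int, int]]
) -> List[List[str]]:
    """Slice a flat chunk list into one list per layer.

    Assumption: each (start, end) pair is a half-open range into `chunks`.
    """
    return [list(chunks[start:end]) for start, end in layers]


# Toy data: four original chunks, two first-level summaries, one root summary.
flat = ["c0", "c1", "c2", "c3", "s0", "s1", "root"]
layers = [(0, 4), (4, 6), (6, 7)]

levels = split_by_layer(flat, layers)
for depth, level in enumerate(levels):
    print(depth, level)
```

With only the flat list, the boundary between "c3" and "s0" cannot be recovered; with layers, each level is a simple slice.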
Impact
Without persisting layers, RAGFlow cannot:
* Reconstruct or traverse the abstraction tree at query time, negating RAPTOR’s tree‑organized retrieval mechanism.
* Debug or visualize which layer produced a given summary, severely limiting interpretability and troubleshooting.
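As an illustration of the debugging use case, a persisted layers list would let a caller map any flat chunk index back to its layer. This is only a sketch under the assumption that the ranges are half-open, contiguous, and sorted by start; layer_of is a hypothetical helper, not an existing RAGFlow function.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def layer_of(index: int, layers: Sequence[Tuple[int, int]]) -> int:
    """Return the layer number whose (start, end) range contains `index`.

    Assumption: ranges are half-open, contiguous, and sorted by start.
    """
    starts = [start for start, _ in layers]
    pos = bisect_right(starts, index) - 1
    if pos < 0 or index >= layers[pos][1]:
        raise IndexError(f"chunk index {index} is outside every layer")
    return pos


example_layers = [(0, 4), (4, 6), (6, 7)]
print(layer_of(5, example_layers))  # index 5 falls in the second range
```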
Actual behavior
- In rag/raptor.py, the local layers list is initialized and appended to during each clustering step, but it is never attached to the instance or returned by the __call__ method.
- The method concludes with return chunks, so only the flattened list of original and summary chunks is exposed; the hierarchical (start, end) indices in layers are discarded.
- Consequently, downstream code and the retrieval pipeline have no access to the abstraction tree, making true tree-organized retrieval impossible.
Expected behavior
return chunks, layers
By preserving the full layers hierarchy, RAGFlow would unlock the core benefits of tree‑organized retrieval—enabling precise, multi‑level reasoning over long documents as envisioned by the RAPTOR authors.
Steps to reproduce
Call await RAPTOR().__call__(chunks, random_state) on any non‐trivial list of chunks and note that it returns only the flattened chunks list.
Attempt to access raptor_instance.layers and observe an AttributeError (no layers are persisted).
Additional information
No response
Hello @vishal-zetta! I'm Dosu, your friendly bot here to help you with bugs, answer questions, and assist you on your journey as a contributor. I'm here while you wait for a human maintainer. Let's tackle this issue together!
To address the issue of persisting the layers hierarchy in the RAPTOR class, you can modify the __call__ method in rag/raptor.py to include the layers list in its return statement. This will allow downstream processes to access the hierarchical structure of the chunks, enabling tree-organized retrieval and improving interpretability.
Here's a suggested approach:
- Modify the __call__ method: Update the method to return both chunks and layers. This change will ensure that the hierarchical structure is preserved and accessible.

      # Example modification in rag/raptor.py
      async def __call__(self, chunks, random_state):
          # ... existing code ...
          # Return both chunks and layers
          return chunks, layers

- Update downstream code: Ensure that any code calling the __call__ method is updated to handle the additional layers output. This might involve updating function signatures and processing logic to accommodate the new return value.
By implementing these changes, you will enable RAGFlow to maintain and utilize the abstraction tree, enhancing its retrieval capabilities and interpretability.
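For the downstream side, one possible way to absorb the changed return value is a small adapter that accepts both the old and the new shape. The sketch below is an assumption-laden illustration: summarize_stub is a hypothetical stand-in for a patched __call__ (it does no clustering), and unpack_result is not an existing RAGFlow helper.

```python
import asyncio
from typing import List, Tuple

Layer = Tuple[int, int]


async def summarize_stub(chunks: List[str]) -> Tuple[List[str], List[Layer]]:
    """Hypothetical stand-in for a patched __call__ returning (chunks, layers)."""
    n = len(chunks)
    flat = chunks + ["summary of all chunks"]
    return flat, [(0, n), (n, n + 1)]


def unpack_result(result):
    """Accept either the old shape (a list) or the new (chunks, layers) tuple."""
    if isinstance(result, tuple) and len(result) == 2:
        chunks, layers = result
        return list(chunks), list(layers)
    # Old shape: no hierarchy is available, so treat everything as one layer.
    chunks = list(result)
    return chunks, [(0, len(chunks))]


new_chunks, new_layers = unpack_result(asyncio.run(summarize_stub(["a", "b"])))
old_chunks, old_layers = unpack_result(["a", "b", "summary"])
print(new_layers, old_layers)
```

An adapter like this would let callers be migrated one at a time instead of all at once, at the cost of an extra type check.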
Hi @KevinHuSh, any ideas?
If the tree structure needs to be retrieved, we need to add fields in ES to represent the connections between chunks. We also need to alter the search procedure to make use of the chunks' structure information.
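A rough sketch of what such per-chunk structure fields might look like follows. Note that the (start, end) ranges alone only give each chunk's level; the parent/child connections would also need the cluster assignments, passed here as a hypothetical children mapping. The field names raptor_layer and raptor_child_ids are invented for illustration and are not existing RAGFlow or ES fields.

```python
from typing import Dict, List, Sequence, Tuple


def build_structure_fields(
    chunk_ids: Sequence[str],
    layers: Sequence[Tuple[int, int]],
    children: Dict[int, List[int]],
) -> List[dict]:
    """Attach hypothetical tree fields to each chunk document.

    `children` maps a summary's flat index to the flat indices it summarizes.
    """
    docs = []
    for depth, (start, end) in enumerate(layers):
        for i in range(start, end):
            docs.append(
                {
                    "id": chunk_ids[i],
                    "raptor_layer": depth,  # hypothetical field name
                    # hypothetical field name; empty for original chunks
                    "raptor_child_ids": [chunk_ids[c] for c in children.get(i, [])],
                }
            )
    return docs


ids = ["c0", "c1", "c2", "s0", "s1", "root"]
docs = build_structure_fields(
    ids,
    layers=[(0, 3), (3, 5), (5, 6)],
    children={3: [0, 1], 4: [2], 5: [3, 4]},
)
for doc in docs:
    print(doc)
```

With fields along these lines, a search procedure could start from a matched summary and expand to its children by ID. How that maps onto the actual index schema is left open here.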
Hi, I was just about to submit the same issue. Is updating this on the roadmap?