[Feature Request]: Improve RAG performance using Contextual Retrieval
Is there an existing issue for the same feature request?
- [x] I have checked the existing issues.
Is your feature request related to a problem?
No
Describe the feature you'd like
Dear InfiniFlow / RAGFlow team,
Great work so far. It would be nice if we could improve retrieval performance by prefixing each chunk with context from its source document before the chunk is embedded and indexed. Recent work by Anthropic reports notable gains from doing so, and it appears to be relatively simple to implement (links shared below).
Basic Example
Suppose the document is an SEC filing:
original_chunk = "The company's revenue grew by 3% over the previous quarter."
contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."
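To make the idea concrete, here is a rough Python sketch of how the context could be generated and prepended at ingestion time. It is only an illustration under assumptions: `build_prompt`, `contextualize_chunks`, and the prompt wording are hypothetical names and text of my own (not RAGFlow code and not Anthropic's exact prompt), and `generate_context` stands in for whatever LLM call the pipeline would use.

```python
from typing import Callable, List

# Illustrative prompt only; the wording is an assumption, not an official template.
PROMPT_TEMPLATE = (
    "<document>\n{document}\n</document>\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Write one or two sentences that situate this chunk within the document, "
    "to improve search retrieval. Answer with the context only."
)


def build_prompt(document: str, chunk: str) -> str:
    """Fill the template with the full document and one of its chunks."""
    return PROMPT_TEMPLATE.format(document=document, chunk=chunk)


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str], str],  # hypothetical LLM wrapper: prompt -> text
) -> List[str]:
    """Return each chunk prefixed with its generated context."""
    contextualized = []
    for chunk in chunks:
        context = generate_context(build_prompt(document, chunk)).strip()
        # Fall back to the raw chunk if the model returns nothing useful.
        contextualized.append(f"{context} {chunk}" if context else chunk)
    return contextualized


# Stub standing in for a real LLM call, so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return (
        "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; "
        "the previous quarter's revenue was $314 million."
    )


original_chunk = "The company's revenue grew by 3% over the previous quarter."
contextualized = contextualize_chunks("(full text of the filing)", [original_chunk], fake_llm)
print(contextualized[0])
```

The contextualized strings would then be embedded and indexed in place of the raw chunks; the rest of the retrieval pipeline would stay unchanged.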
Describe implementation you've considered
No response
Documentation, adoption, use case
Additional information
Drawbacks
Uploading a file into the RAG database will take longer, because a context has to be generated for every chunk.
Reference links:
- Blog Post: https://www.anthropic.com/news/contextual-retrieval
- Reference implementation by Anthropic: https://github.com/anthropics/anthropic-cookbook/blob/main/skills/contextual-embeddings/guide.ipynb
A follow-up note: it might make sense to offer both document-level and chunk-level context as options. Reason: if a user is working with a specialized database (e.g., all stored documents are of one type, such as SEC filings), chunk-level context may be overkill.
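One possible shape for such an option, as a sketch only: `ContextMode`, `add_context`, and the `summarize` callback are hypothetical names introduced for illustration, not existing RAGFlow settings. The point is that document-level mode needs one context call per document, while chunk-level mode needs one per chunk.

```python
from enum import Enum
from typing import Callable, List, Optional


class ContextMode(Enum):
    NONE = "none"          # index raw chunks, as today
    DOCUMENT = "document"  # one shared context per document
    CHUNK = "chunk"        # a separate context per chunk


def add_context(
    document: str,
    chunks: List[str],
    mode: ContextMode,
    # Hypothetical LLM wrapper: (document, chunk or None) -> context text.
    summarize: Callable[[str, Optional[str]], str],
) -> List[str]:
    """Prefix chunks with context according to the selected mode."""
    if mode is ContextMode.NONE:
        return list(chunks)
    if mode is ContextMode.DOCUMENT:
        doc_context = summarize(document, None)  # single call for the whole document
        return [f"{doc_context} {chunk}" for chunk in chunks]
    # Chunk-level: one call per chunk, so ingestion cost grows with chunk count.
    return [f"{summarize(document, chunk)} {chunk}" for chunk in chunks]


# Stub that records how many context calls each mode triggers.
calls: List[Optional[str]] = []


def stub_context(document: str, chunk: Optional[str]) -> str:
    calls.append(chunk)
    return "[document context]" if chunk is None else f"[context for '{chunk}']"


doc_level = add_context("(filing text)", ["a", "b", "c"], ContextMode.DOCUMENT, stub_context)
doc_level_calls = len(calls)
calls.clear()
chunk_level = add_context("(filing text)", ["a", "b", "c"], ContextMode.CHUNK, stub_context)
chunk_level_calls = len(calls)
print(doc_level_calls, chunk_level_calls)
```

With a switch like this, a single-domain knowledge base could use the cheaper document-level mode, and mixed collections could opt into chunk-level context.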
Reference: https://x.com/rajhans_samdani/status/1899969389228937273?s=19
Is the "Contextual Retrieval" feature already implemented in RAGFlow, as stated in https://ragflow.io/blog/a-deep-dive-into-ragflow-v0.15.0#semantic-gap ? I can't find it in the source code, and there is no related option in the UI either.