
[REQUEST] Improve RAG performance by Contextual Retrieval

Open rsnk96 opened this issue 10 months ago • 2 comments

Reference Issues

No response

Summary

Dear Kotaemon team,

Great work so far. It would be nice to improve retrieval performance by prefixing each chunk with context drawn from its source document. Recent work by Anthropic reports notable retrieval gains from doing so, and it appears to be relatively simple to implement (links below).


Basic Example

Suppose the document is an SEC filing:

original_chunk = "The company's revenue grew by 3% over the previous quarter."

contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."
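A minimal sketch of how the contextualized chunk above could be produced at ingestion time, assuming any LLM can be wrapped as a plain `prompt -> completion` function. `contextualize_chunks`, `CONTEXT_PROMPT`, and the `llm` callable are hypothetical names, not part of kotaemon. The prompt is only loosely modeled on the one in Anthropic's reference notebook, and `fake_llm` stands in for a real model so the example runs offline.

```python
from typing import Callable, List

# Prompt loosely modeled on Anthropic's contextual-retrieval example (paraphrased).
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is the chunk we want to situate within the whole document:\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Give a short, succinct context that situates this chunk within the overall "
    "document to improve search retrieval of the chunk. Answer only with the context."
)


def contextualize_chunks(
    document: str,
    chunks: List[str],
    llm: Callable[[str], str],  # hypothetical: any function mapping a prompt to a completion
) -> List[str]:
    """Prefix each chunk with LLM-generated context before it is embedded/indexed."""
    contextualized = []
    for chunk in chunks:
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        context = llm(prompt).strip()
        # Generated context first, then the unchanged original chunk text.
        contextualized.append(f"{context} {chunk}")
    return contextualized


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; always returns the context from the example above.
    return (
        "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; "
        "the previous quarter's revenue was $314 million."
    )


original_chunk = "The company's revenue grew by 3% over the previous quarter."
print(contextualize_chunks("<full filing text>", [original_chunk], fake_llm)[0])
```

Only the contextualized text would be embedded (and, if used, indexed for keyword search); the original chunk can still be stored separately for display.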


Drawbacks

Uploading a file into the RAG database will take longer, because an LLM has to generate context for every chunk before indexing.
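To get a feel for the extra ingestion work, here is a rough back-of-the-envelope estimate. It assumes chunk-level context where each chunk gets its own LLM call and each call reads the whole document. `chunk_context_cost` and its default token counts are hypothetical, illustrative assumptions, not measured figures.

```python
from dataclasses import dataclass


@dataclass
class ContextCost:
    llm_calls: int
    input_tokens: int
    output_tokens: int


def chunk_context_cost(
    num_chunks: int,
    doc_tokens: int,
    chunk_tokens: int,
    prompt_overhead_tokens: int = 100,  # assumed size of the fixed instruction text
    context_tokens: int = 75,           # assumed length of each generated context
) -> ContextCost:
    """Rough cost of chunk-level context generation.

    One LLM call per chunk, and every call re-reads the whole document,
    so input tokens grow roughly as num_chunks * doc_tokens.
    """
    per_call_input = doc_tokens + chunk_tokens + prompt_overhead_tokens
    return ContextCost(
        llm_calls=num_chunks,
        input_tokens=num_chunks * per_call_input,
        output_tokens=num_chunks * context_tokens,
    )


# Hypothetical example: an 8,000-token filing split into 20 chunks of 400 tokens.
print(chunk_context_cost(num_chunks=20, doc_tokens=8000, chunk_tokens=400))
# ContextCost(llm_calls=20, input_tokens=170000, output_tokens=1500)
```

Because the same document prefix is sent with every call, it is a natural target for prompt caching where the LLM provider supports it.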

Additional information

Reference links:

  • Blog Post: https://www.anthropic.com/news/contextual-retrieval
  • Reference implementation by Anthropic: https://github.com/anthropics/anthropic-cookbook/blob/main/skills/contextual-embeddings/guide.ipynb

rsnk96 · Feb 28 '25 20:02

Adding a follow-up note: it might make sense to offer options for both document-level and chunk-level context. If a user works with a specialized database (e.g., one that stores only a single document type, such as financial SEC filings), chunk-level context may be overkill.

Reference: https://x.com/rajhans_samdani/status/1899969389228937273?s=19
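One way the two options could be exposed is as an ingestion setting. This is a minimal sketch: `ContextMode`, `add_context`, `summarize_document`, and `situate_chunk` are hypothetical names, not an existing kotaemon setting or API. Document-level mode makes one LLM call per document and reuses that context for every chunk. Chunk-level mode makes one call per chunk.

```python
from enum import Enum
from typing import Callable, List


class ContextMode(Enum):
    NONE = "none"          # index chunks as-is
    DOCUMENT = "document"  # one LLM call per document; same context prefixes every chunk
    CHUNK = "chunk"        # one LLM call per chunk; context is specific to each chunk


def add_context(
    document: str,
    chunks: List[str],
    mode: ContextMode,
    summarize_document: Callable[[str], str],  # hypothetical LLM helper: document -> context
    situate_chunk: Callable[[str, str], str],  # hypothetical LLM helper: (document, chunk) -> context
) -> List[str]:
    """Return the chunk texts to embed, prefixed according to the chosen mode."""
    if mode is ContextMode.NONE:
        return list(chunks)
    if mode is ContextMode.DOCUMENT:
        doc_context = summarize_document(document)  # computed once, shared by all chunks
        return [f"{doc_context} {chunk}" for chunk in chunks]
    # ContextMode.CHUNK: a separate context per chunk
    return [f"{situate_chunk(document, chunk)} {chunk}" for chunk in chunks]


# Usage with stand-in helpers that count how many "LLM calls" each mode makes.
calls = {"document": 0, "chunk": 0}


def fake_summarize(document: str) -> str:
    calls["document"] += 1
    return "From an SEC filing."


def fake_situate(document: str, chunk: str) -> str:
    calls["chunk"] += 1
    return "From an SEC filing, revenue section."


chunks = ["Chunk A.", "Chunk B.", "Chunk C."]
print(add_context("<filing>", chunks, ContextMode.DOCUMENT, fake_summarize, fake_situate))
print(calls)  # document-level mode used a single summarize call and no per-chunk calls
```

For a corpus of one document type, the document-level mode keeps ingestion cost at one call per file. The chunk-level mode trades that for context specific to each chunk.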

rsnk96 · Mar 13 '25 17:03

Noted in the to-do list.

taprosoft · Mar 31 '25 03:03