
[FEAT]: Add Optional Small-to-Big Retrieval

Open • cope opened this issue 1 year ago • 1 comment

What would you like to see?

Apparently, smaller chunk sizes improve retrieval quality, while larger chunk sizes improve generation quality (see "Advanced RAG 01: Small-to-Big Retrieval").

If the current embedding process stores the relative chunk ids per document, then when chunk i is retrieved, we can prepend chunks [i-2, i-1] and append chunks [i+1, i+2], and pass that big combined text on to the generation step. This would give both benefits: smaller chunks for retrieval and larger chunks for generation. Naturally, we need to make sure that each i+/-n chunk actually exists before adding it, so that we never add null at the start or end of a document.

My idea is to simplify the implementation by just adding optional prepend/append integers that would default to 0, but could be changed by the user in the settings.
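A minimal sketch of what this could look like, written in Python purely for illustration and not taken from the anything-llm codebase. The function name `expand_chunk` and the parameters `prepend`, `append`, and `separator` are hypothetical; it assumes the chunks of one document are available as an ordered list, so that list position equals the relative chunk id.

```python
from typing import Sequence


def expand_chunk(
    chunks: Sequence[str],
    i: int,
    prepend: int = 0,
    append: int = 0,
    separator: str = "\n",
) -> str:
    """Return chunk i joined with up to `prepend` earlier and `append` later chunks.

    `chunks` is assumed to hold all chunks of ONE document in their original
    order. Neighbours that would fall outside the document are skipped, so
    nothing null is ever added.
    """
    if not 0 <= i < len(chunks):
        raise IndexError(f"chunk index {i} is out of range")
    if prepend < 0 or append < 0:
        raise ValueError("prepend and append must be non-negative")

    # Clamp the window to the document boundaries.
    start = max(0, i - prepend)
    end = min(len(chunks), i + append + 1)
    return separator.join(chunks[start:end])


if __name__ == "__main__":
    doc = ["c0", "c1", "c2", "c3", "c4"]
    # Defaults of 0 leave the current behaviour unchanged.
    print(expand_chunk(doc, 2))
    # Retrieved chunk 2 with two neighbours on each side.
    print(expand_chunk(doc, 2, prepend=2, append=2))
    # Near the start of the document, the missing neighbours are skipped.
    print(expand_chunk(doc, 0, prepend=2, append=1))
```

With both values at 0 the retrieved chunk is passed through as-is, so the feature stays opt-in. Two things this sketch does not handle: if chunks are created with overlap, the joined text will repeat the overlapping parts, and if two retrieved chunks are close together, their expanded windows may need to be merged.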

The alternative is to implement a full Parent Document Retriever, but that is a much bigger task, IMHO.

cope avatar May 13 '24 19:05 cope

Parent Document Retriever would be a nice option to set per document (like the pinning option), since a workspace often contains a mix of documents and some of them are too big to be retrieved as a parent.

RahSwe avatar May 13 '24 22:05 RahSwe