[Feature Request] Sliding window for VectorRetriever

AveryYay opened this issue 10 months ago.

Required prerequisites

[x] I have searched the Issue Tracker and Discussions to confirm that this hasn't already been reported. (+1 or comment there if it has.)
[ ] Consider asking first in a Discussion.

Motivation

Currently, VectorRetriever scores and returns each chunk independently by its similarity to the query, without considering contextual continuity across chunks of a long document. This makes it difficult to retrieve relevant information that spans multiple chunks, and leads to incomplete or disjointed results.

For example, in a long-form document a relevant answer may span two adjacent chunks. When chunk boundaries do not overlap, text that straddles a boundary is split, and neither chunk contains the complete answer, so information is lost. A sliding window approach, in which consecutive chunks share overlapping context, would improve retrieval accuracy by keeping such passages intact in at least one chunk.
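To make the boundary problem concrete, here is a small sketch, assuming chunks are measured in token offsets. The helpers `chunk_spans` and `contained_in_one_chunk` are hypothetical illustrations, not part of any library: the first lists chunk boundaries with and without overlap, the second checks whether an answer span fits inside a single chunk.

```python
def chunk_spans(n_tokens: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return [start, end) token offsets of fixed-size chunks.

    Consecutive chunks share `overlap` tokens, so the window advances
    by `chunk_size - overlap` tokens per step (hypothetical helper).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap
    spans = []
    start = 0
    while True:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:  # the last chunk reaches the end of the document
            break
        start += stride
    return spans


def contained_in_one_chunk(answer: tuple[int, int], spans: list[tuple[int, int]]) -> bool:
    """True if the answer's [start, end) token range lies inside a single chunk."""
    a_start, a_end = answer
    return any(s <= a_start and a_end <= e for s, e in spans)


# A 30-token document whose answer occupies tokens 8..11 (end-exclusive 12).
answer = (8, 12)
no_overlap = chunk_spans(30, chunk_size=10, overlap=0)
with_overlap = chunk_spans(30, chunk_size=10, overlap=4)
print(no_overlap)    # [(0, 10), (10, 20), (20, 30)]
print(with_overlap)  # [(0, 10), (6, 16), (12, 22), (18, 28), (24, 30)]
print(contained_in_one_chunk(answer, no_overlap))    # False: split at token 10
print(contained_in_one_chunk(answer, with_overlap))  # True: inside (6, 16)
```

Because the window advances by `chunk_size - overlap` tokens, any answer of at most `overlap + 1` tokens is guaranteed to lie entirely inside at least one chunk; longer answers can still be split. The cost is more chunks to embed and store (5 instead of 3 here).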

Solution

Introduce a sliding window mechanism for VectorRetriever, where:

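Each chunk overlaps the next by a configurable number of tokens (the window size).
Text near a chunk boundary therefore appears intact in at least one chunk, which helps preserve semantic continuity and improves retrieval relevance.
It can be implemented as an optional parameter (window_size) in the VectorRetriever configuration, so the current non-overlapping behavior remains the default.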
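A minimal sketch of how the proposed window_size option could behave. The class name `SlidingWindowRetriever` and its `chunk_size` parameter are hypothetical and not part of any existing VectorRetriever API; only the name `window_size` comes from this proposal. Bag-of-words cosine similarity stands in for embedding similarity so the example runs without a model.

```python
import math
import re
from collections import Counter


class SlidingWindowRetriever:
    """Toy retriever illustrating a `window_size` overlap option.

    Hypothetical sketch, not a real library API. Chunks are built from
    whitespace tokens; consecutive chunks share `window_size` tokens.
    """

    def __init__(self, chunk_size: int = 50, window_size: int = 0):
        if not 0 <= window_size < chunk_size:
            raise ValueError("window_size must satisfy 0 <= window_size < chunk_size")
        self.chunk_size = chunk_size
        self.window_size = window_size  # tokens shared by consecutive chunks
        self.chunks: list[str] = []

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        # Lowercased word tokens used only for scoring.
        return re.findall(r"\w+", text.lower())

    def add_document(self, text: str) -> None:
        tokens = text.split()
        stride = self.chunk_size - self.window_size
        start = 0
        while start < len(tokens):
            self.chunks.append(" ".join(tokens[start:start + self.chunk_size]))
            if start + self.chunk_size >= len(tokens):  # last chunk reached the end
                break
            start += stride

    @classmethod
    def _cosine(cls, a: str, b: str) -> float:
        # Cosine similarity of term-count vectors (stand-in for embeddings).
        va, vb = Counter(cls._tokenize(a)), Counter(cls._tokenize(b))
        dot = sum(va[t] * vb[t] for t in va)
        norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
        return dot / norm if norm else 0.0

    def query(self, text: str, top_k: int = 1) -> list[str]:
        ranked = sorted(self.chunks, key=lambda c: self._cosine(text, c), reverse=True)
        return ranked[:top_k]


doc = "alpha beta gamma the capital of France is Paris delta epsilon zeta"

plain = SlidingWindowRetriever(chunk_size=6, window_size=0)
plain.add_document(doc)
print(plain.query("capital of France"))   # ['alpha beta gamma the capital of']

windowed = SlidingWindowRetriever(chunk_size=6, window_size=3)
windowed.add_document(doc)
print(windowed.query("capital of France"))  # ['the capital of France is Paris']
```

Without overlap, the sentence is split at token 6 and the best-scoring chunk stops before the answer; with window_size=3 a third chunk covers the whole sentence and ranks first. Defaulting window_size to 0 keeps existing behavior unchanged for users who do not opt in.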

Alternatives

No response

Additional context

No response

Feb 28 '25 23:02 AveryYay