[Feature Request] Sliding window for VectorRetriever
Required prerequisites
- [x] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- [ ] Consider asking first in a Discussion.
Motivation
Currently, VectorRetriever scores each chunk against the query independently, without considering contextual continuity in long documents. This makes it difficult to retrieve semantically relevant information that spans multiple chunks, leading to incomplete or disjointed retrieval results.
For example, in a long-form document a relevant answer may span several adjacent chunks, but the current retrieval method does not account for this, so part of the answer is lost. A sliding window approach would improve retrieval accuracy by ensuring overlapping context between chunks.
Solution
Introduce a sliding window mechanism for VectorRetriever, where:
- Each chunk overlaps with the next by a configurable token/window size
- This ensures semantic continuity and better retrieval relevance
- It can be implemented as an optional parameter (window_size) in the VectorRetriever configuration
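To make the proposal concrete, here is a minimal sketch of overlapping chunking. It is not CAMEL's existing API: the function name `sliding_window_chunks`, the `chunk_size` parameter, and whitespace splitting as a stand-in for real tokenization are all assumptions for illustration. Only `window_size` comes from the proposal above, and it is read here as the number of tokens shared by consecutive chunks.

```python
from typing import List


def sliding_window_chunks(
    tokens: List[str], chunk_size: int, window_size: int
) -> List[List[str]]:
    """Split tokens into chunks; consecutive chunks share `window_size` tokens.

    Hypothetical helper for illustration, not part of VectorRetriever.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= window_size < chunk_size:
        raise ValueError("window_size must satisfy 0 <= window_size < chunk_size")

    # Each new chunk starts `step` tokens after the previous one,
    # so the last `window_size` tokens of a chunk reappear in the next.
    step = chunk_size - window_size
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk reaches the end of the input, to avoid
        # emitting trailing chunks that contain only overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace splitting stands in for a real tokenizer (assumption).
text = "a b c d e f g h i j"
chunks = sliding_window_chunks(text.split(), chunk_size=4, window_size=2)
for chunk in chunks:
    print(" ".join(chunk))
```

With `window_size=0` this reduces to plain non-overlapping chunking, which is why it could be exposed as an optional parameter without changing the default behavior.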
Alternatives
No response
Additional context
No response