lucene
Seeding HNSW Search
Description
In some vector search cases, users may already know of documents that are likely related to a query. Let's support seeding HNSW search with these documents, using them as the starting points of HNSW's scoring stage instead of reaching that stage through HNSW's hierarchical stage.
One example use case is hybrid search, where both a traditional (e.g., keyword) search and a vector search are performed. The top results from the traditional search are likely reasonable seeds for the vector search. Even outside hybrid search, traditional matching is often faster than traversing the hierarchy, so it can be used to speed up vector search (up to 2x faster at the same effectiveness), as demonstrated in this article (full disclosure: I'm an author of the article).
This enhancement proposes adding a seed query, alongside the existing filter query, to the KNN query classes. The results of this query will be fed into HnswGraphSearcher and will ultimately replace the graph entry points here. If the seed query fails (e.g., its keywords do not match any documents), the approach will fall back to the existing hierarchical search process.
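The entry-point replacement and fallback described above can be illustrated with a toy, self-contained sketch. This is not Lucene code and does not reproduce HnswGraphSearcher's API: the class SeededGraphSearch, its search method, and the defaultEntry parameter (a stand-in for the entry point the hierarchical descent would normally produce) are all hypothetical. It runs a best-first search over a single graph layer, starting from the seed documents when the seed query matched any and otherwise falling back to the default entry point.

```java
import java.util.*;

// Hypothetical toy sketch, NOT Lucene's HnswGraphSearcher: best-first search over
// one graph layer, seeded with caller-supplied nodes, falling back to a default
// entry point (standing in for the result of the hierarchical descent).
final class SeededGraphSearch {

  /** Returns up to k node ids closest to the query, nearest first. */
  static List<Integer> search(float[][] vectors, int[][] neighbors, float[] query,
                              int k, int beamWidth, int[] seeds, int defaultEntry) {
    // Use the seed query's hits if there are any; otherwise fall back.
    int[] entryPoints = (seeds != null && seeds.length > 0) ? seeds : new int[] {defaultEntry};

    Comparator<Integer> byDist = Comparator.comparingDouble(n -> dist(vectors[n], query));
    PriorityQueue<Integer> candidates = new PriorityQueue<>(byDist);          // nearest on top
    PriorityQueue<Integer> results = new PriorityQueue<>(byDist.reversed()); // farthest on top
    Set<Integer> visited = new HashSet<>();

    // Initialize the search frontier from the chosen entry points.
    for (int ep : entryPoints) {
      if (visited.add(ep)) {
        candidates.add(ep);
        results.add(ep);
        if (results.size() > beamWidth) results.poll();
      }
    }

    while (!candidates.isEmpty()) {
      int c = candidates.poll();
      // Stop once the nearest unexplored candidate is worse than the worst kept result.
      if (results.size() >= beamWidth
          && dist(vectors[c], query) > dist(vectors[results.peek()], query)) {
        break;
      }
      for (int nb : neighbors[c]) {
        if (!visited.add(nb)) continue; // already scored
        double d = dist(vectors[nb], query);
        if (results.size() < beamWidth || d < dist(vectors[results.peek()], query)) {
          candidates.add(nb);
          results.add(nb);
          if (results.size() > beamWidth) results.poll(); // drop current farthest
        }
      }
    }

    List<Integer> out = new ArrayList<>(results);
    out.sort(byDist);
    return new ArrayList<>(out.subList(0, Math.min(k, out.size())));
  }

  /** Squared Euclidean distance. */
  static double dist(float[] a, float[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      s += d * d;
    }
    return s;
  }
}
```

For example, on a path graph 0-1-2-3-4 whose one-dimensional vectors equal the node ids, a query at 4 with an empty seed array starts at node 0 and scores every node before returning 4, while seeding with node 3 returns the same nearest neighbor after scoring only nodes 3, 2 and 4.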
Pull request to follow.