robertvanwinkle1138
@benwtrent For merges there is "FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search" https://arxiv.org/pdf/2105.09613.pdf DiskANN is known to be slower at indexing than HNSW and the...
The SPANN paper does not address efficient filtered queries. For example, Lucene's HNSW calculates the similarity score for every record, regardless of whether the record matches the filter. Filtered-DiskANN...
> QDrant's HNSW filter solution is the exact same as Lucene's

Interesting, thanks.

> as candidate posting lists are gathered, ensure they have some candidates

Couldn't that be done with...
Perhaps much of the jvector performance improvement is simply from on-heap caching. https://github.com/jbellis/jvector/blob/main/jvector-base/src/main/java/io/github/jbellis/jvector/disk/GraphCache.java
Another notable difference in the Lucene implementation is delta variable byte encoding of node ids. The increase in disk space requires the user to purchase more RAM per server. Also...
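
For readers unfamiliar with the term, here is a minimal sketch of delta variable-byte encoding applied to one node's sorted neighbor list. It is a toy illustration, not Lucene's or jvector's actual code: the function names `encode_delta_vbyte` and `decode_delta_vbyte` and the sample neighbor ids are made up for this example, and the byte layout (7 payload bits per byte, high bit as a continuation flag, low-order bits first) is assumed as a common varint convention.

```python
def encode_delta_vbyte(sorted_ids):
    """Encode ascending non-negative node ids as varint-coded gaps."""
    out = bytearray()
    prev = 0
    for node_id in sorted_ids:
        delta = node_id - prev
        if delta < 0:
            raise ValueError("ids must be sorted in ascending order")
        prev = node_id
        # Emit 7 bits at a time; the high bit marks "more bytes follow".
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_delta_vbyte(data):
    """Invert encode_delta_vbyte, returning the original id list."""
    ids = []
    prev = 0
    value = 0
    shift = 0
    for b in data:
        value |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7          # continuation: keep accumulating this gap
        else:
            prev += value       # gap complete: add it to the running id
            ids.append(prev)
            value = 0
            shift = 0
    return ids


# Hypothetical neighbor list for a single graph node.
neighbors = [3, 7, 130, 1000, 1001]
encoded = encode_delta_vbyte(neighbors)
fixed_width_bytes = 4 * len(neighbors)  # size if each id were a 4-byte int
```

Small gaps between consecutive ids fit in a single byte, so the encoded list is shorter than fixed-width integers, at the cost of having to decode sequentially.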