Andrew Kane
@hlinnaka I think the main benefits of batching are:
1. Minimizing out-of-order elements (elements can only be out of order between batches)
2. Less code duplication
3. Possibly...
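Why batching confines disorder to batch boundaries can be shown with a small sketch. It assumes a stream of `(distance, id)` pairs standing in for whatever the HNSW traversal produces; `batched_scan` is a hypothetical helper for illustration, not pgvector's scan code.

```python
from typing import Iterable, Iterator, List, Tuple

Candidate = Tuple[float, int]  # (distance, tuple id); hypothetical shape


def batched_scan(candidates: Iterable[Candidate], batch_size: int) -> Iterator[Candidate]:
    """Yield candidates batch by batch, sorting each batch by distance.

    Results are strictly ordered within a batch, but a later batch may
    contain a candidate closer than one already returned.
    """
    batch: List[Candidate] = []
    for cand in candidates:
        batch.append(cand)
        if len(batch) == batch_size:
            # Sort the full batch before emitting it.
            yield from sorted(batch)
            batch = []
    if batch:
        # Flush the final partial batch.
        yield from sorted(batch)
```

For example, with `batch_size=2`, the input `[(0.3, 1), (0.1, 2), (0.5, 3), (0.2, 4)]` yields `(0.1, 2), (0.3, 1), (0.2, 4), (0.5, 3)`: each batch is ordered, but `0.2` follows `0.3` across the batch boundary.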
Hi @alanwli,
1. Correct. We could filter out tuples that are closer than the last one returned to avoid this, but this could affect recall.
2. The scan will end...
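A minimal sketch of the filter described in point 1, assuming a stream of `(distance, id)` pairs from some approximate scan. The `strict_filter` name and the input shape are hypothetical; this illustrates the idea rather than reproducing pgvector's implementation.

```python
from typing import Iterable, List, Tuple

Candidate = Tuple[float, int]  # (distance, tuple id); hypothetical shape


def strict_filter(stream: Iterable[Candidate]) -> Tuple[List[Candidate], int]:
    """Drop any candidate closer than the last one returned.

    Returns the kept candidates (in non-decreasing distance order) and
    the number discarded. Discarded tuples never reach the caller,
    which is why strict ordering can lower recall.
    """
    kept: List[Candidate] = []
    discarded = 0
    last = float("-inf")
    for dist, tid in stream:
        if dist < last:
            # Returning this tuple would violate strict ordering.
            discarded += 1
            continue
        kept.append((dist, tid))
        last = dist
    return kept, discarded
```

For example, `strict_filter([(0.1, 2), (0.3, 1), (0.2, 4), (0.5, 3)])` keeps three candidates and discards `(0.2, 4)`, even though it is closer than `(0.3, 1)`: that is the recall cost point 1 mentions.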
Looking at SIFT 1M, a decent percentage of tuples can be discarded if strict ordering is implemented (with a filter selectivity of 0.01, usually 2-7%, but sometimes more than 10%).
The additional incorrect semantic is that results may not be strictly ordered (which won't happen with GiST). I'd like to minimize these, as it's not intuitive for users, but think the...
This is true from the user's perspective :) It'd be great if we could do something similar, but the recheck code in Postgres requires strict ordering. Edit: Going to try...
Some data on how batch size and selectivity affect out-of-order results with SIFT 1M (default parameters, limit 20):

batch_size | sel 0.1 | sel 0.01
--- | --- | ---
...
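The thread doesn't say exactly how out-of-order results were counted. One plausible metric, assumed here for illustration, counts each result whose distance is smaller than that of some earlier result in the same result list (for example, the 20 rows returned with `limit 20`). `count_out_of_order` is a hypothetical helper.

```python
from typing import Sequence


def count_out_of_order(distances: Sequence[float]) -> int:
    """Count results closer than some earlier result in the list.

    Assumed metric for illustration; not necessarily how the figures
    in the thread were produced.
    """
    count = 0
    running_max = float("-inf")
    for d in distances:
        if d < running_max:
            # An earlier row was farther away than this one.
            count += 1
        else:
            running_max = d
    return count
```

Under this definition, `[0.1, 0.3, 0.2, 0.5]` has one out-of-order result and a strictly ordered list has none.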
Some benchmarking results from the [`arxiv` dataset](https://github.com/qdrant/ann-filtering-benchmark-datasets) (2.1M rows, 384 dimensions, first 100 queries, default build parameters, 6GB shared buffers):

Branch | Time | Recall
--- | --- | ---
...
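The Recall column isn't defined in the thread. A common definition, assumed here, is recall@k: the fraction of the exact k nearest neighbors that appear in the first k returned rows, averaged over all queries (here, the first 100). `recall_at_k` and `mean_recall` are hypothetical helpers; multiply by 100 for a percentage.

```python
from typing import Sequence


def recall_at_k(returned: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the exact k nearest neighbors found in the first k results.

    Assumes `ground_truth` lists at least k ids, nearest first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    # Use a set so duplicate ids in `returned` are not double-counted.
    hits = len(set(returned[:k]) & truth)
    return hits / k


def mean_recall(
    results: Sequence[Sequence[int]],
    truths: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Average recall@k over queries, one result list per query."""
    if not results or len(results) != len(truths):
        raise ValueError("results and truths must be non-empty and the same length")
    return sum(recall_at_k(r, t, k) for r, t in zip(results, truths)) / len(results)
```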
Great, I just ran the benchmark above with strict ordering enabled ([branch](https://github.com/pgvector/pgvector/compare/hnsw-streaming...hnsw-streaming-strict)). It drops recall from 95.8 to 91.8.

Edit: More data on strict ordering and `ef_search` (batch size):

ef_search...
One idea is to allow users to select the mode:

```tsql
SET hnsw.iterative_search = off; -- default for 0.8.0
SET hnsw.iterative_search = on; -- strict, default for future version
SET...
```
Hi @Dylan-DPC, thanks for the PR. From what I can tell (testing with `rust_decimal`, which incorporated this in `1.37.0`):

- If an earlier version of Diesel is installed, this will...