Jonathan Ellis
Just put everything (or everything except the imports, either way) in an "if __name__ == '__main__':" block (see https://stackoverflow.com/questions/419163/what-does-if-name-main-do). This prevents multiprocessing from entering an infinite loop when it imports the...
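A minimal sketch of the guard, assuming a simple `multiprocessing.Pool` job (the names `square` and `main` are illustrative only). The worker function stays at module level so child processes can import it, while the code that starts processes sits under the guard. With the "spawn" start method (the default on Windows and macOS), each child re-imports the main module, and the guard stops it from launching another pool.

```python
import multiprocessing as mp
from typing import List


def square(x: int) -> int:
    """Worker function; kept at module level so child processes can import it."""
    return x * x


def main() -> List[int]:
    # Start a pool of worker processes and map the work across them.
    with mp.Pool(processes=4) as pool:
        return pool.map(square, range(10))


if __name__ == "__main__":
    # Children re-importing this module see __name__ == "__mp_main__",
    # so they skip this block instead of recursively starting new pools.
    print(main())
```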
Responding top to bottom:
> I wonder how much the speed difference is due to (1) Vectors being out of memory (and if they used PQ for diskann, if they...
> DiskANN is known to be slower at indexing than HNSW
I don't remember the numbers here, maybe 10% slower? It wasn't material enough to make me worry about it....
> It is possible that the candidate postings (gathered via HNSW) don't contain ANY filtered docs. This would require gathering more candidate postings.
This was a big problem for our...
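A minimal sketch of the "gather more candidate postings" idea under stated assumptions: `search_candidates` is a hypothetical stand-in for the HNSW search (it does a brute-force scan here), and doubling the candidate count on each retry is an assumed policy, not necessarily what any particular index does. The loop stops once enough candidates survive the filter or the whole collection has been considered.

```python
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_candidates(vectors: List[Vector], query: Vector, n: int) -> List[int]:
    """Hypothetical stand-in for an HNSW search: ids of the n nearest vectors, nearest first."""
    order = sorted(range(len(vectors)), key=lambda i: squared_distance(vectors[i], query))
    return order[:n]


def filtered_top_k(
    vectors: List[Vector],
    query: Vector,
    k: int,
    keep: Callable[[int], bool],
    initial: Optional[int] = None,
) -> List[int]:
    """Return up to k nearest ids that pass `keep`, widening the candidate pool as needed."""
    n = initial or k
    while True:
        candidates = search_candidates(vectors, query, n)
        hits = [i for i in candidates if keep(i)]  # order stays nearest-first
        if len(hits) >= k or n >= len(vectors):
            return hits[:k]
        # Too few candidates survived the filter: gather more and retry.
        n = min(2 * n, len(vectors))
```

For example, `filtered_top_k([[0.0], [1.0], [2.0], [3.0], [4.0]], [0.0], 2, lambda i: i >= 3, initial=1)` has to widen the pool three times before it finds two matching ids.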
> Or perhaps we "just" make a Lucene Codec component (KnnVectorsFormat) that wraps jvector? (https://github.com/jbellis/jvector)
I'm happy to support anyone who wants to try this, including modifying JVector to make...
> recall actually improves when introducing pq, and only starts to decrease at a factor of 16
I would guess that either there is a bug or you happen to...
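One way to sanity-check a surprising recall result is to score both configurations (with and without PQ) against the same exact, brute-force ground truth on the same queries. The sketch below is a plain recall@k calculation. The function names are illustrative, and the result lists are assumed to be neighbor ids ordered nearest-first.

```python
from typing import List, Sequence


def recall_at_k(approx: Sequence[int], exact: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k neighbor ids that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(approx[:k]) & set(exact[:k])) / k


def mean_recall(
    approx_results: List[Sequence[int]], exact_results: List[Sequence[int]], k: int
) -> float:
    """Average recall@k over a batch of queries, paired by position."""
    if len(approx_results) != len(exact_results) or not approx_results:
        raise ValueError("need the same non-empty set of queries for both result lists")
    total = sum(recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results))
    return total / len(approx_results)
```

Recall measured this way can never exceed 1.0. If the PQ run still scores higher than the uncompressed run against identical ground truth, that points toward a difference in how the two runs were set up rather than a real gain.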
Apache is going to use version da eventually. There are ways to tell Apache's files apart, but to avoid confusion can we make this db instead?
Can you include a sample trace?
(A query that includes two SAI index predicates + ANN would be ideal)
Thanks, this is a nice improvement. Comments:
```
Index mean cardinalities are table_13_val_idx:-9223372036854775808
```
- Is this because we're applying this code to vectors it shouldn't apply to, or is this just...
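The logged cardinality, -9223372036854775808, is exactly -2^63, the minimum value of a Java `long` (`Long.MIN_VALUE`). Judging from the number alone, it may be a sentinel or the result of a 64-bit overflow rather than a real mean, but that is a guess. The sketch below only checks the arithmetic. `to_int64` is a hypothetical helper that emulates two's-complement wraparound in Python.

```python
# Bounds of a signed 64-bit integer (Java's long).
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def to_int64(value: int) -> int:
    """Wrap a Python int into signed 64-bit range, as Java long arithmetic does on overflow."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value > LONG_MAX else value


logged = -9223372036854775808
print(logged == LONG_MIN)                   # True
print(to_int64(LONG_MAX + 1) == LONG_MIN)   # True: overflowing past the max wraps to the min
```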