Michael McCandless comments

Results: 335 comments by Michael McCandless

Try using Murmurhash 3 for bloom filters

> > But understanding why supposedly equivalent expressions yield such a different benchmark result remains ... > > The expression `((int) A) >>> 1 + ((int) B) >>> 1` is...
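One thing worth checking when comparing expressions like the one quoted above is Java operator precedence: `+` binds tighter than `>>>`, so the expression as written parses as `((int) A) >>> (1 + ((int) B)) >>> 1`, not as the sum of two halved values. The sketch below is only an illustration of that language rule. The class name, method names, and the `long` parameter types are assumptions, and the truncated comment does not say whether this is the explanation for the benchmark difference.

```java
public class ShiftPrecedence {
    // The expression exactly as quoted, with no extra parentheses.
    // Java parses it as: ((int) a) >>> (1 + ((int) b)) >>> 1
    static int unparenthesized(long a, long b) {
        return ((int) a) >>> 1 + ((int) b) >>> 1;
    }

    // Explicit parentheses: halve each value, then add.
    static int parenthesized(long a, long b) {
        return (((int) a) >>> 1) + (((int) b) >>> 1);
    }

    public static void main(String[] args) {
        long a = 1000, b = 2;
        // 1000 >>> (1 + 2) >>> 1 = 125 >>> 1
        System.out.println(unparenthesized(a, b)); // prints 62
        // (1000 >>> 1) + (2 >>> 1) = 500 + 1
        System.out.println(parenthesized(a, b));   // prints 501
    }
}
```

The two forms compute different values, so a benchmark comparing them may not be comparing equivalent code.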

Try using Murmurhash 3 for bloom filters

Thanks for the reminder @shubhamvishu! > I'm seeing some crazy speedups for some tasks in the benchmarks (including `PKLookup`; a few got a little slower) when using the new expression. Hmm...

Try using Murmurhash 3 for bloom filters

> > I'm seeing some crazy speedups for some tasks in the benchmarks (including `PKLookup`; a few got a little slower) when using the new expression. > > Hmm did you...

Add Query for reranking KnnFloatVectorQuery with full-precision vectors

I think this is a nice overall approach, adding a new `RerankKnnFloatVectorQuery` that wraps a KNN query that used quantization to get the initial results. It's reminiscent of Lucene's existing...

add a few numeric range faceting tasks to wikimedium*

Thanks @gsmiller! Maybe we should also separately add to the nightly tasks so nightly benchy can catch regressions? We can do that separately (and I agree we should, also separately,...

Support multiple HNSW graphs backed by the same vectors

> * This proposal has an added side-benefit of de-duplicating vectors _within_ a field as well (if the features used for vector generation are identical across two documents) This is...

Support multiple HNSW graphs backed by the same vectors

These are awesome results @kaivalnp! And this was only 200K docs -- with larger indices would the gains be more or less? Also, it's quite disturbing that even at a...

Multireader Support in Searcher Manager

Thanks @Shibi-bala -- I agree it's odd it was scoped to just `DirectoryReader` -- any `IndexReader` should work as long as it can `openIfChanged` on itself. I think `English.java` (from...

asynchronous I/O + saturating NVMe bandwidth

Maybe start with a more bite-sized usage of async IO in Lucene? E.g. there have been discussions about approximate KNN algorithms that use a fast, highly quantized first pass, relying...

asynchronous I/O + saturating NVMe bandwidth

Oh yeah the async IO for KNN rescoring was first (that I saw!) mentioned in [this issue](https://github.com/apache/lucene/issues/12615).