Michael McCandless
@jpountz noticed that [some of the taxo facets nightly tasks jumped surprisingly](https://github.com/apache/lucene/pull/12408#issuecomment-1663321463) when we added the new `count(*)` tasks. Digging, I realized that the nightly benchmarks' randomness had shifted when...
Our `FuzzyQuery` benchy is too synthetic today, automatically derived from `enwiki` terms. It'd be better to start from a public domain database of many names, and randomly pick names and...
(Spinoff from discussions with @rmuir and from trying to test https://github.com/apache/lucene/pull/12311 in luceneutil). Currently the nightly benchmark tests 100 dimensions, but this does not seem common/realistic since 1) it is not...
Over time, we are taking more and more advantage of concurrent hardware to reduce the latency of a single query with Lucene. We should add a benchmark task/charts that show...
Lucene's infoStream now logs how long each part of the index (postings, doc values, points, vectors, etc.) takes to merge. It should be simple to parse these logs from nightly...
A while back we discovered that KNN was producing non-deterministic results even on a deterministic index, and [disabled strict top N hit checking for `KNNVectorQuery`](https://github.com/mikemccand/luceneutil/commit/6fee2290). We think/thought this was because...
One of the awesome suggestions that came out of the ApacheCon NA 2022 talk ("Learning from 11+ years of Apache Lucene benchmarks") was to export at least the values for...
### Description I came across this compelling-sounding [JVector project](https://foojay.io/today/jvector-1-0/) which looks to have awesome QPS performance. It uses [DiskANN](https://www.microsoft.com/en-us/research/publication/diskann-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node/) instead of HNSW (what Lucene uses now). Maybe we should...
### Description [Spinoff from #13004] Recently we added off-heap FST reading, but only switched to it in limited cases, starting with the terms index in `BlockTree` terms dictionary. Should we...
### Description [I'm not sure how general this is but figured I'd open this to see if there is interest / other use cases:] On the Amazon product search team we...