Robert Muir comments

Results: 269 comments by Robert Muir

LUCENE-10471 Increase max dims for vectors to 2048

I don't agree. I think the problems are flaws in HNSW itself and can't be worked around. It's already too slow at 768, and in fact the current limit overpromises...

Vector accelerated GroupVInt decoder for MemorySegmentIndexInput

Hi, a couple of suggestions: 1. Somehow, we need to avoid Vector API code inside the MemorySegment code. Just because MemorySegment is available does not mean the Vector API is usable, one...

Vector accelerated GroupVInt decoder for MemorySegmentIndexInput

> I'm surprised by how slow this is with AVX off given that this can be implemented with SSE2 :(. Yes, it is surprising: we found the same situation with...

LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

We should also be careful about introducing complex CharFilters. I consider the current CharFilter API broken after debugging #11976; see https://github.com/apache/lucene/issues/11976#issuecomment-1328150137

Add FilterDirectory to track write amplification factor

Closing as the PR has been merged and is in the 9.5.0 section of CHANGES.txt

Add chart showing total merge time by part of index

Yes, this would be nice when discussing issues such as https://github.com/apache/lucene/issues/12203. Otherwise, I think merges are currently too opaque when discussing index performance: but we "know" certain parts are way...

What aKNN dimensionality should we use in nightly benchmark?

I think it is enough to just use a bigger vector size that better represents the performance issues? Maybe it looks like the current graph for users only using 100...

What aKNN dimensionality should we use in nightly benchmark?

Thanks a lot for posting these indexer benchmarks, @msokolov

Can we re-enable strict checking for KNN queries?

Everything in search is an approximation: BM25, etc. There's absolutely no reason to give KNN some kind of free pass to leniency park. Leniency isn't going to help anything...

Can we re-enable strict checking for KNN queries?

We could track recall/precision/MAP for BM25 scoring too, but we don't. We are strict, and that gives some confidence that scoring is working correctly: hasn't changed unless we intended it...