Adrien Grand comments

Results: 310 comments of Adrien Grand

Increase topN to 1,000.

I'll merge once https://github.com/apache/lucene/issues/14630 is resolved.

Increase topN to 1,000.

@mikemccand I'm curious if you're allowed to share how many candidate hits are fetched from Lucene before being fed to rescorers on amazon.com?

Encode dense blocks of postings as bit sets

> How do you differentiate different encodings in Lucene? Is it stored as extra metadata?

It extends the int8 flag, which previously recorded the number of bits per value. Positive...

Explore bypassing HNSW graph building for tiny segments

> Also, would N take into account if Panama Vector is enabled and if things are quantized or not?

FWIW I'd optimize for simplicity over picking the perfect heuristic. As...

Explore bypassing HNSW graph building for tiny segments

> Maybe we could do something similar here and only build a HNSW graph if doing a top-1000 search would visit less than 1/8th of the documents that have a...

Explore bypassing HNSW graph building for tiny segments

> I would suspect it is somewhere around 10k.

Interestingly, it looks like your intuition roughly aligns with my suggestion if using topK=100. `expectedVisitedNodes(100, 10_000) = 921 ~= 1250 =...
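
To make the arithmetic in this comment concrete, here is a minimal sketch of the "visit less than 1/8th of the documents" rule. It assumes that `expectedVisitedNodes(k, n)` is roughly `k * ln(n)`, a guess that reproduces the 921 figure quoted above but is not confirmed by the truncated comment. The class name and the `shouldBuildGraph` helper are hypothetical and are not part of Lucene's API.

```java
/** Hypothetical sketch: names and formula are assumptions, not Lucene's actual API. */
final class HnswGraphHeuristicSketch {

  /**
   * Assumed estimate of how many nodes a top-k search visits in a graph
   * of {@code size} nodes: k * ln(size), truncated to an int.
   */
  static int expectedVisitedNodes(int k, int size) {
    return (int) (Math.log(size) * k);
  }

  /**
   * Build a graph only if a top-k search is expected to visit fewer than
   * 1/8th of the documents that have a vector; otherwise a brute-force
   * scan of the segment is assumed to be cheap enough.
   */
  static boolean shouldBuildGraph(int k, int numVectors) {
    if (numVectors <= 0) {
      return false; // nothing to index, and avoids Math.log(0)
    }
    return expectedVisitedNodes(k, numVectors) < numVectors / 8;
  }

  public static void main(String[] args) {
    System.out.println(expectedVisitedNodes(100, 10_000)); // 921, as quoted in the comment
    System.out.println(10_000 / 8);                        // 1250, the 1/8th threshold
    System.out.println(shouldBuildGraph(100, 10_000));     // true: 921 < 1250
    System.out.println(shouldBuildGraph(1_000, 10_000));   // false under this assumed formula
  }
}
```

Under this assumed formula, the cut-off for topK=100 sits a little below 10k vectors, which is consistent with the "roughly aligns" remark; with topK=1000 the same rule would only start building graphs for much larger segments.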

Explore bypassing HNSW graph building for tiny segments

> allow it to be configurable?? maybe that is too many knobs

I worry about too many knobs too, I'd hardcode it.

> I would expect this to improve indexing...

Explore bypassing HNSW graph building for tiny segments

We have [`StoredFieldsBenchmark`](https://github.com/mikemccand/luceneutil/blob/b3d5216dfd82bb28dcff699f9d904b8b03d8d116/src/extra/perf/StoredFieldsBenchmark.java) to test the impact of NRT (frequent small flushes, frequent small merges) on stored fields; we could write something similar for vectors.

Promote sandbox facets to the main facets module

Facets already put the burden of choosing between taxonomy and doc-value-based faceting on users. If we introduce a new approach for faceting, I worry that it would make things even...

Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces

Can you clarify which allocation is the problematic one, and where it's done on the indexing path?