Michael McCandless comments

Results: 210 comments of Michael McCandless

Generalize LSBRadixSorter and use it in SortingPostingsEnum

> I also run the index script to see flush time with this new approach, result in ~15% faster for random data and no regression on asc/desc :) Hmm it...

Use Kahan summation for float aggregations to reduce errors

Neat -- I had never heard of Kahan summation. Here is its [Wikipedia page](https://en.wikipedia.org/wiki/Kahan_summation_algorithm).
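For readers who, like the commenter, have not met Kahan summation before, here is a minimal sketch of the idea in Java. It is not the code from the pull request: the class name `KahanSum` and its methods are hypothetical, and it uses `double` for simplicity. The technique keeps a second variable holding the rounding error lost on each addition and feeds it back into the next one.

```java
// Hypothetical illustration of Kahan (compensated) summation; not Lucene's implementation.
final class KahanSum {
    private double sum;          // running total
    private double compensation; // rounding error carried over from the previous addition

    void add(double value) {
        double y = value - compensation; // correct the input by the carried error
        double t = sum + y;              // low-order bits of y may be lost here
        compensation = (t - sum) - y;    // recover what was just lost
        sum = t;
    }

    double value() {
        return sum;
    }

    // Plain left-to-right summation, for comparison.
    static double naiveSum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Compensated summation over the same input.
    static double kahanSum(double[] values) {
        KahanSum acc = new KahanSum();
        for (double v : values) {
            acc.add(v);
        }
        return acc.value();
    }
}
```

With an input of `1.0` followed by a million values of `1e-16`, the naive loop never moves off `1.0` because each addend is below half a unit in the last place, while the compensated version accumulates them.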

[WIP] LUCENE-10002: Deprecate FacetsCollector#search helper methods as they internally use IndexSearcher#search(Query, Collector) API

Yeah I agree it's OK to deprecate without replacement, but maybe in the deprecated javadocs (and in `MIGRATE.txt` for 10.0) add a short explanation about using `MultiCollectorManager` for this?

Use `instanceof` pattern-matching where possible

In general it's great for Lucene devs to use the new language features we gain by setting a minimum Java version. This is (part of?) why we have such minimums!...
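As a small illustration of the language feature under discussion, the sketch below contrasts the older test-then-cast idiom with `instanceof` pattern matching (standard since Java 16). The class and method names are hypothetical and are not taken from the Lucene change.

```java
// Hypothetical example; not code from the pull request.
final class InstanceofPatternDemo {

    // Older style: test the type, then cast on a separate line.
    static String describeOld(Object o) {
        if (o instanceof Integer) {
            Integer i = (Integer) o;
            return "integer " + (i + 1);
        }
        if (o instanceof String) {
            String s = (String) o;
            if (!s.isEmpty()) {
                return "string of length " + s.length();
            }
        }
        return "other";
    }

    // Pattern matching: the binding variable is declared in the test itself
    // and is in scope wherever the match is known to have succeeded.
    static String describe(Object o) {
        if (o instanceof Integer i) {
            return "integer " + (i + 1);
        }
        if (o instanceof String s && !s.isEmpty()) {
            return "string of length " + s.length();
        }
        return "other";
    }
}
```

Both methods return the same results; the second one simply removes the redundant casts.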

Add new token filters for Japanese sutegana (捨て仮名)

Looks great @daixque -- would you like to add a `lucene/CHANGES.txt` entry describing this awesome new capability? Be sure to put it under the `9.10.0` section since we can backport...

Random access term dictionary

I'll try to review this soon -- it sounds compelling @Tony-X! I like how it is inspired by Tantivy's term dictionary format (which holds all terms + their metadata in...

Random access term dictionary

> Thanks for the tips! Yes, almost there. I'm working on the real compact bitpacker and unpacker. I still need to implement the PostingFormat afterwards. Do you think I need...

Random access term dictionary

> This is reasonable as the terms index (FST) holds all the terms. +1, nice! > #### Fuzzy/Wildcard/Prefix queries got _much slower_ > This is also expected because currently I...

These are very interesting results @Tony-X! I'll try to give a deeper response soon, but one idea that jumped out about `Wildcard` is that BlockTree somewhere takes advantage of `commonSuffixBytes` or...

Random access term dictionary

The `PKLookup` gains are astounding! Especially interesting is the off -> on heap gains for that task. We are somehow paying a high price for going through Lucene's IO APIs...