Michael McCandless

Results: 210 comments by Michael McCandless

> > You might want to pre-sort the Hash in Unicode order ... it might give a tiny improvement since the TermsEnum can share more internal state on each seekExact....
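To illustrate what "Unicode order" means for the suggestion above, here is a minimal sketch, not code from the thread. It assumes the terms are plain strings and that the target order is UTF-8 byte order, which for valid text is the same as Unicode code point order. The function names and sample terms are hypothetical. Lucene itself is Java; Python is used here only to show the ordering, including the case where UTF-16 code unit order (what Java's `String.compareTo` gives) disagrees.

```python
def sort_terms_utf8(terms):
    """Sort terms by their UTF-8 bytes, i.e. Unicode code point order."""
    return sorted(terms, key=lambda t: t.encode("utf-8"))


def sort_terms_utf16(terms):
    """Sort terms by UTF-16 code units, shown only for contrast."""
    # Big-endian bytes compare the same way as the 16-bit code units do.
    return sorted(terms, key=lambda t: t.encode("utf-16-be"))


# Hypothetical sample terms: two ASCII words, one BMP character near the
# top of the plane (U+FF61), and one supplementary character (U+1F600).
terms = ["zebra", "\U0001F600", "apple", "\uFF61"]

utf8_order = sort_terms_utf8(terms)
utf16_order = sort_terms_utf16(terms)

# The two orders differ only in where the supplementary character lands:
# its UTF-16 surrogates (0xD83D ...) sort below U+FF61, while its UTF-8
# lead byte (0xF0) sorts above U+FF61's lead byte (0xEF).
print(utf8_order == utf16_order)
```

For terms restricted to the Basic Multilingual Plane below the surrogate range the two orders agree, so the distinction only matters when supplementary characters or code points at or above U+E000 are mixed in.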

> > That sounds promising! Do you take the docFreq (or maybe totalTermFreq) of terms into account? E.g., collecting all term + docFreq from the TermsEnum, then sort them in...

> I'm afraid the early-stopping method as I described it isn't going to work. Specifically, it's pretty easy to find a case where a single vector matches for multiple consecutive...

Oh, I was proposing a purely functionality neutral optimization, since indexing co-occurring hashes would result in fewer disjunctive terms at search-time, and should make your searches run faster? But you're...

> I will ask @mikemccand to at least enable `--enable-preview` on the nightly pure Lucene benchmark by default (and use JDK 19).

Ack -- I'll enable this starting from tonight's...

> If we do both at the same time, we won't see a difference between old and new Lucene MMAP (on the same version). A JDK upgrade may also change other performance...

As long as the index is fully hot (is it here?), moving the BKD index off-heap should not cause anything near the performance regression that the o.p. flame charts seem...

Which `Directory` implementation is in use here @travisbenedict? If it's a buffered implementation (`SimpleFSDirectory` or `NIOFSDirectory`) can you try switching to `MMapDirectory` instead? The buffered reads are sometimes costly, e.g....

Also, if possible, please render flame charts to SVG so they remain interactive after attaching to GitHub issues.