Michael McCandless comments


Optimize top-k counting for approximate queries

> > You might want to pre-sort the Hash in unicode order ... it might give a tiny improvement since the TermsEnum can share more internal state on each seekExact....
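A minimal sketch of the pre-sorting idea, assuming the hashes are held as Java strings before lookup. Lucene orders terms by their UTF-8 bytes compared as unsigned values (equivalent to Unicode code point order), which can differ from `String.compareTo` for supplementary characters. The class and method names here (`SortedTermLookup`, `sortForLookup`) are hypothetical and not part of Lucene; the sorted list would then be fed to `seekExact` one term at a time.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not a Lucene class.
public class SortedTermLookup {

    // Compare strings by their UTF-8 encoding, treating bytes as unsigned.
    // This matches Unicode code point order, unlike String.compareTo,
    // which compares UTF-16 code units.
    static final Comparator<String> UTF8_ORDER = (a, b) ->
        Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8),
            b.getBytes(StandardCharsets.UTF_8));

    // Return a new list of the terms sorted in UTF-8 byte order,
    // leaving the input collection untouched.
    static List<String> sortForLookup(Collection<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(UTF8_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        // Assumed sample input; real hashes would come from the query.
        List<String> hashes = List.of("b7", "a1", "c3");
        for (String term : sortForLookup(hashes)) {
            // In a real search, each term would be passed to seekExact here.
            System.out.println(term);
        }
    }
}
```

Whether this helps measurably depends on the terms dictionary implementation; the comment above only suggests a tiny improvement.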

Optimize top-k counting for approximate queries

> > That sounds promising! Do you take the docFreq (or maybe totalTermFreq) of terms into account? E.g., collecting all term + docFreq from the TermsEnum, then sort them in...

Optimize top-k counting for approximate queries

> I'm afraid the early-stopping method as I described it isn't going to work. Specifically, it's pretty easy to find a case where a single vector matches for multiple consecutive...

Optimize top-k counting for approximate queries

Oh, I was proposing a purely functionality neutral optimization, since indexing co-occurring hashes would result in fewer disjunctive terms at search-time, and should make your searches run faster? But you're...

MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

> I will ask @mikemccand to at least enable --enable-preview on the nightly pure Lucene benchmark by default (and use JDK 19).

Ack -- I'll enable this starting from tonight's...

MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

> If we do both at same time, we won't see a difference between old and new Lucene MMAP (on same version). A JDK upgrade may also change other performance...

Count Query construction time into QPS

Awesome, thanks @zhaih!

Support for disabling BKDReader packedIndex off heap

As long as the index is fully hot (is it here?), moving the BKD index off-heap should not cause anything near the performance regression that the o.p. flame charts seem...

Support for disabling BKDReader packedIndex off heap

Which `Directory` implementation is in use here @travisbenedict? If it's a buffered implementation (`SimpleFSDirectory` or `NIOFSDirectory`) can you try switching to `MMapDirectory` instead? The buffered reads are sometimes costly, e.g....

Support for disabling BKDReader packedIndex off heap

Also, if possible, please render flame charts to SVG so they remain interactive after attaching to GitHub issues.