Alex Klibisz comments

Results 158 comments of


                                            Alex Klibisz

Optimize top-k counting for approximate queries

Yeah - really we should be able to know precisely how many times each vector matched. It's already an approximate algorithm, so introducing more approximation for performance-sake might not be...

Optimize top-k counting for approximate queries

Since one of the major bottlenecks is reading the matched doc ids from the segment, the last thing I'm tempted to look into is whether there is some construct for...

Optimize top-k counting for approximate queries

Hi @aecio . Thanks for the link. I wasn't aware of those algorithms so I'll read the slides, hopefully tonight. I looked through the WANDScorer code. I didn't really see...

Optimize top-k counting for approximate queries

@aecio The algorithms in those slides seem worth trying. It seems that for both Fagin's Algorithm and the Threshold Algorithm, I'd need to keep an array of PostingsEnums to "step"...

Optimize top-k counting for approximate queries

Took a first pass at the Threshold Algorithm this morning: https://github.com/alexklibisz/elastiknn/blob/top-k-algos/elastiknn-lucene/src/main/java/org/apache/lucene/search/MatchHashesAndScoreQuery.java#L51-L134 A few of the tests pass.. definitely not all. There are probably some mistakes in there.

Optimize top-k counting for approximate queries

I'm not seeing any practical improvement from using the Threshold Algorithm. The kth greatest score is consistently ~2 OOM lower than the threshold. This makes some intuitive sense when you...

Optimize top-k counting for approximate queries

I've been going through the various optimizations described in this fantastic paper: http://www.dcs.gla.ac.uk/~craigm/publications/fnt-efficient-query-processing.pdf So far I've tried the TAAT Quit and Continue described on page 39, TAAT MaxScore on page...