
Use slices instead of leaves for concurrency

Open jan-niestadt opened this issue 3 years ago • 1 comment

We currently use LeafReaders for concurrently gathering hits. They are divided over the available search threads more or less randomly. The problem is that each LeafReader searches a single segment, and segments can vary wildly in size (see https://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html), so some threads end up with far more work than others.

Lucene's IndexSearcher groups segments into slices, each containing one or more segments, so that the slices are roughly equal in size, and uses these slices for concurrent searches. We should use this mechanism in HitsFromQueryParallel as well, so that each thread takes roughly the same time to execute.

https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/IndexSearcher.html#slices-java.util.List-

(IndexSearcher's slices are not publicly accessible, but we could subclass IndexSearcher and add an accessor method)
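To make the idea concrete, here is a minimal, self-contained sketch of size-based slicing. It does not use Lucene or BlackLab classes: `SegmentSlicer`, `Segment`, and the two limits `maxDocsPerSlice` and `maxSegmentsPerSlice` are hypothetical names chosen for illustration, and the grouping rule is only loosely modeled on the kind of packing described above (large segments get their own slice, small ones are packed together). It is an assumption about a reasonable approach, not the actual Lucene implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: groups segments into slices of roughly comparable size. */
final class SegmentSlicer {

    /** Hypothetical stand-in for an index segment: a name and a document count. */
    public static final class Segment {
        public final String name;
        public final int docCount;

        public Segment(String name, int docCount) {
            this.name = name;
            this.docCount = docCount;
        }
    }

    /**
     * Sorts segments from largest to smallest, gives every segment larger than
     * maxDocsPerSlice its own slice, and packs the remaining segments together
     * until a slice exceeds maxDocsPerSlice or holds maxSegmentsPerSlice segments.
     */
    public static List<List<Segment>> slices(List<Segment> segments,
                                             int maxDocsPerSlice,
                                             int maxSegmentsPerSlice) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingInt((Segment s) -> s.docCount).reversed());

        List<List<Segment>> result = new ArrayList<>();
        List<Segment> current = new ArrayList<>();
        long docsInCurrent = 0;

        for (Segment s : sorted) {
            if (s.docCount > maxDocsPerSlice) {
                // A big segment is enough work for one thread by itself.
                List<Segment> own = new ArrayList<>();
                own.add(s);
                result.add(own);
                continue;
            }
            current.add(s);
            docsInCurrent += s.docCount;
            if (docsInCurrent > maxDocsPerSlice || current.size() >= maxSegmentsPerSlice) {
                // Current slice is full: close it and start a new one.
                result.add(current);
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
        }
        if (!current.isEmpty()) {
            result.add(current); // leftover small segments
        }
        return result;
    }

    /** Total number of documents in one slice. */
    public static long docCount(List<Segment> slice) {
        long total = 0;
        for (Segment s : slice) {
            total += s.docCount;
        }
        return total;
    }
}
```

Each resulting slice, rather than each individual leaf, would then be handed to one search thread. The limits are tuning knobs: lower values give more, smaller slices (more parallelism, more overhead), higher values give fewer, larger ones.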

jan-niestadt, Nov 24 '21 10:11

Slices are explained here as well: https://blog.mikemccandless.com/2019/10/concurrent-query-execution-in-apache.html

jan-niestadt, Nov 24 '21 10:11