BlackLab
Use slices instead of leaves for concurrency
We currently use LeafReaders to gather hits concurrently. They are divided over the available search threads more or less randomly. The problem is that each LeafReader searches a single segment, and segments can vary wildly in size, so some threads end up with far more work than others (see https://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html).
Lucene's IndexSearcher creates slices of segments, each slice containing one or more segments, so that they are roughly equal in size, and uses this for concurrent searches. We should use this mechanism in HitsFromQueryParallel as well, so each thread will take roughly equal time to execute.
https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/IndexSearcher.html#slices-java.util.List-
IndexSearcher's slices are not publicly accessible, but we could subclass IndexSearcher and add an accessor method.
Slices are explained here as well: https://blog.mikemccandless.com/2019/10/concurrent-query-execution-in-apache.html
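To make the idea concrete, here is a minimal sketch of grouping segments so that each thread gets a similar amount of work. It is not Lucene's own slicing algorithm and does not touch HitsFromQueryParallel. It is a plain greedy "largest segment to the least-loaded slice" heuristic, and it assumes that a segment's document count is a reasonable proxy for its search cost. The names `SliceSketch`, `Slice` and `balance` are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of BlackLab or Lucene.
final class SliceSketch {

    /** A group of segments (identified by their index) that one thread would search. */
    static final class Slice {
        final List<Integer> segments = new ArrayList<>();
        long totalDocs = 0;
    }

    /**
     * Distributes segments over at most numThreads slices so that the
     * slices' total document counts are roughly equal.
     * Assumption: document count approximates the cost of searching a segment.
     */
    static List<Slice> balance(long[] segmentDocCounts, int numThreads) {
        if (numThreads < 1)
            throw new IllegalArgumentException("numThreads must be >= 1");

        // Visit segments from largest to smallest.
        Integer[] order = new Integer[segmentDocCounts.length];
        for (int i = 0; i < order.length; i++)
            order[i] = i;
        Arrays.sort(order, (a, b) -> Long.compare(segmentDocCounts[b], segmentDocCounts[a]));

        // Never create more slices than there are segments.
        int numSlices = Math.min(numThreads, segmentDocCounts.length);
        List<Slice> slices = new ArrayList<>();
        PriorityQueue<Slice> leastLoadedFirst =
                new PriorityQueue<>(Comparator.comparingLong((Slice s) -> s.totalDocs));
        for (int i = 0; i < numSlices; i++) {
            Slice s = new Slice();
            slices.add(s);
            leastLoadedFirst.add(s);
        }

        // Always hand the next segment to the slice with the least work so far.
        for (int segment : order) {
            Slice s = leastLoadedFirst.poll();
            s.segments.add(segment);
            s.totalDocs += segmentDocCounts[segment];
            leastLoadedFirst.add(s);
        }
        return slices;
    }
}
```

For example, `SliceSketch.balance(new long[] {1000, 10, 10, 10, 500, 480}, 2)` puts the 1000-document segment in one slice and the 500- and 480-document segments in the other, with the small segments filling the gap, instead of leaving the split to chance. In the real change the slices would come from IndexSearcher, so this only shows why slices balance better than individual leaves.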