BlackLab
Improve multithreaded hit fetching performance by making sure there's always a buffer of hits available
We have `Hits.FETCH_HITS_MIN`, which is sometimes added to the requested number of hits so that we don't fetch a single hit at a time while iterating through a list of hits; fetching one by one is inefficient because of locking. Instead, we try to fetch a small batch of hits.
In single-threaded mode this is fine, but it may not be optimal when hits are produced and consumed in different threads. The risk is that a thread consuming hits triggers the production (fetching) of 20 more hits and then waits for all of them, even though it only needs one at that moment. The producer(s) then stop while the consumer processes those 20 hits, after which the producer(s) fetch another 20 while the consumer waits again, and so on.
In this case it would be better if `ensureResultsRead()` returned as soon as the requested result was available, while fetching continued in the background until a small reserve was built up. Compare it to streaming video, where you always want to load a few seconds ahead to avoid stutter.
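To make the idea concrete, here is a minimal sketch of the "return early, keep filling a reserve" behaviour. It is not BlackLab's actual code: the class `ReadAheadResults`, its `reserve` parameter and the single background producer thread are all assumptions made for illustration, and the waits are uninterruptible only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; not BlackLab's Hits implementation. */
class ReadAheadResults<T> {
    private final List<T> results = new ArrayList<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition moreAvailable = lock.newCondition(); // signalled by the producer
    private final Condition moreWanted = lock.newCondition();    // signalled by consumers
    private final int reserve;  // how far ahead of the last request we keep fetching
    private long target;        // producer pauses once results.size() >= target
    private boolean done = false;

    ReadAheadResults(Iterator<T> source, int reserve) {
        this.reserve = reserve;
        this.target = reserve; // prefetch a first reserve right away
        Thread producer = new Thread(() -> produce(source), "hit-producer");
        producer.setDaemon(true);
        producer.start();
    }

    private void produce(Iterator<T> source) {
        while (true) {
            lock.lock();
            try {
                // Pause while the reserve is full.
                while (results.size() >= target)
                    moreWanted.awaitUninterruptibly();
            } finally {
                lock.unlock();
            }
            // Fetch outside the lock so consumers are not blocked by slow fetching.
            boolean hasNext = source.hasNext();
            T next = hasNext ? source.next() : null;
            lock.lock();
            try {
                if (hasNext) results.add(next); else done = true;
                moreAvailable.signalAll(); // wake consumers after every single result
            } finally {
                lock.unlock();
            }
            if (!hasNext) return;
        }
    }

    /** Blocks only until `number` results exist; returns false if the source ran out first. */
    boolean ensureResultsRead(int number) {
        lock.lock();
        try {
            // Ask the producer to run ahead of this request by `reserve` results.
            if ((long) number + reserve > target) {
                target = (long) number + reserve;
                moreWanted.signal();
            }
            while (results.size() < number && !done)
                moreAvailable.awaitUninterruptibly();
            return results.size() >= number;
        } finally {
            lock.unlock();
        }
    }

    T get(int index) {
        if (!ensureResultsRead(index + 1))
            throw new IndexOutOfBoundsException("No result at index " + index);
        lock.lock();
        try {
            return results.get(index);
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, a consumer that calls `get(i)` in a loop waits only for result `i`, while the producer keeps working until it is `reserve` results ahead, which is the streaming-video behaviour described above. Whether it helps in practice would depend on how expensive fetching is compared to the per-result locking and signalling.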
Alternatively, maybe we should iterate per batch in more places, i.e. instead of walking through the hits one by one, request a batch, iterate through it, then request the next one, etc.
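A rough sketch of what batch-wise iteration could look like follows. The `BatchSource` interface and the `forEachInBatches` helper are hypothetical names invented for this example, not existing BlackLab API; the point is only that the (locked) fetch happens once per batch rather than once per hit.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical interface: anything that can hand out results a batch at a time. */
interface BatchSource<T> {
    /** Returns up to maxCount items starting at index start; an empty list when exhausted. */
    List<T> fetchBatch(int start, int maxCount);
}

final class BatchIteration {
    private BatchIteration() {}

    /** Visits every item, fetching batchSize items per request. Returns the number of non-empty batches. */
    static <T> int forEachInBatches(BatchSource<T> source, int batchSize, Consumer<T> action) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("batchSize must be positive");
        int batches = 0;
        int start = 0;
        while (true) {
            // One (potentially locking) fetch per batch instead of per item.
            List<T> batch = source.fetchBatch(start, batchSize);
            if (batch.isEmpty())
                break;
            batches++;
            for (T item : batch)
                action.accept(item);
            start += batch.size();
        }
        return batches;
    }
}
```

Calling code would then pass its per-hit logic as the `action`, so the loop body stays the same while the number of synchronized fetches drops by roughly a factor of `batchSize`.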