Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices [LUCENE-8542]
I have an index consisting of 44 million documents spread across 60 segments. When I run a query against this index with a huge number of results requested (e.g. 5 million), the query uses more than 5 GB of heap if the IndexSearcher was configured to use an ExecutorService.
(I know this kind of query is fairly unusual and it would be better to use paging and searchAfter, but our architecture does not allow this at the moment.)
The reason for the huge memory requirement is that the search will create a TopScoreDocCollector for each segment, each one with numHits = 5 million. This is fine for the large segments, but many of those segments are fairly small and only contain several thousand documents. This wastes a huge amount of memory for queries with large values of numHits on indices with many segments.
Therefore, I propose to change the CollectorManager interface in the following way:
- Change the method newCollector to accept a LeafSlice parameter that can be used to determine the total number of documents in the slice.
- Maybe, in order to remain backwards compatible, this could be introduced as a new method with a default implementation that delegates to the old one; otherwise, it probably has to wait for Lucene 8?
- This can then be used to cap numHits for each TopScoreDocCollector to the size of its LeafSlice (see the sketch below).
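For illustration, here is a rough sketch of what the proposal could look like; the interface and class names, the default method, and the capping helper are assumptions made for this sketch, not the attached patch.

```java
import java.io.IOException;
import java.util.Collection;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopScoreDocCollector;

// Hypothetical slice-aware variant of CollectorManager (names are illustrative).
interface SliceAwareCollectorManager<C extends Collector, T> {

  // existing factory method, unchanged
  C newCollector() throws IOException;

  // proposed addition: the searcher passes the LeafSlice it is about to search,
  // so implementations can size per-slice data structures; the default keeps
  // the old behavior for backwards compatibility
  default C newCollector(IndexSearcher.LeafSlice slice) throws IOException {
    return newCollector();
  }

  T reduce(Collection<C> collectors) throws IOException;
}

// Example of how an implementation could cap numHits per slice.
final class CappedTopScoreDocCollectorFactory {

  static TopScoreDocCollector create(IndexSearcher.LeafSlice slice, int numHits) {
    int sliceDocCount = 0;
    for (LeafReaderContext ctx : slice.leaves) { // LeafSlice exposes its leaf contexts
      sliceDocCount += ctx.reader().maxDoc();
    }
    // a slice can never yield more hits than it has documents, so the per-slice
    // priority queue does not need to be larger than the slice itself
    return TopScoreDocCollector.create(Math.min(numHits, sliceDocCount));
  }
}
```

With such a hook, each per-slice collector allocates at most as many priority-queue entries as its slice has documents.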
If this is something that would make sense for you, I can try to provide a patch.
Migrated from LUCENE-8542 by Christoph Kaser, updated Mar 13 2019. Attachments: LUCENE-8542.patch
Christoph Kaser (migrated from JIRA)
I attached a patch - hopefully this shows more clearly what I meant.
Since CollectorManager is marked as experimental, I think it might be possible to backport this patch to Lucene 7 as well, without providing a default implementation of the new method or keeping the old one.
Adrien Grand (@jpountz) (migrated from JIRA)
This doesn't look like a great solution to your problem? If I were you, I'd probably rather fork TopScoreDocCollector to dynamically grow the heap.
Lucene/Solr QA (migrated from JIRA)
| ✔ +1 overall |
|---|
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| | Prechecks | | |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | master Compile Tests | | |
| +1 | compile | 1m 38s | master passed |
| | Patch Compile Tests | | |
| +1 | compile | 0m 51s | the patch passed |
| +1 | javac | 0m 51s | the patch passed |
| +1 | Release audit (RAT) | 0m 30s | the patch passed |
| +1 | Check forbidden APIs | 0m 20s | the patch passed |
| +1 | Validate source patterns | 0m 20s | the patch passed |
| | Other Tests | | |
| +1 | unit | 30m 12s | core in the patch passed. |
| +1 | unit | 3m 10s | facet in the patch passed. |
| | Total | 40m 30s | |
| Subsystem | Report/Notes |
|---|---|
| JIRA Issue | LUCENE-8542 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950019/LUCENE-8542.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 75b1831 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_191 |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/127/testReport/ |
| modules | C: lucene lucene/core lucene/facet U: lucene |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/127/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
Christoph Kaser (migrated from JIRA)
I think it would be nice to have the option to grow the heap dynamically. However, the way TopScoreDocCollector and TopDocsCollector are currently built, for a Lucene user that would mean copying the complete source code of those classes and adapting them to use a java.util.PriorityQueue (probably with worse performance than org.apache.lucene.util.PriorityQueue).
This is certainly possible, but it would mean a lot of code duplication (from the perspective of a Lucene user, because the priority queue implementation cannot easily be swapped out).
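For illustration, here is a rough sketch of what such a fork could look like when built on java.util.PriorityQueue, assuming the Lucene 8 collector API (Scorable, ScoreMode); it is not the actual TopScoreDocCollector code, ties between equal scores are not broken by doc id, and the results would still need to be sorted into a TopDocs.

```java
import java.io.IOException;
import java.util.PriorityQueue;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.Scorable;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.SimpleCollector;

// Rough sketch of a top-hits collector whose queue grows on demand instead of
// pre-allocating numHits entries up front.
final class GrowingTopScoreCollector extends SimpleCollector {

  private final int numHits;
  // min-heap ordered by score: the head is the least competitive hit seen so far
  private final PriorityQueue<ScoreDoc> queue =
      new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
  private Scorable scorer;
  private int docBase;

  GrowingTopScoreCollector(int numHits) {
    this.numHits = numHits;
  }

  @Override
  public void setScorer(Scorable scorer) {
    this.scorer = scorer;
  }

  @Override
  protected void doSetNextReader(LeafReaderContext context) {
    this.docBase = context.docBase;
  }

  @Override
  public void collect(int doc) throws IOException {
    float score = scorer.score();
    if (queue.size() < numHits) {
      queue.offer(new ScoreDoc(docBase + doc, score)); // queue grows lazily
    } else if (score > queue.peek().score) {
      queue.poll();                                    // one pop plus one push per competitive hit,
      queue.offer(new ScoreDoc(docBase + doc, score)); // instead of a single updateTop()
    }
  }

  @Override
  public ScoreMode scoreMode() {
    return ScoreMode.COMPLETE; // scores are needed and all hits are visited
  }

  ScoreDoc[] hitsInHeapOrder() {
    return queue.toArray(new ScoreDoc[0]); // unordered; callers would still sort by score
  }
}
```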
I think this patch makes sense anyway: segment sizes vary widely in a typical index, and there are usually many more small segments than large ones. Given that the default implementation of IndexSearcher.slices() returns one slice per segment, that means a lot of wasted memory for all queries whose numHits is greater than the typical size of a small segment. I don't think it has any negative impact on queries with a small value of numHits, because it only adds one Math.min per segment.
It also helps with my problem: for an index with 28 segments and 13,360,068 documents and a search with numHits=5,000,000, it makes the difference between creating priority queues with a combined size of 140,000,000 vs. 13,360,068 entries. As you can see in the following table, there are benefits for searches with a more reasonable numHits value as well (all against my index):
| numHits | Combined size w/o patch | Combined size with patch |
|---|---|---|
| 10,000,000 | 280,000,000 | 13,360,068 |
| 5,000,000 | 140,000,000 | 13,360,068 |
| 1,000,000 | 28,000,000 | 6,870,854 |
| 100,000 | 2,800,000 | 1,632,997 |
| 50,000 | 1,400,000 | 1,015,274 |
| 10,000 | 280,000 | 252,528 |
Christoph Kaser (migrated from JIRA)
Is there anything I can change or add to get this committed? Or do you think it makes no sense for the general use case of Lucene?
Adrien Grand (@jpountz) (migrated from JIRA)
Indeed, I think this goes against the use case that Lucene is designed to solve, which is to compute the top-k matches of a query for some reasonable value of k. I'd be OK with e.g. having a collector in the sandbox that works better when collecting thousands of hits, but I'd like to avoid making changes to the core only to support this use case.
Christoph Kaser (migrated from JIRA)
That's too bad, given that this is only a minor change to an experimental API (and does not cause extra work in the reasonable use case). But I understand your reasons.
I may try to build such a collector when I find the time (though I suspect this may involve quite a lot of code duplication if no changes to the core can be made). For now, we simply limit the number of concurrent queries with huge values of numHits so that they fit into the heap.
Michael McCandless (@mikemccand) (migrated from JIRA)
I think the core API change is quite minor and reasonable – letting CollectorManager.newCollector know which segments (slice) it will collect. E.g. we already pass the LeafReaderContext to Collector.getLeafCollector, so it is informed about which segment it's about to collect.
I agree the motivating use case here is somewhat abusive, and a custom Collector is probably needed anyway, but I think this API change could help non-abusive cases too.
Alternatively we could explore fixing our default top hits collectors to not pre-allocate the full topN for every slice ... that is really unexpected behavior, and users have tripped up on this multiple times in the past causing us to make some partial fixes for it.
Adrien Grand (@jpountz) (migrated from JIRA)
I don't think this change really helps, as the number of documents in a slice is a pretty bad upper bound on the number of documents that will be collected. If we want to better support this use case, I'd rather we look into making PriorityQueue growable.
Christoph Kaser (migrated from JIRA)
While it's true the slice size is a bad upper bound, the change does help: As you can see in the table in my comment, it reduces the heap requirement by 90% in my use case, due to the large number of small slices.
Making PriorityQueue growable would certainly be a better solution; however, it is much harder to do without affecting performance in the "sane" use case.
Michael McCandless (@mikemccand) (migrated from JIRA)
Maybe we should try swapping in the JDK's PriorityQueue and measure if this really hurts search throughput?
Adrien Grand (@jpountz) (migrated from JIRA)
Right, I get how it can help with small slices, but at the same time I see small slices as something that should be avoided in order to limit context switching, so I don't think we should design for small slices.
I think the JDK's PriorityQueue would be fine most of the time, except in the worst case where every new hit is competitive, which happens when more recently indexed documents are more likely to be competitive (e.g. when sorting by decreasing timestamp, which I see often): a single reordering via updateTop would need to be replaced with one pop and one push. Can we maybe make our own PriorityQueue growable? I'm not too concerned about performance, given that only add() would be affected, not top() and updateTop(), which are the ones that matter for performance.
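To make that trade-off concrete, here is a small standalone sketch (not Lucene's org.apache.lucene.util.PriorityQueue, just an assumed minimal heap over raw float scores) in which only add() ever pays a growth cost, while top() and updateTop() remain allocation-free.

```java
import java.util.Arrays;

// Minimal growable binary min-heap of float scores; slot 0 is unused (1-based layout).
final class GrowableScoreHeap {

  private float[] heap = new float[16];
  private int size;

  void add(float value) {
    if (size + 1 >= heap.length) {
      heap = Arrays.copyOf(heap, heap.length * 2); // growth cost only here
    }
    heap[++size] = value;
    upHeap(size);
  }

  float top() {
    return heap[1]; // no allocation, no growth
  }

  void updateTop(float newValue) {
    heap[1] = newValue;
    downHeap(1); // single reordering, as with Lucene's updateTop()
  }

  int size() {
    return size;
  }

  private void upHeap(int i) {
    float node = heap[i];
    for (int parent = i >>> 1; parent > 0 && node < heap[parent]; i = parent, parent >>>= 1) {
      heap[i] = heap[parent]; // shift larger parent down
    }
    heap[i] = node;
  }

  private void downHeap(int i) {
    float node = heap[i];
    for (int child = i << 1; child <= size; i = child, child <<= 1) {
      if (child + 1 <= size && heap[child + 1] < heap[child]) {
        child++; // pick the smaller child
      }
      if (heap[child] >= node) {
        break;
      }
      heap[i] = heap[child]; // shift smaller child up
    }
    heap[i] = node;
  }
}
```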
Lucene/Solr QA (migrated from JIRA)
| ✔ +1 overall |
|---|
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| | Prechecks | | |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | master Compile Tests | | |
| +1 | compile | 0m 38s | master passed |
| | Patch Compile Tests | | |
| +1 | compile | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| +1 | Release audit (RAT) | 0m 32s | the patch passed |
| +1 | Check forbidden APIs | 0m 27s | the patch passed |
| +1 | Validate source patterns | 0m 27s | the patch passed |
| | Other Tests | | |
| +1 | unit | 15m 30s | core in the patch passed. |
| +1 | unit | 1m 2s | facet in the patch passed. |
| | Total | 19m 49s | |
| Subsystem | Report/Notes |
|---|---|
| JIRA Issue | LUCENE-8542 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950019/LUCENE-8542.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / d8f2a02 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | 1.8.0_191 |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/174/testReport/ |
| modules | C: lucene lucene/core lucene/facet U: lucene |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/174/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
Christoph Kaser (migrated from JIRA)
> Right, I get how it can help with small slices, but at the same time I see small slices as something that should be avoided in order to limit context switching, so I don't think we should design for small slices.
Small slices are the default: the default implementation of IndexSearcher.slices() returns one slice per segment. Since the search runs in an Executor, this may not cause a lot of context switching, depending on the thread pool parameters. But you are right, the default implementation of slices() may not be optimal.
Adrien Grand (@jpountz) (migrated from JIRA)
> the default implementation of slices() may not be optimal.
Let's change it?
Michael McCandless (@mikemccand) (migrated from JIRA)
+1 to improve slices() to aggregate small slices together by default; that's what we are doing in our production service – we combine up to 5 segments, up to 250K docs in aggregate.
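For illustration, a hedged sketch of an IndexSearcher subclass that overrides slices() to group small segments; the thresholds (5 segments, 250K docs), the greedy grouping strategy, and the class name are assumptions, not the actual production code mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;

// Hypothetical IndexSearcher that groups small segments into shared slices.
class AggregatingSliceIndexSearcher extends IndexSearcher {

  private static final int MAX_SEGMENTS_PER_SLICE = 5;   // assumed threshold
  private static final int MAX_DOCS_PER_SLICE = 250_000; // assumed threshold

  AggregatingSliceIndexSearcher(IndexReader reader, ExecutorService executor) {
    super(reader, executor);
  }

  @Override
  protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
    // visit the largest segments first so they tend to get slices of their own
    List<LeafReaderContext> sorted = new ArrayList<>(leaves);
    sorted.sort(Comparator.comparingInt((LeafReaderContext ctx) -> ctx.reader().maxDoc()).reversed());

    List<LeafSlice> slices = new ArrayList<>();
    List<LeafReaderContext> current = new ArrayList<>();
    int currentDocs = 0;
    for (LeafReaderContext ctx : sorted) {
      int docs = ctx.reader().maxDoc();
      // start a new slice once the current one is full by either measure
      if (!current.isEmpty()
          && (current.size() >= MAX_SEGMENTS_PER_SLICE || currentDocs + docs > MAX_DOCS_PER_SLICE)) {
        slices.add(new LeafSlice(current.toArray(new LeafReaderContext[0])));
        current = new ArrayList<>();
        currentDocs = 0;
      }
      current.add(ctx);
      currentDocs += docs;
    }
    if (!current.isEmpty()) {
      slices.add(new LeafSlice(current.toArray(new LeafReaderContext[0])));
    }
    return slices.toArray(new LeafSlice[0]);
  }
}
```

The exact grouping used in production may differ; the point is only that slices() is the extension point for aggregating small segments.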
I have once again run across this issue, with numHits == 4,000,000, even though the slices are now aggregated as Michael suggested.
I would still advocate passing the slice size to avoid allocating unnecessarily large buffers per slice when numHits is large. Any chance at all to get this merged?