lucene icon indicating copy to clipboard operation
lucene copied to clipboard

Multireader Support in Searcher Manager

Open Shibi-bala opened this issue 1 year ago • 9 comments

Description

Copied from https://github.com/apache/lucene/issues/13975

I'd like to use MultiReader inside my searcher manager, but currently there is only support for DirectoryReader. Not sure about the context for. why this was the case initially, seems specific to some desired use-case https://issues.apache.org/jira/browse/LUCENE-6087.

Anyway should be a 1-line change here: https://github.com/apache/lucene/blob/9e90fb5bf1807f1d9ccf2cabf74b3235f2431ed4/lucene/core/src/java/org/apache/lucene/search/SearcherManager.java#L125

I found a relevant stack overflow question here: https://stackoverflow.com/questions/49817453/searchermanager-and-multireader-in-lucene

Shibi-bala avatar Nov 04 '24 16:11 Shibi-bala

Thanks @Shibi-bala -- I agree it's odd it was scoped to just DirectoryReader -- any IndexReader should work as long as it can openIfChanged on itself.

I think English.java (from Lucene's test-framework) was maybe deleted long ago? Maybe simplify the test to not bother with English words... just Integer.toString(i) or "" + i should be fine?

Also, please revert the wildcard import (import org.apache.lucene.index.*) -- I think our style checker (jtidy/spotless) will be unhappy with that.

mikemccand avatar Nov 04 '24 19:11 mikemccand

any IndexReader should work as long as it can openIfChanged on itself.

Does MultiReader implement openIfChanged() ? I see a check in SearcherManager#refreshIfNeeded() that asserts for the reader to be a DirectoryReader instance. This is used by the ReferenceManager base class whenever maybeRefresh() is called.

vigyasharma avatar Nov 08 '24 07:11 vigyasharma

To add to @vigyasharma, I have been wondering if we should remove SearcherManager and encourage users to use IndexReaderManager. IndexSearcher is cheap to create and there are som reasons why it may be interesting to create a different IndexSearcher per query already, e.g. to tune parallelism based on current load, or to configure a timeout on the IndexSearcher.

Maybe MultiReader falls in the same bucket. Presumably, the MultiReader is made from multiple DirectoryReaders, so maybe the application code should not create a SearcherManager that works with a MultiReader but instead manage two (or more, one per Directory) ReaderManagers and dynamically create a MultiReader and then an IndexSearcher on top of this MultiReader on every search?

jpountz avatar Nov 13 '24 08:11 jpountz

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Nov 28 '24 00:11 github-actions[bot]

I have been wondering if we should remove SearcherManager and encourage users to use IndexReaderManager

We could start encouraging people to use the existing ReaderManager. Both SearcherManager and SearcherTaxonomyManager create searchers on the new readers anyway, so it have the same overhead if any. But I suspect a lot of users consume the SearcherManagers and removing it would be a breaking change?

re: MultiReaders, I think Lucene could provide a convenient way to refresh on the multiple sub-readers. Like a reference managedMultiReaderManager that refreshes all sub-readers whenever refreshIfNeeded is invoked.

vigyasharma avatar Apr 12 '25 20:04 vigyasharma

I hacked together a prototype impl. for a reference managed MultiReader. I can add some tests and clean it up if this meets our requirements. – https://github.com/apache/lucene/pull/14486

vigyasharma avatar Apr 14 '25 20:04 vigyasharma

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Apr 29 '25 00:04 github-actions[bot]

Bumping up since the PR went stale. Should I finalize the changes in #14486 ?

vigyasharma avatar May 11 '25 00:05 vigyasharma

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar May 25 '25 00:05 github-actions[bot]

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Oct 16 '25 00:10 github-actions[bot]