
Create ConjunctionDISI:patcher

Open ldkjdk opened this issue 2 years ago • 4 comments

This may be a performance bug with multiple segments: when the doc ID is 2147483647 (NO_MORE_DOCS), the search should not continue to the next doc.

ldkjdk avatar Mar 04 '22 08:03 ldkjdk

Why is it a performance bug? Do we have scorers that are slow to do an advance(NO_MORE_DOCS)?

jpountz avatar Mar 05 '22 17:03 jpountz

Yes, for a search like this:

    BooleanQuery.Builder bQuery = new BooleanQuery.Builder();
    TermQuery contents = new TermQuery(new Term("contents", "hello"));
    bQuery.add(contents, BooleanClause.Occur.MUST);
    Query idq = IntPoint.newRangeQuery("id", 140, 150);
    bQuery.add(idq, BooleanClause.Occur.FILTER);
    Query q = bQuery.build(); // MultiFieldQueryParser.parse(key, fields, flags);
    TopDocs td = searcher.search(q, 10);

I think that if the keyword "hello" matches a lot of records, this may increase the computational cost of "skipper.skipTo(target) + 1".
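
To make the concern concrete, here is a toy model of a two-clause conjunction, with and without an early exit once the lead iterator is exhausted. It is only a sketch and not Lucene's ConjunctionDISI code: ConjunctionSketch, ArrayIterator, and intersect are hypothetical names, and the linear scan in advance() stands in for a real postings skipper, which may handle advance(NO_MORE_DOCS) far more cheaply. The only thing taken from Lucene is the sentinel value, Integer.MAX_VALUE (2147483647).

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {
    // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical toy iterator over a sorted array of doc IDs; counts advance() calls. */
    static final class ArrayIterator {
        private final int[] docs;
        private int idx = 0;
        int advanceCalls = 0;

        ArrayIterator(int... docs) {
            this.docs = docs;
        }

        /** Moves to the first doc >= target, or returns NO_MORE_DOCS when exhausted. */
        int advance(int target) {
            advanceCalls++;
            // Linear scan: deliberately naive, so a late target is expensive.
            while (idx < docs.length && docs[idx] < target) {
                idx++;
            }
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /**
     * Intersects two iterators. With earlyExit set, the loop stops as soon as the
     * lead is exhausted instead of calling other.advance(NO_MORE_DOCS) first.
     */
    static List<Integer> intersect(ArrayIterator lead, ArrayIterator other, boolean earlyExit) {
        List<Integer> hits = new ArrayList<>();
        int doc = lead.advance(0);
        while (true) {
            if (earlyExit && doc == NO_MORE_DOCS) {
                break; // the proposed short-circuit
            }
            int otherDoc = other.advance(doc);
            if (otherDoc == NO_MORE_DOCS) {
                break; // also reached when doc == NO_MORE_DOCS and earlyExit is off
            }
            if (otherDoc == doc) {
                hits.add(doc);
                doc = lead.advance(doc + 1);
            } else {
                doc = lead.advance(otherDoc);
            }
        }
        return hits;
    }
}
```

Both variants return the same hits; the early-exit variant saves one advance call on the second iterator per segment. Whether that saving outweighs the extra comparison on every iteration would have to be measured against real scorers.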

ldkjdk avatar Jun 17 '22 09:06 ldkjdk

I worry that such a change would add a little overhead all the time only to help in some rare cases; it's not clear to me that it would be a good trade-off. I'd be interested in more data about performance, e.g. latency before and after the change, number of segments, etc.

jpountz avatar Jun 17 '22 14:06 jpountz

Thanks for the idea @ldkjdk! It looks like we are unsure this is helpful in the general case ... I'll close the PR for now. Please re-open if you feel strongly otherwise?

mikemccand avatar Nov 02 '23 10:11 mikemccand