lucene icon indicating copy to clipboard operation
lucene copied to clipboard

Utilize `docIdRunEnd` on `ReqExclBulkScorer`

Open HUSTERGS opened this issue 6 months ago • 3 comments

Description

This PR propose to utilize docIdRunEnd on ReqExclBulkScorer, so we can jump faster on MUST_NOT clause

HUSTERGS avatar Jun 18 '25 07:06 HUSTERGS

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

github-actions[bot] avatar Jun 18 '25 07:06 github-actions[bot]

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

github-actions[bot] avatar Jun 18 '25 09:06 github-actions[bot]

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

github-actions[bot] avatar Jun 18 '25 09:06 github-actions[bot]

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Jul 18 '25 00:07 github-actions[bot]

Unfortunately this commit broke some of the tests. Here is a repro:

./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestLRUQueryCache.testRandom" -Ptests.file.encoding=US-ASCII -Ptests.gui=true -Ptests.haltonfailure=false -Ptests.jvmargs= -Ptests.jvms=6 -Ptests.multiplier=3 -Ptests.nightly=true -Ptests.seed=7600B69A193F746E -Ptests.vectorsize=256

the builds mailing list is showing more failures that are triggered from the asserting bulk scorer.

dweiss avatar Jul 21 '25 05:07 dweiss

I will allow myself to revert this on main so that the builds pass, hope it's ok.

dweiss avatar Jul 21 '25 05:07 dweiss

Unfortunately this commit broke some of the tests. Here is a repro:

Sorry about that, it seems the docIDRunEnd() can sometimes exceed the max, thus causes AssertingBulkScorer failed at line 83, then causes the thread leak: https://github.com/apache/lucene/blob/973183258d1e11fee88a84b2a40b5fe02f0c477d/lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingBulkScorer.java#L82-L84

I changed

- upTo = exclApproximation.docIDRunEnd();
+ upTo = Math.min(max, exclApproximation.docIDRunEnd());

And the test can pass properly locally

HUSTERGS avatar Jul 21 '25 07:07 HUSTERGS

Could you file another PR that brings this back? We'd probably need to beast this more - you can do this locally too, see: https://github.com/apache/lucene/blob/main/help/tests.txt#L93-L127

dweiss avatar Jul 21 '25 09:07 dweiss

Could you file another PR that brings this back? We'd probably need to beast this more - you can do this locally too, see: https://github.com/apache/lucene/blob/main/help/tests.txt#L93-L127

Thanks for your guidance ! Will run the test locally to verify the change before I raise a new PR.

HUSTERGS avatar Jul 21 '25 13:07 HUSTERGS