lucene icon indicating copy to clipboard operation
lucene copied to clipboard

Optimize prefix sum computation in Lucene99HnswVectorsReader; fixes #15024

Open yossev opened this issue 4 months ago • 18 comments

Replaced the two-step prefix sum loop in Lucene99HnswVectorsReader with a single-loop variant that avoids redundant memory access and improves performance.

Previous approach:

  • Read first value separately.
  • Then used previous buffer element + readVInt().

New approach:

  • Accumulates sum in a single pass and assigns directly.

This change follows the suggestion from issue #14979 and has the same functional behavior with slightly better efficiency.

yossev avatar Aug 01 '25 18:08 yossev

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

github-actions[bot] avatar Aug 01 '25 19:08 github-actions[bot]

Thank you, this looks good. If you have cycles to run benchmarks, this would be appreciated, you can check out this: https://github.com/mikemccand/luceneutil/blob/main/README.md#running-the-knn-benchmark.

jpountz avatar Aug 01 '25 20:08 jpountz

Thank you, this looks good. If you have cycles to run benchmarks, this would be appreciated, you can check out this: https://github.com/mikemccand/luceneutil/blob/main/README.md#running-the-knn-benchmark.

Thank you, Glad it looks good. Before proceeding, I just wanted to check, is there anything you'd like me to adjust or improve before it's ready to be merged?

Also, I’ll take a look at the benchmark tool you linked and try to run some comparisons if time allows. Appreciate the reference!

yossev avatar Aug 01 '25 21:08 yossev

The only missing thing is an entry in lucene/CHANGES.txt but we can deal with it later.

jpountz avatar Aug 01 '25 21:08 jpountz

Perfect. I will be waiting for updates on that matter. Thank you!

yossev avatar Aug 01 '25 22:08 yossev

@yossev FYI you'll need to resolve conflicts since another change touched the same lines of code. I'm curious if you had any success with benchmarking?

jpountz avatar Aug 08 '25 05:08 jpountz

@yossev FYI you'll need to resolve conflicts since another change touched the same lines of code. I'm curious if you had any success with benchmarking?

Thanks for the heads-up! I noticed that the recent changes in Lucene99HnswVectorsReader.java reverted part of the optimization I made according to #14979 I’ll resolve the merge conflicts by keeping that fix (unless there’s a reason it should be reverted), and will adjust it if needed to fit the new changes.

Regarding benchmarking ,I’ll run it after the conflicts are resolved so results reflect the latest code.

yossev avatar Aug 09 '25 14:08 yossev

I was checking to see what had been reverted, but I don't see it? #14979 didn't change Lucene99HnswVectorReader.java, right? Maybe you're referring to the conflicts since #14932 moved the VInt-decoding block into a conditional?

msokolov avatar Aug 09 '25 17:08 msokolov

I was checking to see what had been reverted, but I don't see it? #14979 didn't change Lucene99HnswVectorReader.java, right? Maybe you're referring to the conflicts since #14932 moved the VInt-decoding block into a conditional?

I was just giving credit to #14979 to the optimization i made in this case, not saying it changed anything.

yossev avatar Aug 09 '25 18:08 yossev

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

github-actions[bot] avatar Aug 13 '25 13:08 github-actions[bot]

Hello @jpountz,

Please check the latest commit, i resolved the conflicts locally, and added back the optimization that was mentioned #14979.

i will be waiting for further feedback.

yossev avatar Aug 13 '25 13:08 yossev

@yossev see the output of the build failure, the build is unhappy with formatting, you need to run ./gradlew tidy. Have you had any success with benchmarking?

jpountz avatar Aug 13 '25 14:08 jpountz

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Aug 28 '25 00:08 github-actions[bot]

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Oct 16 '25 00:10 github-actions[bot]

I would be happy to merge and backport this once merge conflicts and formatting is tidied @yossev

benwtrent avatar Oct 16 '25 16:10 benwtrent

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Oct 31 '25 00:10 github-actions[bot]

Hello @yossev friendly ping :). Once conflicts & formatting are fixed, I can merge and backport.

We should also have a CHANGES entry under optimizations for 10.4.0

benwtrent avatar Oct 31 '25 11:10 benwtrent

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Nov 16 '25 00:11 github-actions[bot]