neural-search icon indicating copy to clipboard operation
neural-search copied to clipboard

[FEATURE] Add rescoreContext for neuralsparse query's two-phase speed up.

Open conggguan opened this issue 1 year ago • 7 comments

Description

This change implement for #646

  • Enhance the speed of neuralsparse query by two-phase.
  • Now support top-level, boolean and boost compound query; for other compound query will degrade into origin logic and speed.

Feature support query

  • [x] NeuralSparseQuery
  • [x] NeuralSparseQuery nested in BoostQuery
  • [x] NeuralSparseQuery nested in BooleanQuery

Feature implementation optimize plan

Intro

Now implement two-phase feature by adding rescoreContext in the begin of HybridQueryPhaseSearcher's. It is harmless but little hack, so there is a long term plan to replace this implementation.

RFC in opensearch core

To add a correct rescoreContext to searchContext, we need both searchContext and whole compound query, but now if we can only get these together in the HybridQueryPhaseSearcher. In QueryBuilder, the searchContext is private, and whole query are not visible. What we need to implement this feature is the whole query and searchContext, so if OpenSearch Core can open a API which can help me get it before the query performed, we can move the rescoreContext function to a more general interface of core.

RoadMap of this optimizing

  1. RFC in OpenSearch Core.
  2. A Incremental update in NeuralSearch plugin.

Issues Resolved

Resolve #646

Check List

  • [ ] New functionality includes testing.
    • [ ] All tests pass
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
  • [ ] Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

conggguan avatar Apr 17 '24 08:04 conggguan

Intro

Now implement two-phase feature by adding rescoreContext in the begin of HybridQueryPhaseSearcher's. It is harmless but little hack, so there is a long term plan to replace this implementation.

RFC in opensearch core

What we need to implement this feature is the whole query and searchContext, so if OpenSearch Core can open a API which can help me get it before the query performed, we can move the rescoreContext function to a more general interface of core.

RoadMap of this optimizing

  1. RFC in OpenSearch Core.
  2. A Incremental update in NeuralSearch plugin.

Please create a RFC on the opensearch core side and provide details why we cannot open up an interface from core to implement this functionality?

navneet1v avatar Apr 22 '24 04:04 navneet1v

I was not able to complete the full review of the PR as the PR contains more than 1 feature. The description of the PR says it has these 2 feature:

Enhance the speed of neuralsparse query by two-phase. Now support top-level, boolean and boost compound query; for other compound query will degrade into origin logic and speed.

But in the PR we have code add the feature to use tokens in the NeuralSparseQuery clause.

navneet1v avatar Apr 22 '24 04:04 navneet1v

I was not able to complete the full review of the PR as the PR contains more than 1 feature. The description of the PR says it has these 2 feature:

Enhance the speed of neuralsparse query by two-phase. Now support top-level, boolean and boost compound query; for other compound query will degrade into origin logic and speed.

But in the PR we have code add the feature to use tokens in the NeuralSparseQuery clause.

Yes we merge #693 manually in advance because the query by tokens help us to write IT cases more flexible. Will update this PR after #693 get merged (will be merged soon)

zhichao-aws avatar Apr 22 '24 04:04 zhichao-aws

I was not able to complete the full review of the PR as the PR contains more than 1 feature. The description of the PR says it has these 2 feature:

Enhance the speed of neuralsparse query by two-phase. Now support top-level, boolean and boost compound query; for other compound query will degrade into origin logic and speed.

But in the PR we have code add the feature to use tokens in the NeuralSparseQuery clause.

Updated now.

zhichao-aws avatar Apr 22 '24 12:04 zhichao-aws

@vibrantvarun Some BWC test failed because of there is a version check of stream.

// stream in 
if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) {
    this.neuralSparseTwoPhaseParameters = in.readOptionalWriteable(NeuralSparseTwoPhaseParameters::new);
}
// stream out
if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) {
    out.writeOptionalWriteable(this.neuralSparseTwoPhaseParameters);
}

But at now the PR hasn't been backport to V_2_14_0, so after merge will success. Or we can set Version.CURRENT to MinReqVersion and make a further PR to change it to 2.14 again. If a extra pr required or just keep is okay?

conggguan avatar Apr 23 '24 07:04 conggguan

Lets ensure that all BWC tests are successful before this change can be merged.

Some BWC test failed because of there is a version check of stream.

// stream in if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) { this.neuralSparseTwoPhaseParameters = in.readOptionalWriteable(NeuralSparseTwoPhaseParameters::new); } // stream out if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) { out.writeOptionalWriteable(this.neuralSparseTwoPhaseParameters); } But at now the PR hasn't been backport to V_2_14_0, so the 2.14 node can not deserialize the stream correctly. It can only work after backport success. One workaround is using Version.Current, but we need to modify it to 2.14 after backport finished.

zhichao-aws avatar Apr 25 '24 10:04 zhichao-aws

@conggguan please add BWC test for the change. It gives more confidence. Also I see you have significant changes in constructors that too version specific. BWC test is a must for these type of changes.

cc: @navneet1v

vibrantvarun avatar Apr 26 '24 16:04 vibrantvarun