Introduce ApproximateRangeQuery and ApproximateableQuery

Open harshavamsi opened this issue 1 year ago • 92 comments

Description

Most of the logic follows https://github.com/opensearch-project/OpenSearch/issues/13566. I've introduced a new ApproximateableQuery that is very similar to what IndexOrDocValuesQuery does today: it returns either an originalQuery or an approximateQuery. At search time we evaluate whether a query meets the requirements for being rewritten from originalQuery to approximateQuery. I started off by converting only the DateRangeQuery to use the approximation: if there is a top-level range query on a date field, we approximate the results by scoring only 10K documents (or the requested size).
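
To make the original-versus-approximate choice concrete, here is a minimal, self-contained sketch. It is not the code in this PR and does not use Lucene's Query API: `ApproximateableSketch`, `SearchInfo`, `resolve`, and `dateRange` are hypothetical names, queries are represented by plain strings, and the only requirement checked is the assumed "top-level query" condition.

```java
import java.util.function.Predicate;

// Simplified, hypothetical model of the idea; these are NOT the PR's classes or Lucene's Query API.
final class ApproximateableSketch {

    /** Stand-in for the request state inspected at search time (assumed shape). */
    static final class SearchInfo {
        final boolean topLevelQuery;

        SearchInfo(boolean topLevelQuery) {
            this.topLevelQuery = topLevelQuery;
        }
    }

    /** Wraps both variants and returns one of them, in the spirit of IndexOrDocValuesQuery. */
    static final class ApproximateableQuery {
        final String originalQuery;
        final String approximateQuery;
        final Predicate<SearchInfo> canApproximate;

        ApproximateableQuery(String originalQuery, String approximateQuery, Predicate<SearchInfo> canApproximate) {
            this.originalQuery = originalQuery;
            this.approximateQuery = approximateQuery;
            this.canApproximate = canApproximate;
        }

        /** Picks the approximate variant only when the search satisfies the requirement. */
        String resolve(SearchInfo info) {
            return canApproximate.test(info) ? approximateQuery : originalQuery;
        }
    }

    /** Hypothetical factory for the date-range case: approximate only top-level queries. */
    static ApproximateableQuery dateRange() {
        return new ApproximateableQuery("DateRangeQuery", "ApproximateRangeQuery", info -> info.topLevelQuery);
    }

    public static void main(String[] args) {
        ApproximateableQuery query = dateRange();
        System.out.println(query.resolve(new SearchInfo(true)));
        System.out.println(query.resolve(new SearchInfo(false)));
    }
}
```

Keeping the decision in a predicate means further requirements could be added later without touching the wrapper itself; that is a design choice of this sketch, not a claim about how the PR is structured.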

Related Issues

Resolves #11251 #9541 #13566

Check List

  • [ ] New functionality includes testing.
    • [ ] All tests pass
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
  • [ ] API changes companion pull request created.
  • [ ] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • [ ] Commits are signed per the DCO using --signoff
  • [ ] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [ ] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

harshavamsi avatar May 22 '24 22:05 harshavamsi

:x: Gradle check result for 95236d69cbe228fbb3a90b7f3dee2ec2a9a05475: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 22 '24 22:05 github-actions[bot]

:x: Gradle check result for c98b56ca0df7d78532108348ef43898c443b99c8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 22 '24 22:05 github-actions[bot]

:x: Gradle check result for 76b4abe4da669c111a738dcfa63e1dd2896de85f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 22 '24 22:05 github-actions[bot]

:x: Gradle check result for 2cf5e27d350ab5e66add404f598fa95846fe2eaf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 23 '24 17:05 github-actions[bot]

:x: Gradle check result for 090ddc6d856be20d8e1eb40fb850a1449047fa9a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 23 '24 18:05 github-actions[bot]

:x: Gradle check result for 9ac309a7128760c4b0ff16794d4876541bdfe2ce: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 23 '24 18:05 github-actions[bot]

:x: Gradle check result for e92bd90a9ab5fc69a41f8bbbd43bbc27a936b2f6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 06 '24 20:06 github-actions[bot]

:x: Gradle check result for 5728c31a7476db40acba79abb6d24bb134732475: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 10 '24 22:06 github-actions[bot]

:x: Gradle check result for cdd1a0f0450ddcfd267f8c80c9a4900ee48e4f48: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 11 '24 00:06 github-actions[bot]

:x: Gradle check result for 6b2acaf1c6b825c5e60195beca0d7157f091e3e6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 11 '24 00:06 github-actions[bot]

:x: Gradle check result for 6a28cc10efa403cf7d3bb6f3ac401740a06524c2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 11 '24 01:06 github-actions[bot]

:x: Gradle check result for 07f7d05c901d7f09a75a38cf259e24b5f358063e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 11 '24 17:06 github-actions[bot]

:white_check_mark: Gradle check result for 163e8a138d6fdcfaecccb207569753a8582b2894: SUCCESS

github-actions[bot] avatar Jun 11 '24 18:06 github-actions[bot]

Codecov Report

Attention: Patch coverage is 31.92308% with 177 lines in your changes missing coverage. Please review.

Project coverage is 71.82%. Comparing base (579f2aa) to head (7938e63). Report is 14 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...search/approximate/ApproximatePointRangeQuery.java | 34.67% | 115 Missing and 15 partials :warning: |
| ...arch/search/approximate/ApproximateScoreQuery.java | 18.51% | 21 Missing and 1 partial :warning: |
| .../approximate/ApproximateIndexOrDocValuesQuery.java | 23.52% | 13 Missing :warning: |
| ...a/org/opensearch/index/mapper/DateFieldMapper.java | 18.18% | 6 Missing and 3 partials :warning: |
| ...ensearch/search/internal/ContextIndexSearcher.java | 0.00% | 2 Missing and 1 partial :warning: |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13788      +/-   ##
============================================
- Coverage     71.96%   71.82%   -0.14%     
+ Complexity    63654    63638      -16     
============================================
  Files          5247     5251       +4     
  Lines        297352   297652     +300     
  Branches      42981    43044      +63     
============================================
- Hits         213982   213791     -191     
- Misses        65774    66304     +530     
+ Partials      17596    17557      -39     

codecov[bot] avatar Jun 11 '24 18:06 codecov[bot]

:grey_exclamation: Gradle check result for 0be3dfaad3ad3fc703a44975bbf4625a9e28a279: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Jun 11 '24 18:06 github-actions[bot]

@jainankitk @msfroh @reta would love some feedback

harshavamsi avatar Jun 11 '24 18:06 harshavamsi

:x: Gradle check result for ab518a30590ddc3e90cab6948980744ca6af3284: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jun 11 '24 18:06 github-actions[bot]

:white_check_mark: Gradle check result for 2eedc641ccd83f9e27362768c60956e41922cca5: SUCCESS

github-actions[bot] avatar Jun 11 '24 19:06 github-actions[bot]

> @jainankitk @msfroh @reta would love some feedback

Apologies @harshavamsi, I will take a look once the release is cleared out (later this week). Thank you.

reta avatar Jun 11 '24 21:06 reta

@harshavamsi Could you help me understand the definition of ApproximateableQuery and what falls under it?

"Approximation" gives me the impression that we are trying to approximate something, and approximation usually applies to a number. So is the intent behind ApproximateableQuery to approximate search results, i.e. the search results are not precise but approximate?

Going through the implementation of ApproximatePointRangeQuery, it is trying to terminate the query early based on the size, which I believe is not approximate but accurate. So my concern here is about the nomenclature and the confusion it might cause for developers.

rishabhmaurya avatar Jul 03 '24 18:07 rishabhmaurya

:x: Gradle check result for eebae848d4e1027c786646c647368036e41fa073: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 12 '24 16:07 github-actions[bot]

:x: Gradle check result for 4b6660480cf8e77ea4802338b517f79f337f0547: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 12 '24 16:07 github-actions[bot]

:x: Gradle check result for a6038114f37c6338cb9d2feb3717ee6c8e9dfd71: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 12 '24 16:07 github-actions[bot]

:x: Gradle check result for be256614ab3992e59b489ddf234572e713730197: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 12 '24 17:07 github-actions[bot]

:x: Gradle check result for 2eedc641ccd83f9e27362768c60956e41922cca5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 12 '24 17:07 github-actions[bot]

:white_check_mark: Gradle check result for c625f3cbcc453e49b7a511ec9e993dc7f829df6d: SUCCESS

github-actions[bot] avatar Jul 12 '24 17:07 github-actions[bot]

:x: Gradle check result for 9540519322b457d08c73141a253b976943fda502: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jul 12 '24 17:07 github-actions[bot]

> @harshavamsi Could you help me understand the definition of ApproximateableQuery and what falls under it?
>
> "Approximation" gives me the impression that we are trying to approximate something, and approximation usually applies to a number. So is the intent behind ApproximateableQuery to approximate search results, i.e. the search results are not precise but approximate?
>
> Going through the implementation of ApproximatePointRangeQuery, it is trying to terminate the query early based on the size, which I believe is not approximate but accurate. So my concern here is about the nomenclature and the confusion it might cause for developers.

@rishabhmaurya thanks for your comments. In their present state, approximateable queries are intended to give users approximate results instead of accurate ones. In this case, for example, we return an arbitrary set of 10,000 hits by scoring only 10,000 documents instead of all documents in the segment. In my opinion, this is a form of approximation.

I do agree that we do not fully leverage the term and are returning 10K accurate hits, but I introduced the framework here only for the purpose of adding more use cases later.
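
As a rough illustration of the trade-off discussed here, the following sketch stops collecting as soon as a hit limit is reached. It is an assumption-laden toy, not the PR's implementation: the real change operates on Lucene's point structures, whereas `EarlyTerminationSketch`, `collect`, and `Result` are hypothetical names and the "segment" is just a `long[]` indexed by doc ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the segment is modelled as a plain long[] indexed by doc ID.
final class EarlyTerminationSketch {

    /** Collected doc IDs plus the number of documents that had to be visited. */
    static final class Result {
        final List<Integer> docIds;
        final int visited;

        Result(List<Integer> docIds, int visited) {
            this.docIds = docIds;
            this.visited = visited;
        }
    }

    /** Returns up to `limit` doc IDs whose value lies in [lower, upper], stopping once the limit is hit. */
    static Result collect(long[] valuesByDocId, long lower, long upper, int limit) {
        List<Integer> hits = new ArrayList<>();
        int visited = 0;
        for (int docId = 0; docId < valuesByDocId.length && hits.size() < limit; docId++) {
            visited++;
            long value = valuesByDocId[docId];
            if (value >= lower && value <= upper) {
                hits.add(docId); // every collected hit genuinely matches the range
            }
        }
        return new Result(hits, visited);
    }

    public static void main(String[] args) {
        long[] values = new long[50_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        Result result = collect(values, 100, 40_000, 10_000);
        System.out.println(result.docIds.size() + " hits after visiting " + result.visited + " documents");
    }
}
```

Each returned hit is an exact match, which is the "accurate" part of the objection above; what is given up is the rest of the matching set, since the remaining documents are never visited.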

harshavamsi avatar Jul 16 '24 19:07 harshavamsi

:white_check_mark: Gradle check result for 8db8bfbc7dbc2f935f006a140ebe9ebb0facc0dd: SUCCESS

github-actions[bot] avatar Jul 16 '24 20:07 github-actions[bot]