OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

QueryFetchSearchResult as a proto message and node-to-node communication with protobuf

Open VachaShah opened this issue 1 year ago • 56 comments

Description

This PR is part 1 of the changes and POC done in #10684. It contains the following changes:

  1. QueryFetchSearchResult as a proto message.
  2. A new interface for serialization and deserialization using Protobuf - ProtobufWriteable.
  3. Node-to-node communication with protobuf for a basic response QueryFetchSearchResult during a search API.
  4. Feature flag for the above changes.
  5. Supporting framework for adding protobuf.

Incoming changes in this PR:

  • [ ] Add tests for protobuf messages and node-to-node communication for QueryFetchSearchResult.

Next part of changes:

  • [ ] Make the feature flag dynamic so that it can be accessed via cluster settings.
  • [ ] Add support for Aggregations and some remaining fields.
  • [ ] Abstractions which support making the transport layer pluggable.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [ ] New functionality includes testing.
    • [x] All tests pass
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
  • [x] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • [x] Commits are signed per the DCO using --signoff
  • [x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [x] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

VachaShah avatar Jan 17 '24 20:01 VachaShah

Compatibility status:

Checks if related components are compatible with change 324c58f

Incompatible components

Incompatible components: [https://github.com/opensearch-project/asynchronous-search.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

github-actions[bot] avatar Jan 17 '24 20:01 github-actions[bot]

:x: Gradle check result for 5c92d5c45f6ad83fae14d582196d52715ea4f4e1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 17 '24 20:01 github-actions[bot]

@VachaShah ever more endorsements than @dblock (:-D), but on general note: the Protobuf support (as of this pull request) is very much entangled with the transport APIs and implementation (mostly every layer has to be aware of it). I think this is a good opportunity to make breaking changes so adding new serialization protocols could be more or less simple (let say we would need to add FlatBuffers, you name it, what would be the cost of that)

reta avatar Jan 17 '24 21:01 reta

:x: Gradle check result for 2f4273bf9a24200266694cabef5a6d4b9a346e0a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 18 '24 00:01 github-actions[bot]

:x: Gradle check result for 3a462d029aaf05311fd30397388acd3b02b30904: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 18 '24 01:01 github-actions[bot]

@dblock and @reta Thank you so much for the comments! Absolutely, the idea is to make transport abstract enough that it is pluggable so anything that comes after protobuf is easy to add. The first goal however is to add protobuf as part of search API and deliver it as an experimental feature for users to consume. The other parts like pluggable transport are coming in later. This is to make sure that the changes are incremental and we are able to deliver improvements as part of these PRs. LMK what you guys think :)

VachaShah avatar Jan 18 '24 01:01 VachaShah

@dblock and @reta Thank you so much for the comments! Absolutely, the idea is to make transport abstract enough that it is pluggable so anything that comes after protobuf is easy to add. The first goal however is to add protobuf as part of search API and deliver it as an experimental feature for users to consume.

Thanks @VachaShah , absolutely, we could improve iteratively, however there are few decisions even in this process we need to make right - this would be a foundation to built upon. I would like to highlight only two concerns which I think we have to address:

  • multiplexing messages (inbound handler) in different serialization formats
  • serialization formats leaking into domain (fe. implementing ProtobufWriteable)

@dblock what do you think? PS: @VachaShah if you need a hand, please feel free to let us (and any maintianer) know!

reta avatar Jan 18 '24 01:01 reta

:white_check_mark: Gradle check result for 4e50b8569248ab8e5ea04f39d7572f49d61ba628: SUCCESS

github-actions[bot] avatar Jan 18 '24 04:01 github-actions[bot]

Codecov Report

Attention: Patch coverage is 36.36364% with 539 lines in your changes are missing coverage. Please review.

Project coverage is 71.35%. Comparing base (b15cb0c) to head (324c58f). Report is 310 commits behind head on main.

Files Patch % Lines
...search/serializer/SearchHitProtobufSerializer.java 0.00% 107 Missing :warning:
...earch/serializer/SearchHitsProtobufSerializer.java 0.00% 90 Missing :warning:
...nt/serializer/DocumentFieldProtobufSerializer.java 0.00% 86 Missing :warning:
...sport/protobufprotocol/ProtobufInboundMessage.java 46.91% 41 Missing and 2 partials :warning:
...sport/protobufprotocol/ProtobufMessageHandler.java 46.05% 37 Missing and 4 partials :warning:
...ensearch/action/search/SearchTransportService.java 10.71% 24 Missing and 1 partial :warning:
...org/opensearch/search/fetch/FetchSearchResult.java 28.57% 18 Missing and 2 partials :warning:
...org/opensearch/search/query/QuerySearchResult.java 86.71% 12 Missing and 7 partials :warning:
...main/java/org/opensearch/common/lucene/Lucene.java 0.00% 16 Missing :warning:
.../action/ProtobufActionListenerResponseHandler.java 0.00% 15 Missing :warning:
... and 14 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11910      +/-   ##
============================================
- Coverage     71.42%   71.35%   -0.07%     
- Complexity    59978    60545     +567     
============================================
  Files          4985     5040      +55     
  Lines        282275   285475    +3200     
  Branches      40946    41345     +399     
============================================
+ Hits         201603   203699    +2096     
- Misses        63999    64913     +914     
- Partials      16673    16863     +190     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jan 18 '24 04:01 codecov[bot]

@dblock and @reta Thank you so much for the comments! Absolutely, the idea is to make transport abstract enough that it is pluggable so anything that comes after protobuf is easy to add. The first goal however is to add protobuf as part of search API and deliver it as an experimental feature for users to consume.

Thanks @VachaShah , absolutely, we could improve iteratively, however there are few decisions even in this process we need to make right - this would be a foundation to built upon. I would like to highlight only two concerns which I think we have to address:

  • multiplexing messages (inbound handler) in different serialization formats
  • serialization formats leaking into domain (fe. implementing ProtobufWriteable)

@dblock what do you think? PS: @VachaShah if you need a hand, please feel free to let us (and any maintianer) know!

Makes sense! Just so I understand correctly:

  • For the inbound handler concern, in the current architecture, when the bytes come in the channel, InboundPipeline does a handleBytes and then forwards the fragments to the InboundHandler in form of the InboundMessage. With the protobuf change, the InboundPipeline takes in the bytes from the channel and then understands if it is protobuf or regular message and forwards the fragments accordingly. Are you suggesting that this forward fragment logic should be moved further into the hierarchy?
  • Can you elaborate on the serialization formats leaking into domain?

And as always, really appreciate your feedback and help! :)

VachaShah avatar Jan 18 '24 19:01 VachaShah

Thanks @VachaShah

  • Are you suggesting that this forward fragment logic should be moved further into the hierarchy?

I would say it this way - the network engine decode bytes and hands it over to InboundHandler::messageReceived (eventually). The message is still bytes (InputStream) and will be decoded to request by RequestHandlerRegistry. Now, the handler could support multiple protocols now, and I think that would be the right place to tackle.

  • Can you elaborate on the serialization formats leaking into domain?

Sure, here is the example public final class SearchHit implements Writeable, ToXContentObject, Iterable<DocumentField>, ProtobufWriteable { - the ProtobufWriteable has to be implemented explicitly by the domain class, this is high coupling between class and low level serialization.

reta avatar Jan 18 '24 20:01 reta

:x: Gradle check result for 1a59c1b4fe5480cd7001926a72a1cdda3a00b406: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 21 '24 08:01 github-actions[bot]

:x: Gradle check result for e355a4664716aca95566311791000b830eae2ad0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 23 '24 20:01 github-actions[bot]

:x: Gradle check result for d843917520b6f9d2c40fe5623fde45d868975b6b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 23 '24 21:01 github-actions[bot]

:x: Gradle check result for e30b9b25adcc03d5cec40ebc01d5789071a6d542: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 23 '24 22:01 github-actions[bot]

:x: Gradle check result for 12921c8125bb319ca3da9d7c9fc509293458041c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 23 '24 23:01 github-actions[bot]

:grey_exclamation: Gradle check result for 12921c8125bb319ca3da9d7c9fc509293458041c: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] avatar Jan 24 '24 00:01 github-actions[bot]

@reta @saratvemulapalli I added some abstractions so that inboundMessage does not have to have another path for protobuf. Let me know what you guys think.

VachaShah avatar Jan 24 '24 19:01 VachaShah

:x: Gradle check result for 38662f6a1f7c2f7188e6f6d8fca01951f05e9dee: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 26 '24 08:01 github-actions[bot]

:x: Gradle check result for 288a943fc0d0e53f2c9db312729aa9931d51084c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Jan 28 '24 08:01 github-actions[bot]

Thank you @reta and @saratvemulapalli for the comments! I will work on these and update the PR incrementally.

VachaShah avatar Feb 02 '24 00:02 VachaShah

:x: Gradle check result for abc32a88896ffb3bae04642da45a72aecef3e9e9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 02 '24 01:02 github-actions[bot]

:x: Gradle check result for 83be522cc38c1112b8924e26f56bcfa7022c5c06: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 08 '24 19:02 github-actions[bot]

:x: Gradle check result for 3cac405e8ac827b6b1ba737f0ba0443e205e3ea3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 08 '24 20:02 github-actions[bot]

@reta and @saratvemulapalli I added recent commits which improve protocol detection by adding magic bytes to protobuf messages during transport serialization. LMK what you guys think! I also tried it with keeping the Header separate and the content as protobuf but the current logic does a lot of things with Header (like reading header buffers and aggregating the messages from the stream)which I think is not needed to be done for protobuf messages.

I am working on other comments but wanted to get feedback about the protocol detection part. Thanks!

VachaShah avatar Feb 14 '24 06:02 VachaShah

:x: Gradle check result for 235aa4c83aa0c22e31f5fb78ec03d72a0ce5bed8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 14 '24 07:02 github-actions[bot]

@reta and @saratvemulapalli I added recent commits which improve protocol detection by adding magic bytes to protobuf messages during transport serialization. LMK what you guys think! I

Thanks @VachaShah , I like this direction (and we already have similar prefix for native protocol), thank you.

reta avatar Feb 16 '24 22:02 reta

:x: Gradle check result for a2ef064bfcd5c756f38b84140e76a6670fbfd7a6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 23 '24 23:02 github-actions[bot]

@reta and @saratvemulapalli I added recent commits which improve protocol detection by adding magic bytes to protobuf messages during transport serialization. LMK what you guys think! I

Thanks @VachaShah , I like this direction (and we already have similar prefix for native protocol), thank you.

Thank you @reta! I will continue with this and am working on addressing other comments. Will ping on the PR once ready!

VachaShah avatar Feb 26 '24 17:02 VachaShah

:x: Gradle check result for f7ed896137f08a03530590afe1c9224a437a9359: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Feb 28 '24 01:02 github-actions[bot]