OpenSearch
                                
                                 OpenSearch copied to clipboard
                                
                                    OpenSearch copied to clipboard
                            
                            
                            
                        QueryFetchSearchResult as a proto message and node-to-node communication with protobuf
Description
This PR is part 1 of the changes and POC done in #10684. It contains the following changes:
- QueryFetchSearchResult as a proto message.
- A new interface for serialization and deserialization using Protobuf - ProtobufWriteable.
- Node-to-node communication with protobuf for a basic response QueryFetchSearchResultduring a search API.
- Feature flag for the above changes.
- Supporting framework for adding protobuf.
Incoming changes in this PR:
- [ ] Add tests for protobuf messages and node-to-node communication for QueryFetchSearchResult.
Next part of changes:
- [ ] Make the feature flag dynamic so that it can be accessed via cluster settings.
- [ ] Add support for Aggregations and some remaining fields.
- [ ] Abstractions which support making the transport layer pluggable.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
- [ ] New functionality includes testing.
- [x] All tests pass
 
- [ ] New functionality has been documented.
- [ ] New functionality has javadoc added
 
- [x] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
- [x] Commits are signed per the DCO using --signoff
- [x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
- [x] Public documentation issue/PR created
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Compatibility status:
Checks if related components are compatible with change 324c58f
Incompatible components
Incompatible components: [https://github.com/opensearch-project/asynchronous-search.git]
Skipped components
Compatible components
Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]
:x: Gradle check result for 5c92d5c45f6ad83fae14d582196d52715ea4f4e1: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@VachaShah ever more endorsements than @dblock (:-D), but on general note: the Protobuf support (as of this pull request) is very much entangled with the transport APIs and implementation (mostly every layer has to be aware of it). I think this is a good opportunity to make breaking changes so adding new serialization protocols could be more or less simple (let say we would need to add FlatBuffers, you name it, what would be the cost of that)
:x: Gradle check result for 2f4273bf9a24200266694cabef5a6d4b9a346e0a: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 3a462d029aaf05311fd30397388acd3b02b30904: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@dblock and @reta Thank you so much for the comments! Absolutely, the idea is to make transport abstract enough that it is pluggable so anything that comes after protobuf is easy to add. The first goal however is to add protobuf as part of search API and deliver it as an experimental feature for users to consume. The other parts like pluggable transport are coming in later. This is to make sure that the changes are incremental and we are able to deliver improvements as part of these PRs. LMK what you guys think :)
@dblock and @reta Thank you so much for the comments! Absolutely, the idea is to make transport abstract enough that it is pluggable so anything that comes after protobuf is easy to add. The first goal however is to add protobuf as part of search API and deliver it as an experimental feature for users to consume.
Thanks @VachaShah , absolutely, we could improve iteratively, however there are few decisions even in this process we need to make right - this would be a foundation to built upon. I would like to highlight only two concerns which I think we have to address:
- multiplexing messages (inbound handler) in different serialization formats
- serialization formats leaking into domain (fe. implementing ProtobufWriteable)
@dblock what do you think? PS: @VachaShah if you need a hand, please feel free to let us (and any maintianer) know!
:white_check_mark: Gradle check result for 4e50b8569248ab8e5ea04f39d7572f49d61ba628: SUCCESS
Codecov Report
Attention: Patch coverage is 36.36364% with 539 lines in your changes are missing coverage. Please review.
Project coverage is 71.35%. Comparing base (
b15cb0c) to head (324c58f). Report is 310 commits behind head on main.
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11910      +/-   ##
============================================
- Coverage     71.42%   71.35%   -0.07%     
- Complexity    59978    60545     +567     
============================================
  Files          4985     5040      +55     
  Lines        282275   285475    +3200     
  Branches      40946    41345     +399     
============================================
+ Hits         201603   203699    +2096     
- Misses        63999    64913     +914     
- Partials      16673    16863     +190     
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@dblock and @reta Thank you so much for the comments! Absolutely, the idea is to make transport abstract enough that it is pluggable so anything that comes after protobuf is easy to add. The first goal however is to add protobuf as part of search API and deliver it as an experimental feature for users to consume.
Thanks @VachaShah , absolutely, we could improve iteratively, however there are few decisions even in this process we need to make right - this would be a foundation to built upon. I would like to highlight only two concerns which I think we have to address:
- multiplexing messages (inbound handler) in different serialization formats
- serialization formats leaking into domain (fe. implementing
ProtobufWriteable)@dblock what do you think? PS: @VachaShah if you need a hand, please feel free to let us (and any maintianer) know!
Makes sense! Just so I understand correctly:
- For the inbound handler concern, in the current architecture, when the bytes come in the channel, InboundPipelinedoes ahandleBytesand then forwards the fragments to the InboundHandler in form of theInboundMessage. With the protobuf change, the InboundPipeline takes in the bytes from the channel and then understands if it is protobuf or regular message and forwards the fragments accordingly. Are you suggesting that this forward fragment logic should be moved further into the hierarchy?
- Can you elaborate on the serialization formats leaking into domain?
And as always, really appreciate your feedback and help! :)
Thanks @VachaShah
- Are you suggesting that this forward fragment logic should be moved further into the hierarchy?
I would say it this way - the network engine decode bytes and hands it over to InboundHandler::messageReceived (eventually). The message is still bytes (InputStream) and will be decoded to request by RequestHandlerRegistry. Now, the handler could support multiple protocols now, and I think that would be the right place to tackle.
- Can you elaborate on the serialization formats leaking into domain?
Sure,  here is the example public final class SearchHit implements Writeable, ToXContentObject, Iterable<DocumentField>, ProtobufWriteable { - the ProtobufWriteable has to be implemented explicitly by the domain class, this is high coupling between class and low level serialization.
:x: Gradle check result for 1a59c1b4fe5480cd7001926a72a1cdda3a00b406: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for e355a4664716aca95566311791000b830eae2ad0: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for d843917520b6f9d2c40fe5623fde45d868975b6b: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for e30b9b25adcc03d5cec40ebc01d5789071a6d542: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 12921c8125bb319ca3da9d7c9fc509293458041c: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:grey_exclamation: Gradle check result for 12921c8125bb319ca3da9d7c9fc509293458041c: UNSTABLE
- TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
@reta @saratvemulapalli I added some abstractions so that inboundMessage does not have to have another path for protobuf. Let me know what you guys think.
:x: Gradle check result for 38662f6a1f7c2f7188e6f6d8fca01951f05e9dee: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 288a943fc0d0e53f2c9db312729aa9931d51084c: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Thank you @reta and @saratvemulapalli for the comments! I will work on these and update the PR incrementally.
:x: Gradle check result for abc32a88896ffb3bae04642da45a72aecef3e9e9: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 83be522cc38c1112b8924e26f56bcfa7022c5c06: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 3cac405e8ac827b6b1ba737f0ba0443e205e3ea3: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@reta and @saratvemulapalli I added recent commits which improve protocol detection by adding magic bytes to protobuf messages during transport serialization. LMK what you guys think! I also tried it with keeping the Header separate and the content as protobuf but the current logic does a lot of things with Header (like reading header buffers and aggregating the messages from the stream)which I think is not needed to be done for protobuf messages.
I am working on other comments but wanted to get feedback about the protocol detection part. Thanks!
:x: Gradle check result for 235aa4c83aa0c22e31f5fb78ec03d72a0ce5bed8: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@reta and @saratvemulapalli I added recent commits which improve protocol detection by adding magic bytes to protobuf messages during transport serialization. LMK what you guys think! I
Thanks @VachaShah , I like this direction (and we already have similar prefix for native protocol), thank you.
:x: Gradle check result for a2ef064bfcd5c756f38b84140e76a6670fbfd7a6: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@reta and @saratvemulapalli I added recent commits which improve protocol detection by adding magic bytes to protobuf messages during transport serialization. LMK what you guys think! I
Thanks @VachaShah , I like this direction (and we already have similar prefix for native protocol), thank you.
Thank you @reta! I will continue with this and am working on addressing other comments. Will ping on the PR once ready!
:x: Gradle check result for f7ed896137f08a03530590afe1c9224a437a9359: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?