OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

[RFC] Protobuf in OpenSearch

Open saratvemulapalli opened this issue 2 years ago • 32 comments
trafficstars

Inspiration

Plugins are very tightly version coupled with OpenSearch https://github.com/opensearch-project/OpenSearch/issues/1707 and relaxing them to work for patch versions is still in the works.

While working on extensions #2447 we really wanted to support multiple versions (including major/minor/patch) of OpenSearch with one OpenSearch SDK[1].

Proposal

Exploring for opensource solutions, Protobuf[2] which is built by Google and is widely adopted for serializing/de-serializing and used as RPC. It was built out of the box to support forward and backward compatibility seamlessly.

With an initial experiment of integrating protobuf in OpenSearch/Extensions https://github.com/opensearch-project/opensearch-sdk-java/issues/414#issuecomment-1471010441, we see:

  • an extension working with multiple versions of OpenSearch (Backward and forward compatibility)
  • Simple human readable message contracts from .proto definitions.
  • Generated classes for readers and writers in any language of choice, an important factor for offering OpenSearch SDK in different languages.

For extensions, protobuf solves a lot of problems but has a tiny overhead for serialization/de-serialization over existing OpenSearch's StreamInput StreamOutput

Next Steps

With the learnings we have seen in SDK/Extensions, there is more potential for Protobuf integration in OpenSearch and would like to propose offering Protobuf as a new type:

  • Transport Layer: Implement StreamInput, StreamOutput with protobuf serializer/de-serializers. This will help offer another type within the transport ecosystem similar to ByteBufferStreamInput[3] etc. This would seamlessly plugin into Writable[4] interface which is used across the repo for transporting custom messages.

Adding in transport will enable communication between OpenSearch nodes to have significant benefits in performance and seamless versioning compatibility. @nknize already started making changes to enable this with restructuring XContent https://github.com/opensearch-project/OpenSearch/pull/6470

  • Rest Layer: Implement new XContent.Type to add protobuf as an option. Historically converting Json <-> Protobuf has performance implications but for transporting on the Rest Layer with clients, OpenSearch Dashboards and ingestion tools might have benefit when talking over binary format. (Yet to experiment)

Additionally, having protobuf at Rest layer will unblock OpenSearch to support gRPC (if we choose this path).

FAQ

Q Is Protobuf higher performant? A. We moved 2 APIs a. Cat Nodes b. _search, both APIs with protobuf had atleast 20% better performance compared to native protocol, and we see linear improvements with increase in cluster size.

Q. What are the benchmark numbers for search ? A. See OpenSearch benchmark results for querying with Protobuf : https://github.com/opensearch-project/OpenSearch/issues/10684#issuecomment-1876077885

Q. What are the benchmark numbers for Cat Nodes (Operational APIs) A. See benchmarking results : https://github.com/opensearch-project/OpenSearch/issues/6844#issuecomment-1742250229

Q. Is Protobuf in OpenSearch necessary to support GRPC A. Protobuf works at transport layer, while GRPC is a layer 7 protocol. GRPC internally uses protobuf as transport which makes it a dependency. We presume there will be significant performance benefits with GRPC as data would be transmitted binary instead of JSON.

cc: @VachaShah @prudhvigodithi

[1] https://github.com/opensearch-project/opensearch-sdk-java [2] https://protobuf.dev/overview/ [3] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/io/stream/ByteBufferStreamInput.java [4] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/io/stream/Writeable.java

saratvemulapalli avatar Mar 27 '23 18:03 saratvemulapalli

Super excited about this!!! It's non-invasive which is fantastic! Enables us to further refactor the transport layer for extensions to support this.

This has my full endorsement! Nice work!

nknize avatar Mar 27 '23 19:03 nknize

@saratvemulapalli thanks for the RFC, how beneficial is gRPC for extensions actually? I am not against it but I have doubts we have a compelling use case now for it:

  • yes, we could use gRPC as the transport between core and extensions, but it looks like to be limited to just a few request / response pairs (at least for now), is it worth it?
  • extensions are pretty much "do whatever you want", but our clients only support HTTP right now [1], we would force people to do gRPC and HTTP (REST actions) at the same time, looks like overhead?

The unknown part for me is the role of security plugin, does it mean it has to support gRPC as well to perform any meaningful checks in gRPC world?

[1] https://github.com/opensearch-project/opensearch-clients/issues/55

reta avatar Mar 27 '23 20:03 reta

Assuming extensions get very chatty transport improvements will be increasingly useful, but I am reading this proposal as 1) make transport pluggable today in OpenSearch, and 2) implement gRPC as an option. I think this has the potential to significantly improve node-to-node communication today passing around all kinds of cluster state. Let's see by how much!

dblock avatar Mar 27 '23 22:03 dblock

thanks @reta for taking a look.

how beneficial is gRPC for extensions actually? I am not against it but I have doubts we have a compelling use case now for it:

I dont know how much gRPC will help extensions (yet, we'll learn more) but with protobuf I've listed down the benefits we see so far for extensions. Infact extensions will be similar to clients for Rest APIs as they use clients internally.

extensions are pretty much "do whatever you want", but our clients only support HTTP right now [1], we would force people to do gRPC and HTTP (REST actions) at the same time, looks like overhead?

We definitely don't want to force using gRPC, as an example with clients. It should be just another option which we will offer in addition to Json based Rest APIs. I am hoping if we could transmit information in binary (protobuf) format, it would help in performance. Obviously we have to write seamless serializers/de-serializers for each API to translate this data into Java objects for OpenSearch APIs to understand them.

Mostly with the RFC, I am looking for feedback to see if there is something fundamental I am missing vs give it a shot and lets get some numbers to decide if this worth chasing in the longer term.

saratvemulapalli avatar Mar 27 '23 22:03 saratvemulapalli

I am reading this proposal as 1) make transport pluggable today in OpenSearch,

Transport is already pluggable today (e.g., plugins/transport-nio)

...and 2) implement gRPC as an option.

Sort of. We would implement protobuf as an option (just like CBOR, SMILE, YAML, and JSON).

The only difference here is that we would add an additional ProtobufStreamInput extends StreamInput, ProtobufStreamOutput extends StreamOutput concrete implementation (e.g., in o.common.xcontent.protobuf) that overrides the default StreamInput StreamOutput marshall/unmarshall logic w/ the Protobuf implementation and the PROTOBUF transport option would use this instead of the default StreamInput/StreamOutput.

This is why I've been pushing the XContent refactor. The next PR will refactor the StreamInput/StreamOutput classes into a library and the xcontent/protobuf implementation can provide its concrete implementation that uses its own serialize logic. It's an elegant approach that gets us even further toward modularizing massive amounts of code out of :server into separable libraries (thus moving closer toward jigsaw modularization).

I dont know how much gRPC will help extensions...

Exactly. Let's keep this simple and focus just on protobuf as an optional transport (that as a bonus supports versioning!) than expanding the aperture to say it's a silver bullet for extensions (progress not perfection).

In a followup (after vetting and going GA) we can discuss protobuf as the default transport format over JSON. I think it's clearly a direction worth considering.

nknize avatar Mar 27 '23 23:03 nknize

Thanks a lot @saratvemulapalli , I was a bit confused by gRPC mentions but @nknize clearly clarified that we are only talking about serialization mechanism (protobufs), and not the communication protocol change (gRPC, at least for now). Thanks.

reta avatar Mar 28 '23 00:03 reta

Thanks @reta. I probably took you off mentioning about gRPC (sorry about that) but mostly intended this issue for Protobuf and put in list of opportunities this will enable and one of them is gRPC (if choose to make it happen down the line). I've updated the RFC to clarify this.

Tagging @peternied @cwperks @wbeckler @seanneumann who had thoughts/feedback.

saratvemulapalli avatar Mar 28 '23 22:03 saratvemulapalli

I'm not quite as up-to-speed on gRPC and other options as the more experienced folk up above, but I will say I'm all for protobuf because:

  • generates code in multiple languages. If we design SDK using text-based protobuf files, we can have Java code on OpenSearch side read by <insert your language here> code on SDK. Combine this "spec based" transport call with language clients doing spec-based REST calls and all we have left is to crack the "X-Content" nut which if I read earlier in this thread, is also a possibility.
  • is independent enough that third parties can develop serializations outside of our framework, and just send bytes around via extensions. Our proxy action framework doesn't care what the bytes are. Right now our implementations use the Writeable serialization but that's just 'cause it's available. This could be huge for compression and future support of injestion and/or analysis plugins that are bandwidth limited.

dbwiddis avatar Mar 30 '23 02:03 dbwiddis

Great proposal, I'm all for starting as an optional transport layer and we can figure out where it best benefits the project.

peternied avatar Apr 07 '23 22:04 peternied

All in for protobuf! Great proposal @saratvemulapalli! In the future it would be great to see JSON response with versioning support for extension points APIs using protobuf (No more of XContent!).

owaiskazi19 avatar Apr 11 '23 22:04 owaiskazi19

Thanks for the proposal. @nknize @reta @VachaShah Just catching up have we evaluated ION as an alternative. Do we think we can run into problems with large data. The one benefit I see with ION is that it doesn't mandate a data schema, it's optional

Bukhtawar avatar May 26 '23 05:05 Bukhtawar

@Bukhtawar I don't recall there was any evaluation being done

reta avatar May 26 '23 16:05 reta

Another one that I guess should be considered in favour of low garbage https://github.com/OpenHFT/Chronicle-Wire

Bukhtawar avatar May 27 '23 17:05 Bukhtawar

Thanks @Bukhtawar for the feedback. We haven't really looked at other alternatives as mostly we started to solve seamless cross version support first and saw the opportunity with OpenSearch as a whole. We definitely understand with large data[1] protobuf is not recommended. @VachaShah is actively working on this and we'll benchmark couple of alternatives including ION. If you have other suggestions let us know.

[1] https://protobuf.dev/programming-guides/techniques/#large-data

saratvemulapalli avatar Jun 05 '23 19:06 saratvemulapalli

The ones I have seen quite often in place of protobuf, if it could help, are:

  • Kryo (https://github.com/EsotericSoftware/kryo)
  • Apache Avro (https://github.com/apache/avro)
  • Apache Thrift (https://github.com/apache/Thrift)
  • MessagePack (https://msgpack.org/index.html)
  • Cap'n Proto (https://capnproto.org/)

reta avatar Jun 05 '23 20:06 reta

I did a POC (see draft PR #9097 for the POC code changes for changing the API request response de/serialization and between the nodes) with protobuf for _cat/nodes API and got the time per request for the protobuf variant in comparison to the original _cat/nodes API. The following improvements were noted for various clusters:

2 nodes cluster

Average improvement of 16.14%

image image (1)

5 nodes cluster

Average improvement of 18.11%

image (2) image (3)

10 nodes cluster

Average improvement of 21.35%

image (4) image (5)

15 nodes cluster

Average improvement of 31.39%

image (6) image (7)

VachaShah avatar Oct 02 '23 00:10 VachaShah

This is significant. Can we ship this?

dblock avatar Oct 02 '23 20:10 dblock

Also another great data point from @dbwiddis and @dblock, while they were trying to write https://github.com/opensearch-project/opensearch-sdk-py, all extension interfaces were autogenerated in python and seamlessly could work over transport with OpenSearch de-serializing this data using Protobuf java.

saratvemulapalli avatar Oct 02 '23 23:10 saratvemulapalli

This is significant. Can we ship this?

Thank you @dblock! I am working on getting the changes from the POC in #9097 to merge in the repo. @saratvemulapalli and I are also working on getting the numbers for APIs like search.

VachaShah avatar Oct 02 '23 23:10 VachaShah

Thank you @dblock! I am working on getting the changes from the POC in #9097 to merge in the repo.

@VachaShah the numbers are very convincing, one thing we should keep in mind (which you definitely know about) is how to support migration from 2.x to 3.x when some nodes will talk old protocol and new ones will use Protobuf (or in general, any new protocol). In continuation to this, it may not be feasible to migrate all transport actions to Protobuf in one shot so even in 3.x we would need to maintain this mix of transport protocols.

reta avatar Oct 03 '23 13:10 reta

As a strategy, I would 1) support multiple protocols in 2.x in a way where we can migrate actions one-by-one, 2) rip out the existing transport protocol implementation in 3.0 and fully replace it with Protobuf. IMO, we only need an upgrade path in which a 2.x node can do just enough transport protocol to upgrade itself to protobuf and then never look back.

dblock avatar Oct 03 '23 15:10 dblock

@reta we might have a way to get inter-operable protocols as protobuf can write and readfrom bytesArray but it be could lots of manual effort to get all actions into the new protocol :/.

saratvemulapalli avatar Oct 03 '23 16:10 saratvemulapalli

It would be worth to see what actions can benefit the most from protobuf and as Sarat mentioned, the 2 protocols can co-exist with each other with some effort so we can make the upgrade scenarios work.

VachaShah avatar Oct 04 '23 19:10 VachaShah

Is the current effort on this RFC being done on some feature branch that I can follow and take a look at?

austintlee avatar Oct 06 '23 18:10 austintlee

Hi @austintlee, currently there is a draft PR #9097 with the POC and the changes from the draft PR would be PRed out incrementally.

VachaShah avatar Oct 06 '23 21:10 VachaShah

Great proposal! Do we know what's the reason behind this improvement? Is it due to reduced CPU time during serialization, or reduced network time due to difference in payload size?

ketanv3 avatar Oct 10 '23 16:10 ketanv3

Is there already commitment to protobuf without considering alternatives per @reta?

The ones I have seen quite often in place of protobuf, if it could help, are:

  • Kryo (https://github.com/EsotericSoftware/kryo)
  • Apache Avro (https://github.com/apache/avro)
  • Apache Thrift (https://github.com/apache/Thrift)
  • MessagePack (https://msgpack.org/index.html)
  • Cap'n Proto (https://capnproto.org/)

Avro/Thrift have been around for quite some time. I'd like to see some comparison of some of these.

macohen avatar Oct 10 '23 16:10 macohen

@macohen I think the idea is that we 1) make the transport protocol pluggable, 2) ship protobuf support as an experimental feature with an upgrade path, 3) offer other protocols, 4) make the best one the default.

dblock avatar Oct 10 '23 21:10 dblock

Great proposal! Do we know what's the reason behind this improvement? Is it due to reduced CPU time during serialization, or reduced network time due to difference in payload size?

We believe most of the benefits are with data compression, we couldn't really analyze frame graphs due to multiple threads. Protobuf ended up streaming fewer bytes for the same payload and efficient in serializing/de-serializing data.

saratvemulapalli avatar Dec 18 '23 19:12 saratvemulapalli

@saratvemulapalli @VachaShah can you please confirm if this change can be included in 2.x without breaking existing API? Basically can this change be added in a backward compatible manner in 2.x line?

We are evaluating if this change requires 3.0 release or can be included in 2.x line so need your inputs.

bbarani avatar Feb 06 '24 19:02 bbarani