Overview

To make messaging more efficient and enforce message schemas in baseline environments, we should research the use of binary data encoders. Advantages of binary serialization formats over JSON:

makes it easy to enforce message schemas, so that different systems can parse messages reliably
a serialized binary format is more memory efficient than JSON because it carries less overhead
broad language compatibility

We need to research the following options to determine which one best fits our needs and whether it is worth implementing in the baseline codebase:

Protocol buffers (Google)
Apache Thrift (Facebook)
Simple Serialize aka SSZ (Ethereum)

Questions

What is the timeline for adding binary serialization to the baseline packages?

Tasks

[ ] Compare the binary serialization options and report findings
[ ] Decide whether to use one of the binary encoding options or stick with JSON
[ ] Decide what priority this work should have on the baseline roadmap

Aug 03 '20 14:08 bitwiseguy

ASN.1 BER/PER is pretty standard.

for SSZ, my reference is https://github.com/ChainSafe/ssz

Aug 03 '20 16:08 sambacha

Messaging Protocols Comparison

The following contains research notes created by @Perseverance and @tkstanczak

Custom protocol

Documentation

No documentation - will have to provide it ourselves

Overview

Create our own custom messaging protocol by defining the layout of parameters in a byte array.

Pros

Super efficient - absolutely no overhead from the protocol

Cons

Non-standard - we have to provide documentation and ask each implementer to write their own encoder and parser.
Have to write and maintain documentation.
Have to write your own logic for field size description/limitation
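As a rough illustration of what "defining the layout of parameters in a byte array" involves, here is a minimal Python sketch using the standard `struct` module. The fields (`msg_type`, `sender_id`, `payload`) and their sizes are hypothetical and chosen only for the example; they are not part of any baseline specification.

```python
import struct

# Hypothetical layout (an assumption for illustration only):
#   1 byte   msg_type     (unsigned)
#   4 bytes  sender_id    (unsigned, big-endian)
#   2 bytes  payload_len  (unsigned, big-endian)
#   N bytes  payload
HEADER = struct.Struct(">BIH")
MAX_PAYLOAD = 0xFFFF  # limit imposed by the 2-byte length field


def encode(msg_type: int, sender_id: int, payload: bytes) -> bytes:
    """Pack the header fields, then append the raw payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for 2-byte length field")
    return HEADER.pack(msg_type, sender_id, len(payload)) + payload


def decode(data: bytes):
    """Unpack the header, then slice out exactly payload_len bytes."""
    msg_type, sender_id, length = HEADER.unpack_from(data, 0)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return msg_type, sender_id, payload
```

Even in this small sketch, the field-size limit and the truncation check are logic we would have to write, document and keep in sync with every other implementation ourselves.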

JSON

Documentation

https://www.json.org/json-en.html

Overview

JSON encodes data as key-value pairs using types that mirror the main JavaScript types. It is widely used because it is very easy to serialize and deserialize.

Pros

Widely adopted - there is a parser/encoder in every remotely popular language
Easy to read

Cons

Massive overhead due to keys
Lack of support for some types, e.g. binary data. Workarounds such as Base64 encoding are needed to support them.
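To make the two cons above concrete, here is a small Python sketch using only the standard library. The field names `id` and `digest` are made up for the example. It carries 32 bytes of binary data through JSON with Base64 and compares the sizes.

```python
import base64
import json

raw = bytes(range(32))  # 32 bytes of arbitrary binary data (e.g. a hash)

# JSON has no binary type, so the bytes are wrapped as Base64 text.
encoded = base64.b64encode(raw).decode("ascii")
doc = json.dumps({"id": 7, "digest": encoded})

# The receiver has to know that "digest" is Base64 and undo it by hand.
restored = base64.b64decode(json.loads(doc)["digest"])

print(f"raw bytes: {len(raw)}, Base64 text: {len(encoded)}, JSON document: {len(doc)}")
```

Base64 alone grows the payload by about a third, and the keys and punctuation are added on top of that for every message.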

Protocol Buffers

Documentation

https://developers.google.com/protocol-buffers

https://developers.google.com/protocol-buffers/docs/proto3

Overview

A serialization mechanism developed by Google. It uses a schema/model definition that is compiled into corresponding models in each target language. The definitions allow model nesting, and data is packed by default. The definition language resembles the model definitions seen in technologies like GraphQL.

Pros

Encodes to binary, but the generated code abstracts the encoding away
Very efficient
Support for many languages
Support for field deprecation and protocol upgrades (by adding new fields)

Cons

Needs to be compiled
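In normal use the generated classes do the encoding, but a hand-written sketch helps show where the efficiency comes from: on the wire, field names are replaced by small numeric tags and integers are stored as variable-length "varints". The Python below is a simplified illustration of that wire format for a single non-negative integer field only; it is not the protobuf library, and the function names are invented for the example.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, high bit clear
            return bytes(out)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Encode one integer field as a tag followed by a varint value."""
    WIRE_TYPE_VARINT = 0
    tag = (field_number << 3) | WIRE_TYPE_VARINT
    return encode_varint(tag) + encode_varint(value)


# Field number 1 holding the value 150 takes three bytes on the wire.
print(encode_uint_field(1, 150).hex())  # 089601
```

Because only the field number is transmitted, both sides need the same schema to make sense of the bytes, which is also what makes schema enforcement possible.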

Thrift

Documentation

https://thrift.apache.org/

Overview

Apache Thrift is not just a messaging protocol but also a generator for client and server applications based on a schema. It allows defining both the message types and the business services that would be available.

Pros

Supposedly super optimal and fast

Cons

The definition language is hard to develop with
Not specific to messaging protocols

SSZ

Documentation

https://github.com/ethereum/eth2.0-specs/blob/dev/ssz/simple-serialize.md

Overview

SSZ is a compact data encoding used in Ethereum 2. It defines a compact binary serialization method for objects, as well as Merkleization mechanics for any object structure.

Pros

Used in Ethereum, with implementations in Rust, Python, C#, Java, JavaScript.
Compact
Crypto friendly
Allows skipping items and jumping to an exact position, so you can read only the data you need and discard everything else (faster parsing)

Cons

Limited tooling as it is only used for Ethereum 2 at the moment.
Not human readable (but core dev readable via hex ;))
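The "skip items and jump to an exact position" point can be sketched in a few lines of Python. This follows my reading of the SSZ spec linked above: unsigned integers are little-endian, fixed-size fields are stored in place, and each variable-size field is replaced in the fixed part by a 4-byte offset to its data. The container fields (`slot`, `data`, `index`) are hypothetical, and the sketch covers serialization only, not Merkleization.

```python
import struct

# Hypothetical container (assumed for illustration):
#   slot:  uint64
#   data:  variable-length byte list
#   index: uint32
# Fixed part: slot (8 bytes), offset of data (4 bytes), index (4 bytes).
FIXED = struct.Struct("<QII")  # "<" = little-endian, no padding


def ssz_encode(slot: int, data: bytes, index: int) -> bytes:
    offset = FIXED.size  # variable part starts right after the fixed part
    return FIXED.pack(slot, offset, index) + data


def read_index_only(buf: bytes) -> int:
    # index sits at a known position (8 + 4 bytes in), so nothing else is parsed
    return struct.unpack_from("<I", buf, 12)[0]


def read_data(buf: bytes) -> bytes:
    (offset,) = struct.unpack_from("<I", buf, 8)
    return buf[offset:]  # the only variable-size field runs to the end


buf = ssz_encode(5, b"\xaa\xbb", 9)
print(buf.hex())
```

A reader that only needs `index` touches four bytes of the buffer, whatever the size of `data`.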

Suggestions

I feel that Protocol Buffers are probably our best choice for the moment.
I feel that SSZ is better suited to the crypto / Ethereum space and should be even faster than protobuf, but its tooling might be less enterprise-friendly.

Aug 13 '20 19:08 bitwiseguy

Why not Avro? Kafka support is already integrated, correct?

Aug 14 '20 17:08 sambacha

@Kasshern @skosito @Ybittan @biscuitdey This discussion might be relevant for our work on the SRI. Keeping it open so that we can discuss.

Jul 28 '22 08:07 ognjenkurtic

Research binary data encoders for baseline messages