Investigating message abstraction layer implementation options

Open aschrijver opened this issue 7 years ago • 0 comments

(NOTE: This showcase is part 2b of "Positioning, vision and future direction of the Dat Project")

Before reading on: these are just initial thoughts, and any feedback is greatly appreciated!

Options

The currently preferred approach is to leave hypercore alone and write the message abstraction layer on top of it. Presumably the biggest concern here is to:

  • retain backwards compatibility, avoid breaking changes

I haven't worked with protocol-buffers, but looking at the definition of schema.proto in hypercore-protocol and holding that against the vert.x approach to messaging, I see the following options:

  1. Frames only, with existing messages redefined as Frame Types
     1a. Only a subset of existing messages are actual Frame Types
  2. Just add a single Message message to the existing ones
  3. Don't touch hypercore-protocol or hypercore; implement on top of them

1 - Everything is a Frame

In this setup:

  • at root level there are only Frame messages
  • existing messages are redefined as FrameType
  • Frame.type indicates purpose, determines Body semantics
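To make "Frame.type determines Body semantics" a bit more concrete, here is a minimal TypeScript sketch of how a consumer might see such frames. The body fields, the lower-case type tags and the describeFrame helper are placeholders I made up for illustration; the real field layouts would come from schema.proto.

```typescript
// Free-form header attributes; the actual header format is still an open question.
type Header = { [key: string]: string };

// One variant per frame type. The body fields are illustrative placeholders only.
type Frame =
  | { type: "feed"; header: Header; body: { discoveryKey: string } }
  | { type: "have"; header: Header; body: { start: number; length: number } }
  | { type: "data"; header: Header; body: { index: number; value: string } };

// The type tag decides how the body is interpreted.
function describeFrame(frame: Frame): string {
  switch (frame.type) {
    case "feed":
      return "feed " + frame.body.discoveryKey;
    case "have":
      return "have " + frame.body.start + "+" + frame.body.length;
    case "data":
      return "data #" + frame.body.index;
  }
}
```

Because the type tag is part of every frame, a handler can reject or route a frame without knowing anything about what came before it on the channel.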

Impact (first impression):

  • hypercore-protocol
    • update schema.proto (small)
    • update specification / design docs (medium)
  • hypercore-messaging (tiny satellite module)
    • message creation (small)
    • message handling (medium)
  • hypercore:
    • integrate hypercore-messaging (small)
    • refactor to deal with Frames (small)
    • API extension for dealing with header, body (small)
    • (optional) handle backwards-compatibility (medium)
  • hyperdrive (for example):
    • refactor to messaging, incorporate hypercore API changes (medium)
    • (i.e. file / chunk logic creates File / Chunk messages of frame type Data)

Pros:

  • messaging is natively supported, a core concept
  • no backwards version-compatibility issues in hypercore-protocol. There are two ways to avoid them:
    • have existing (old) message definitions at root level or import them, and just add Frame
    • have 2 .proto files and define a hypercore.proto.messaging package namespace in one
  • easier to ensure / guarantee interoperability of decentralized apps
  • steers implementers, broader community to best-practice approach regarding messaging
  • (would be easy to write a bridge and plug into the polyglot vert.x ecosystem, gain access to the JVM)

Cons:

  • not all existing messages may be good candidate frame types (see option 1a)
  • backwards-compatibility still requires handling in downstream projects (best candidate is hypercore)

The schema.proto may look something like this:

// add package name to discern from the old format that must still be supported for a while

package hypercore.proto.messaging;

// or keep original messages at root level, retain backwards compatibility with one .proto
// alternatively the old specification format can be imported

message Frame {
    // type=0, should be the first message sent on a channel
    message Feed { ... }

    // type=1, overall connection handshake. should be sent just after the feed message on the first channel only
    message Handshake { ... }

    // type=2, message indicating state changes etc.
    message Info { ... }

    // type=3, what do we have?
    message Have { ... }

    // type=4, what did we lose?
    message Unhave { ... }

    // type=5, what do we want? remote should start sending have messages in this range
    message Want { ... }

    // type=6, what don't we want anymore?
    message Unwant { ... }

    // type=7, ask for data
    message Request { ... }

    // type=8, cancel a request
    message Cancel { ... }

    // type=9, get some data
    message Data { ... }

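    enum FrameType {
        // upper-case names: enum values share a scope with the nested messages above,
        // so reusing the names Feed, Handshake, etc. here would clash
        FEED = 0;    // the first message, also default enum value
        HANDSHAKE = 1;
        INFO = 2;
        HAVE = 3;
        UNHAVE = 4;
        WANT = 5;
        UNWANT = 6;
        REQUEST = 7;
        CANCEL = 8;
        DATA = 9;
    }

    required FrameType type = 1;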

    // either define a single header format, or support multiple alternatives in 'oneof' construct, e.g.
    //
    // - DatDefaultHeaderFormat (default format holding only dat-supported attributes)
    // - KeyValueHeaderFormat (user-extensible map of header attributes)
    // - CustomHeaderFormat (e.g. community-contributed JsonSchemaHF, JsonLdHF, etc.)

    message Header { ... }

    // probably include some more Frame fields here

    // the body payload that depends on the frame type
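    oneof body {
        // each member needs a type, a name and a field number > 0 that is unique within Frame
        // (the numbers below are placeholders)
        Feed feed = 10;
        Handshake handshake = 11;
        Info info = 12;
        Have have = 13;
        Unhave unhave = 14;
        Want want = 15;
        Unwant unwant = 16;
        Request request = 17;
        Cancel cancel = 18;
        Data data = 19;    // maybe rename to Message, or Payload
    }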
}

Notes:

  • field changes wrt current messages may make sense (e.g. promoting to Frame level, removing)
  • Data.value would be where the message body is (may be defined as type Any)
  • Body payload members are told apart by field number, so each frame type needs its own field number in the oneof; the field layouts of the body messages themselves need not be unique
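As a rough illustration of the "Data.value as type Any" note: protobuf's Any is essentially a type URL plus opaque bytes, and the receiver unpacks the bytes only if it recognises the type. The sketch below mimics that idea in TypeScript. The pack / unpack helpers, the JSON stand-in codec and the example type URL are all assumptions of mine, not part of hypercore-protocol or of any protobuf library.

```typescript
// Mirrors the two fields of an Any-style payload: a type identifier plus opaque bytes.
interface AnyPayload {
  typeUrl: string;
  value: Uint8Array;
}

// Stand-in codec: one char code per byte (ASCII-only, for illustration).
function toBytes(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return bytes;
}

function fromBytes(bytes: Uint8Array): string {
  let text = "";
  for (let i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

// Wrap an application-level body together with its type identifier.
function pack(typeUrl: string, body: unknown): AnyPayload {
  return { typeUrl: typeUrl, value: toBytes(JSON.stringify(body)) };
}

// Only decode the bytes when the type is the one the caller expects.
function unpack(payload: AnyPayload, expectedTypeUrl: string): unknown {
  if (payload.typeUrl !== expectedTypeUrl) {
    throw new Error("unexpected payload type: " + payload.typeUrl);
  }
  return JSON.parse(fromBytes(payload.value));
}
```

The point of the design is that the frame layer never needs to understand the payload; only the application that registered the type does.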

1a - Some Frame Types, some message types

Option 1 may be a very naive design, as it assumes all current message types are natural candidate Frame Types. However:

  1. Some (or all) might be implemented as message types instead, using frame type Data
    • E.g. Handshake, Info, Cancel
  2. Maybe some (or all) are not suitable to serve as frame type

Looking at vert.x messaging, there are only 4 types:

  • send: send a message to an address
  • publish: publish a message to an address
  • register: subscribe to the messages sent or published to an address
  • unregister: unsubscribe from the messages sent or published to an address
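To get a feel for how little these four operations need, here is a toy in-memory bus in TypeScript. It is not the vert.x API; the EventBus class and its method signatures are my own sketch, and I am assuming the usual vert.x semantics where publish reaches every registered handler while send reaches just one of them (picked round-robin here).

```typescript
type Handler = (body: string) => void;

class EventBus {
  // Handlers per address, plus a per-address counter used for round-robin sends.
  private handlers: { [address: string]: Handler[] } = {};
  private next: { [address: string]: number } = {};

  // register: subscribe a handler to an address.
  register(address: string, handler: Handler): void {
    const list = this.handlers[address] || [];
    list.push(handler);
    this.handlers[address] = list;
  }

  // unregister: remove a previously registered handler.
  unregister(address: string, handler: Handler): void {
    const list = this.handlers[address] || [];
    this.handlers[address] = list.filter((h) => h !== handler);
  }

  // publish: deliver to every handler registered at the address.
  publish(address: string, body: string): void {
    const list = this.handlers[address] || [];
    list.forEach((h) => h(body));
  }

  // send: deliver to exactly one handler; returns false if nobody is listening.
  send(address: string, body: string): boolean {
    const list = this.handlers[address] || [];
    if (list.length === 0) {
      return false;
    }
    const index = (this.next[address] || 0) % list.length;
    this.next[address] = index + 1;
    list[index](body);
    return true;
  }
}
```

Note that nothing here is specific to a transport or to a feed; everything hangs off the address, which is what makes the comparison with the hypercore-protocol message types interesting.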

Looking at this, vert.x slices things completely differently from the current hypercore-protocol. I need more time studying Dat's inner workings to say anything sensible here, so your feedback can help!

First thoughts:

  • having an address at frame level like vert.x may obviate the need for Handshake
  • an address need not be directional (dat-url), it can be a topic to which you can pub / sub
  • handshake information can be placed in any / every Frame by means of the Header
  • self-contained frames make the protocol more robust, e.g. in handling broken pipes, network issues
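A small sketch of what "self-contained" could mean in practice: every frame carries its address and the handshake-style information in its header, so the receiver can judge each frame on its own, even right after a reconnect. The header fields (protocolVersion, sender), the SUPPORTED_VERSION constant and the acceptFrame function are hypothetical names chosen for this example.

```typescript
interface SelfContainedFrame {
  address: string; // a topic or a dat-url, as in the bullets above
  header: { protocolVersion: number; sender: string }; // hypothetical handshake-style fields
  body: string;
}

// Assumed value, only here to have something to compare against.
const SUPPORTED_VERSION = 1;

// Decide what to do with a frame using only the frame itself plus local subscriptions;
// no per-connection handshake state is consulted.
function acceptFrame(frame: SelfContainedFrame, subscriptions: string[]): string {
  if (frame.header.protocolVersion !== SUPPORTED_VERSION) {
    return "rejected: unsupported version";
  }
  if (subscriptions.indexOf(frame.address) === -1) {
    return "ignored: not subscribed";
  }
  return "accepted from " + frame.header.sender;
}
```

The trade-off is some repeated bytes in every frame in exchange for not having to re-establish state after a broken pipe.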

[TODO What would be missing if adopting the vert.x way with only the 4 frame types?]

Impact:

  • same as option 1, except additional effort downstream for each frame type removed / abstracted away

Pros / Cons:

  • same as option 1
  • simplified protocol, more flexibility

2 - Add a Message to the mix

In this option the schema stays as it is now, with the only addition being a Message message type. The message would have a Header and Body and some other fields, just like Frame.

Pros / Cons / Impact:

  • similar to option 1
  • more moving parts, less consistency in protocol
  • potentially more handling downstream, less interoperability

3 - Layered on top of hypercore

Currently this option is favoured by both @mafintosh and @joehand, but to me this seems to be the approach with the most downsides.

In this setup:

  • both hypercore-protocol and hypercore remain untouched
  • hypercore-messaging satellite module provides message creation + handling logic
    • module design virtually identical to the one described in option 1
  • hypercore-messaging is incorporated by downstream modules
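For comparison, this is roughly what the layered setup could look like from a downstream module's point of view: messages are serialised by the satellite module and stored as ordinary entries in an untouched append-only feed. The InMemoryFeed class is a stand-in I wrote for this sketch and is NOT the hypercore API; writeMessage / readMessage and the JSON encoding are likewise assumptions.

```typescript
// Hypothetical stand-in for an append-only feed (not the hypercore API).
class InMemoryFeed {
  private entries: string[] = [];

  append(entry: string): number {
    this.entries.push(entry);
    return this.entries.length - 1; // index of the new entry
  }

  get(index: number): string {
    const entry = this.entries[index];
    if (entry === undefined) {
      throw new Error("no entry at index " + index);
    }
    return entry;
  }
}

interface Message {
  header: { [key: string]: string };
  body: string;
}

// Message creation: the feed only ever sees an opaque entry.
function writeMessage(feed: InMemoryFeed, message: Message): number {
  return feed.append(JSON.stringify(message));
}

// Message handling: every downstream module has to know the entry is a message.
function readMessage(feed: InMemoryFeed, index: number): Message {
  const parsed = JSON.parse(feed.get(index));
  if (typeof parsed !== "object" || parsed === null || typeof parsed.body !== "string") {
    throw new Error("entry " + index + " is not a message");
  }
  return { header: parsed.header || {}, body: parsed.body };
}
```

The feed itself cannot tell a message from any other entry, which is where the duplication and fragmentation concerns listed under the cons come from.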

Pros:

  • freedom, use messaging or do not
  • (I can't think of more pros)

Cons:

  • messaging is not a central concept of Dat, but an add-on
  • (some) handling logic must be duplicated in all downstream modules
  • easier to make mistakes, incorporate messaging incorrectly
  • freedom leads to fragmentation, incompatible application designs
  • requires more effort, support to guide and steer the community
  • adoption of messaging layer in the ecosystem may be much slower

--

Previous part: Design of message-based abstraction layer on top of hypercore

Next part: Optimizing traction and exposure

aschrijver, Jul 20 '17 20:07