automerge-classic

Elixir port

Open · alizain opened this issue 6 years ago · 29 comments

Hey!

I’d like to write a port of this library in Elixir to use when syncing between server <-> server and server <-> client, where the servers are running Elixir or some other language, and the client is running automerge.

I think there’s a huge need for a generalized document synchronization protocol that isn’t language-specific and doesn’t require paying for a service (à la Firebase or Realm).

Any thoughts on documenting or formalizing the delta spec or core merge logic? It’s different from the JSON CRDT paper, yes?

I’d be happy to help and write tests that verify cross-implementation correctness.
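One way to make such tests language-neutral is to keep them as shared data files: each test vector lists some changes plus the document every implementation must produce, whatever order the changes arrive in. A minimal Python sketch follows, assuming a hypothetical fixture layout (`changes`, `expected`) and a toy last-writer-wins map (`toy_apply`, also hypothetical) standing in for a real implementation; it is not Automerge's test suite or merge algorithm.

```python
import itertools
import json

# Hypothetical shared fixture: plain JSON that every implementation would load.
FIXTURE = json.loads("""
{
  "name": "concurrent assignment to the same key",
  "changes": [
    {"actor": "aaaa", "seq": 1, "key": "title", "value": "Draft"},
    {"actor": "bbbb", "seq": 1, "key": "title", "value": "Final"},
    {"actor": "aaaa", "seq": 2, "key": "author", "value": "alizain"}
  ],
  "expected": {"author": "alizain", "title": "Final"}
}
""")


def toy_apply(changes):
    """Toy last-writer-wins map standing in for a real implementation.

    For each key, keep the value with the largest (seq, actor) pair, so the
    result does not depend on the order in which changes are applied.
    This is NOT Automerge's merge algorithm.
    """
    winners = {}
    for ch in changes:
        stamp = (ch["seq"], ch["actor"])
        current = winners.get(ch["key"])
        if current is None or stamp > current[0]:
            winners[ch["key"]] = (stamp, ch["value"])
    return {key: value for key, (_, value) in winners.items()}


def run_fixture(fixture, apply_fn):
    """Run apply_fn on every delivery order; return the orders that fail."""
    failures = []
    for order in itertools.permutations(fixture["changes"]):
        result = apply_fn(list(order))
        if result != fixture["expected"]:
            failures.append((order, result))
    return failures


if __name__ == "__main__":
    failed = run_fixture(FIXTURE, toy_apply)
    status = "ok" if not failed else f"{len(failed)} failing orders"
    print(f"{FIXTURE['name']}: {status}")
```

Because the fixture is plain JSON, an Elixir or Go port could load the same file and run the same permutation check against its own apply function.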

Ideally, this implementation, implementations in other languages, and the spec would be maintained together.

alizain, Apr 26 '18

I think there's a great opportunity in providing that kind of cross-language capability but at the moment Automerge is quite deliberately agnostic about networks and protocols.

I'm aware of two radically different networking stacks built around automerge (having worked on both): MPL and Hypermerge.

MPL is based on WebRTC and implements a synchronization system based on vector-clock exchange, with JSON message passing. It is close(ish) to working in a browser, but relies on per-application "signaling servers" to discover peers. The Trellis project integrated that capability into an Electron app host with a tiny in-app web server. Not the most elegant solution.
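A vector-clock exchange of this kind can be sketched as follows: each peer summarizes what it holds as a map from actor ID to the number of that actor's changes it has, and the other side replies with exactly the changes beyond those counts. This is a minimal Python sketch assuming each actor's changes are numbered consecutively and arrive in order; `Peer` and `sync` are hypothetical names, not MPL's API or message format.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer that stores each actor's changes in sequence order."""
    name: str
    log: dict = field(default_factory=dict)  # actor ID -> list of changes

    def clock(self):
        """Vector clock: for each actor, how many of its changes we hold."""
        return {actor: len(changes) for actor, changes in self.log.items()}

    def record(self, actor, change):
        """Store the next change from `actor` (assumes in-order arrival)."""
        self.log.setdefault(actor, []).append(change)

    def missing_for(self, their_clock):
        """Changes we hold that a peer advertising `their_clock` lacks."""
        missing = []
        for actor, changes in self.log.items():
            seen = their_clock.get(actor, 0)
            missing.extend((actor, change) for change in changes[seen:])
        return missing

    def receive(self, batch):
        for actor, change in batch:
            self.record(actor, change)


def sync(a, b):
    """One round: swap clocks, then send each side only what it is missing."""
    to_b = a.missing_for(b.clock())
    to_a = b.missing_for(a.clock())
    b.receive(to_b)
    a.receive(to_a)


if __name__ == "__main__":
    alice, bob = Peer("alice"), Peer("bob")
    alice.record("alice", "set title")
    alice.record("alice", "set author")
    bob.record("bob", "add tag")
    sync(alice, bob)
    print(alice.clock(), bob.clock())
```

Both sides compute what to send before receiving anything, so a single round trip is enough when no new changes are made in the meantime.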

Hypermerge is built on hypercore/hyperdiscovery, the infrastructure behind the Dat project. It is built around append-only logs that peers can find by looking up their signing key in a DHT, or through local or internet DNS service discovery. Hypermerge was used by our recent Pixelpusher project and reduced the user-facing complexity of port-forwarding and so on, but it uses a number of libraries that rely on Node.js capabilities and is further from browser support.
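The property an append-only log gives you is that a reader can detect edited, dropped, or reordered entries. As a rough illustration only: hypercore's real format is considerably more involved (it uses signed Merkle trees), whereas this minimal Python sketch uses a plain SHA-256 hash chain. `AppendOnlyLog` and `verify` are hypothetical names, and in a real system the writer would sign the head hash so readers can trust `expected_head`.

```python
import hashlib
import json


def entry_hash(prev_hash, payload):
    """Hash of an entry that also covers the previous hash, chaining entries."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class AppendOnlyLog:
    """Simplified single-writer log: entries can be appended, never edited."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (hash, payload)

    def head(self):
        return self.entries[-1][0] if self.entries else self.GENESIS

    def append(self, payload):
        h = entry_hash(self.head(), payload)
        self.entries.append((h, payload))
        return h


def verify(entries, expected_head):
    """Recompute the chain; any edited, dropped, or reordered entry changes the head."""
    prev = AppendOnlyLog.GENESIS
    for stored_hash, payload in entries:
        if entry_hash(prev, payload) != stored_hash:
            return False
        prev = stored_hash
    return prev == expected_head


if __name__ == "__main__":
    log = AppendOnlyLog()
    log.append({"op": "set", "key": "title", "value": "Draft"})
    head = log.append({"op": "set", "key": "title", "value": "Final"})
    print(verify(log.entries, head))
```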

Both of these have advantages and disadvantages, and neither has a particularly efficient network representation or much regard for security.

There is at least one other project I've heard of that uses automerge as a synchronization system for a client/server model. I think this is tremendously promising, but it has very different needs from a pure peer-to-peer system.

I suggest that before worrying about cross-language communication one should probably solve the problem of communication within a single application or system and then port that effort to multiple environments.

In other words, go build something good and then we'll all enthusiastically adopt and support it as a layer above automerge, but I think automerge itself should remain focused on being a great, non-opinionated CRDT that doesn't impose too many decisions or dependencies on its consumers.

PS: An Elixir port would be very welcome too...

pvh, Apr 26 '18

What's the project that uses automerge for sync between client and server? Sounds interesting to learn from.


ghost, Apr 26 '18

The actual network implementation is less of a concern at the moment, as you rightly pointed out. I imagine WebSockets or even plain HTTP long-polling would serve well enough as a strawman implementation for our client <-> server use case.

My current concern is ensuring that changes made to a document in Elixir, when synced to the client, deterministically yield the same document in JavaScript; i.e., that the merge logic in the Elixir library and in automerge is the same, and that the JSON format describing a change is consistent between the two.
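For the "consistent JSON format" half of this, one common technique is canonical serialization: sort object keys, drop insignificant whitespace, and emit UTF-8, so two implementations produce byte-identical encodings and therefore identical content hashes for the same change. A minimal Python sketch, assuming hypothetical change fields (`actor`, `seq`, `ops`) rather than Automerge's actual format; RFC 8785 (JSON Canonicalization Scheme) defines a stricter version of these rules, including number formatting, which this sketch does not handle.

```python
import hashlib
import json


def canonical_json(change):
    """Serialize a change identically in every implementation.

    Assumed rules: object keys sorted, no insignificant whitespace, UTF-8
    output. Number formatting is NOT normalized here.
    """
    text = json.dumps(change, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def change_id(change):
    """SHA-256 of the canonical bytes, comparable across languages."""
    return hashlib.sha256(canonical_json(change)).hexdigest()


if __name__ == "__main__":
    # The same change built with different key orders, as two languages might.
    a = {"actor": "aaaa", "seq": 1, "ops": [{"action": "set", "key": "title", "value": "Draft"}]}
    b = {"seq": 1, "ops": [{"value": "Draft", "key": "title", "action": "set"}], "actor": "aaaa"}
    print(change_id(a) == change_id(b))
```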

Any thoughts on what can be done to facilitate/ensure this? Or even general thoughts/advice before I embark on this port?

P.S. Thank you for the tour of libraries built on top of automerge. Hypercore looks especially interesting!

alizain, Apr 26 '18

@gedw99 the use case is to use automerge as an alternative to traditional CRUD/REST web apps.

State is constructed and enriched on the server, and elixirmerge <-> automerge would take care of syncing changes to the client in real time (with the help of WebSockets).
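On the server side, the transport part of this design can stay simple, because merging is the CRDT's job rather than the server's. A minimal Python sketch of a hypothetical relay (`ChangeRelay` is not part of automerge or any library): the server keeps one ordered log of changes, and each client, over WebSockets or HTTP long-polling, remembers a cursor and asks for everything after it.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRelay:
    """Hypothetical server-side relay holding one ordered log of changes.

    The order here only controls delivery; conflict resolution still happens
    in the CRDT on each side, so the relay never inspects change contents.
    """
    log: list = field(default_factory=list)

    def submit(self, change):
        """Append a change from the server or a client; return the new log length."""
        self.log.append(change)
        return len(self.log)

    def changes_since(self, cursor):
        """Changes a client with this cursor has not seen, plus its new cursor."""
        if cursor < 0 or cursor > len(self.log):
            raise ValueError(f"unknown cursor {cursor}")
        return self.log[cursor:], len(self.log)


if __name__ == "__main__":
    relay = ChangeRelay()
    relay.submit({"actor": "server", "seq": 1, "key": "status", "value": "enriched"})
    batch, cursor = relay.changes_since(0)      # client's first poll
    relay.submit({"actor": "client-1", "seq": 1, "key": "note", "value": "hi"})
    batch2, cursor2 = relay.changes_since(cursor)
    print(len(batch), cursor, len(batch2), cursor2)
```

A client's own changes come back to it in later batches, so the client would need to skip them or re-apply them idempotently.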

alizain, Apr 26 '18

Hi @alizain, I agree that it would be great to have several implementations in different languages that can produce/consume the same JSON format and thus synchronise their changes.

However, I am a little hesitant because I am planning various changes to Automerge's data format in order to improve the storage and network efficiency. Thus, even if you implement the data format used by Automerge today, it will probably be a moving target. Also, we have not yet given any thought to versioning of the protocol, which will be important if several different implementations, potentially with different features, need to talk to each other. In the future I hope to have a well-designed data format with a detailed spec that can be implemented in various languages, but at the moment we're not there yet.
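As an illustration of the versioning problem only (Automerge does not define anything like this here), a minimal handshake could have each implementation advertise the format versions it can read and write, with both sides settling on the highest common one. The `hello` message and the version numbers in this Python sketch are hypothetical.

```python
import json

SUPPORTED = {1, 2}  # hypothetical format versions this implementation understands


def hello_message(supported):
    """First message a peer sends: the format versions it understands."""
    return json.dumps({"type": "hello", "versions": sorted(supported)})


def negotiate(ours, their_hello):
    """Choose the highest version both peers support, or None if there is none."""
    msg = json.loads(their_hello)
    if msg.get("type") != "hello":
        raise ValueError("expected a hello message")
    common = set(ours) & set(msg["versions"])
    return max(common) if common else None


if __name__ == "__main__":
    print(negotiate(SUPPORTED, hello_message({2, 3})))
```

Returning `None` rather than guessing lets an older implementation refuse to sync instead of silently misreading a newer format.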

If you're still keen to get started, you can of course implement the data format used by Automerge today, and change it in the future as we improve it. The key concepts are explained in the file INTERNALS.md (https://github.com/automerge/automerge/blob/master/INTERNALS.md) in the repository. The description is incomplete as I haven't yet had the time to write a full spec, but it can get you started. The code that interprets the operations is mostly in op_set.js (https://github.com/automerge/automerge/blob/master/src/op_set.js), unfortunately also mostly undocumented.

ept, Apr 26 '18

OK, thanks for the explanation. Makes sense.


ghost, Apr 26 '18

I understand your use case better now, @alizain. I think the design of paired automerges for client synchronization is a really excellent one and was one of the motivations behind getting involved with this research for me personally, though I haven't actually worked on such a project yet.

I must gently disagree with Martin's concern about changing the data format. While you may need to update the Elixir implementation when such changes arise, that is an eternal and inevitable consequence of having multiple implementations of any library, and as long as we keep automerge itself as svelte and dependency-free as it presently is, I suspect this should be a manageable problem.

pvh, Apr 26 '18

@alizain any progress here? If you're working on this I'll leave the issue open, otherwise I'm just going to clean it up to keep the issue tracker useful.

pvh, May 09 '18

@pvh please see #81; I wanted to create an implementation in Go and C#.

SCKelemen, May 25 '18

@SCKelemen +1, I wanted to make a Go implementation too. It could then be compiled to WASM, so all JS browsers and servers could use it. A great way to avoid needing a language port :)

Recently I switched from JSON to FlatBuffers. I think automerge could get an order-of-magnitude speed increase by switching to FlatBuffers, due to its heavy I/O and seeking. Seems like a good match to me. Currently I use FlatBuffers with Go; I used to use protobufs, and I'm happy with the move.

ghost, May 25 '18

@SCKelemen while I think Martin has recently published a paper on some of this work, the definitive description of automerge will be the JS implementation for the foreseeable future.

That said, automerge is very functional and has a very straightforward interface, so perhaps there's some way to take advantage of the existing JS test suite.

pvh, May 25 '18

@alizain Do you want to jointly create a spec, and write a few implementations for it? I can handle the Go, C# side if you want to do Elixir?

SCKelemen, Jun 26 '18

@alizain @gedw99

SCKelemen, Sep 25 '18

@SCKelemen I'd be interested in helping w/ a C# implementation

ncthbrt, Oct 22 '18


I'm working on one now. I'll comment back with a repo when it's in some semifunctional state.

SCKelemen, Oct 24 '18

I'm interested in creating a Java implementation. Any progress here? Or roadblocks discovered?

samw3, May 16 '19

Hi, folks. I'm interested in creating a Python implementation, and/or maybe a Rust implementation that would give us something C-callable and cross-platform. Still early in my interest, though; I haven't sat down to make a serious study of what it would take.

dhh1128, Jun 14 '19

I wish a TypeScript "port" existed (tongue in cheek).

steida, Jul 31 '19


There isn't a specification or anything, so any implementation is unlikely to play nicely with the others.

SCKelemen, Aug 01 '19


#155 @steida

j-f1, Aug 02 '19

Did anyone ever start a Go port?

If not, I may start one. We need a CRDT-backed EventCodec implementation in go-threads. Currently, we just have a JSON Patch codec, which has its limitations.
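One concrete limitation of index-based patches is that concurrent inserts do not commute: two replicas applying the same two patches in different orders can end up with different arrays. A minimal Python sketch contrasting JSON-Patch-style index inserts with a simplified RGA-style list, a well-known list CRDT technique in which each element gets a unique `(counter, actor)` ID and inserts reference the ID of the element they follow. This is not go-threads' or automerge's code, the helper names are hypothetical, and it assumes changes are delivered in causal order.

```python
def apply_index_insert(lst, index, value):
    """JSON-Patch-style 'add' at an array index (returns a new list)."""
    return lst[:index] + [value] + lst[index:]


def rga_insert(elems, after_id, new_id, value):
    """Simplified RGA-style insert; elems is a list of ((counter, actor), value).

    Insert after the element `after_id` (None means the list head), skipping
    following elements with a larger ID so that concurrent inserts at the
    same spot are ordered identically on every replica.
    """
    pos = 0
    if after_id is not None:
        pos = next(i for i, (eid, _) in enumerate(elems) if eid == after_id) + 1
    while pos < len(elems) and elems[pos][0] > new_id:
        pos += 1
    return elems[:pos] + [(new_id, value)] + elems[pos:]


def values(elems):
    return [v for _, v in elems]


if __name__ == "__main__":
    # Replica A inserts "a" at index 0; replica B concurrently inserts "b" at index 1.
    on_a = apply_index_insert(apply_index_insert(["x"], 0, "a"), 1, "b")
    on_b = apply_index_insert(apply_index_insert(["x"], 1, "b"), 0, "a")
    print(on_a, on_b)  # ['a', 'b', 'x'] ['a', 'x', 'b']

    # Same edits expressed against element IDs instead of indices.
    root = [((1, "r"), "x")]
    a_op = (None, (2, "A"), "a")      # insert "a" at the head
    b_op = ((1, "r"), (2, "B"), "b")  # insert "b" after "x"
    rga_a = rga_insert(rga_insert(root, *a_op), *b_op)
    rga_b = rga_insert(rga_insert(root, *b_op), *a_op)
    print(values(rga_a), values(rga_b))  # ['a', 'x', 'b'] ['a', 'x', 'b']
```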

sanderpick, Feb 19 '20

@sanderpick There is https://github.com/gpestana/rdoc, which seems to be based on @ept's paper, although it's worth bearing in mind this note from the README:

It is based on academic research on JSON CRDTs, but the details of the algorithm in Automerge are different from the JSON CRDT paper, and we are planning to publish more detail about it in the future.

Gozala, Feb 20 '20

Cool, thanks @Gozala. I'll ping Gonçalo.

sanderpick, Feb 20 '20

I'd be much more interested in writing a joint spec with people working in other languages, and implementing that spec on multiple platforms, so we can ensure interoperability.

SCKelemen, Feb 21 '20

I've been working on an F# implementation but keep tripping over differences between the actual JS implementation and the INTERNALS.md documentation. Has the JS implementation settled down enough for it to be worthwhile to update the documentation?

sdedalus, Apr 10 '21

@sdedalus We're hoping to merge the performance branch into main and make a 1.0 preview release in the next week or two. That will include the last remaining breaking changes, and after that point we're hoping to keep the data formats and APIs stable. So I'd say wait just another short while.

ept, Apr 12 '21

@ept Since there are a couple preview releases out, do you think it's (mostly) safe to consider the data formats and APIs stable at this point?

connorjacobsen, May 29 '21

It's unlikely there will be any major changes. There might still be a bit of work around clarifying numeric types (I'm not sure if that has landed) but it's certainly very close at this point.
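Numeric types matter for ports because JavaScript numbers are IEEE 754 doubles: `JSON.stringify(1.0)` produces `1`, so a reader in a language with separate integer and float types cannot tell which was meant, and integers above 2^53 cannot all be represented exactly in JS. One way around this is to tag each number with an explicit type. A minimal Python sketch; the `datatype` field and the `int`/`float64` tags are hypothetical here, not a statement of Automerge's format.

```python
import math

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def encode_number(value):
    """Wrap a number with an explicit (hypothetical) type tag."""
    if isinstance(value, bool):
        raise TypeError("booleans are not numbers here")
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError("integer out of int64 range")
        return {"datatype": "int", "value": value}
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("NaN/Infinity have no JSON representation")
        return {"datatype": "float64", "value": value}
    raise TypeError(f"unsupported type {type(value).__name__}")


def decode_number(tagged):
    """Restore the intended type even if the writer printed 1.0 as 1."""
    if tagged["datatype"] == "int":
        return int(tagged["value"])
    if tagged["datatype"] == "float64":
        return float(tagged["value"])
    raise ValueError(f"unknown datatype {tagged['datatype']!r}")


if __name__ == "__main__":
    # What a JS writer would emit for the float 1.0: the value prints as 1.
    from_js = {"datatype": "float64", "value": 1}
    print(repr(decode_number(from_js)))
```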

pvh, May 29 '21

After the first release to npm and a bit of polishing, I'd really like to see a spec too, so people can join in and port it to other languages. The ability to write collaborative apps that work not only in a JavaScript environment but also in native iOS/Android apps would make this library one of a kind. There's great potential.

1valdis, Jun 01 '21