
partisan instead of teleport?

Open tsloughter opened this issue 7 years ago • 20 comments

As I was once again looking to revive https://github.com/vagabond/teleport, I realized partisan might actually be meant for this itself. Description of teleport: https://vagabond.github.io/programming/2015/03/31/chasing-distributed-erlang

Basically, is partisan meant to be used for message passing between nodes in place of distributed Erlang? For some reason I had it in my head that it was for fault detection and gossip only, not for my application's processes to be sending direct messages between each other.

tsloughter avatar Apr 12 '17 00:04 tsloughter

Lasp uses Partisan for all internode communication. This is the direction we are continuing to both research and develop, so yes, I consider Partisan an alternative to teleport.

While I consider teleport an interesting engineering solution to the problem of getting around distributed Erlang (I had my own version of it, called ringleader, and both it and Andrew's teleport were motivated by problems discovered internally when building Riak MDC replication), I do not believe teleport alone is sufficient for building large-scale Erlang applications with various topologies: I think the solution teleport provides is only ideal for adapting existing Erlang applications to avoid head-of-line blocking and flow control issues with distributed Erlang.

That said, Partisan is aimed at becoming a middleware solution for supporting large-scale Erlang applications: fault detection, reliable membership, and reliable broadcast and unicast.

cmeiklejohn avatar Apr 13 '17 12:04 cmeiklejohn

For what it's worth, the Lasp fork of Plumtree uses partisan for membership and all of the underlying transport communication between members of the broadcast tree: this is the primary difference from the Basho / Helium maintained fork.

cmeiklejohn avatar Apr 13 '17 12:04 cmeiklejohn

Further comments: (tagging @Vagabond)

  • partisan serializes messages using term_to_binary and then passes them to gen_tcp:send (rough sketch of that path after this list). It wasn't completely clear from Andrew's article whether this is a bottleneck or not. If it is, should we adapt partisan to use the alternative he had?

  • What is the best encoding format to use between nodes? (I assume something where there's a BIF available that performs better or worse than term_to_binary / binary_to_term.)

  • Lazy connection management doesn't work for us: it assumes you know membership ahead of time and you connect when you want to send messages. Partisan's failure detection is performed via connection maintenance.

  • Partisan can trivially be extended to multiple connections per host, but we're unclear on what this means for failure detection: this is why we haven't done it yet.

  • The parse transform is neat, and we'd love to take that.
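
For reference, here is roughly the send path I mean in the first bullet (a minimal sketch only; the module name and framing are made up, and partisan's real code does more than this):

```erlang
%% Minimal sketch of the send path described above: serialize with
%% term_to_binary/1 and hand the result to gen_tcp:send/2 on an already
%% established socket. Module name and framing are made up for
%% illustration; partisan's actual code does more than this.
-module(sketch_send).
-export([send/2]).

send(Socket, Message) ->
    Bin = erlang:term_to_binary(Message),
    gen_tcp:send(Socket, Bin).
```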

I'm open to hearing more feedback.

cmeiklejohn avatar Apr 13 '17 16:04 cmeiklejohn

Generally speaking, I think it would be valuable to identify the useful contributions in teleport and migrate them over, given that we've done significant stress testing of this library and that it provides higher-level services than just connection maintenance.

cmeiklejohn avatar Apr 13 '17 16:04 cmeiklejohn

Regarding term_to_binary, I think he ended up finding that the term_to_iolist he wrote works better than anything else he tried.
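
For context (teleport's term_to_iolist isn't reproduced here), the appeal as I understand it is that gen_tcp:send/2 accepts iodata, so an encoder that returns an iolist of fragments avoids building one large contiguous binary per message. A toy comparison of the two call shapes:

```erlang
%% Toy comparison only; teleport's term_to_iolist/1 is not reproduced
%% here. gen_tcp:send/2 accepts iodata(), so an encoder returning an
%% iolist of fragments can skip allocating one big contiguous binary.
send_as_binary(Socket, Term) ->
    gen_tcp:send(Socket, erlang:term_to_binary(Term)).

send_as_iodata(Socket, Term, Encode) ->
    %% Encode is a stand-in fun for something like term_to_iolist/1;
    %% anything returning iodata() works here.
    gen_tcp:send(Socket, Encode(Term)).
```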

tsloughter avatar Apr 13 '17 17:04 tsloughter

And yeah, multiple connections between hosts are a must. Not sure if it could be done separately from the failure-detection connection easily? Then those could also be lazy if the user wanted.

tsloughter avatar Apr 13 '17 17:04 tsloughter

If multiple connections are a must, then are we going to try to preserve FIFO ordering between messages or not?

cmeiklejohn avatar Apr 13 '17 17:04 cmeiklejohn

I would assume not, unless there was a low-overhead way to do it only for messages to the same process. Maybe even just mapping the {To, From} tuple to the same connection from the pool; shrug, that is likely to cause problems.
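
Something along these lines is what I have in mind (a sketch only; the tuple-based pool representation is made up):

```erlang
%% Sketch: pick a connection from a fixed-size pool by hashing the
%% {To, From} pair, so messages between the same two processes always
%% take the same socket and keep their relative order. The tuple-based
%% pool is made up for illustration.
pick_connection(To, From, Pool) when is_tuple(Pool), tuple_size(Pool) > 0 ->
    Index = erlang:phash2({To, From}, tuple_size(Pool)) + 1,
    element(Index, Pool).
```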

tsloughter avatar Apr 13 '17 17:04 tsloughter

I think I sharded packets across the connection pool from node to node using the source-destination tuple as the shard key, so process-to-process messaging order was preserved. Total messaging order is obviously an entirely different kettle of fish. I also punted on reply delivery.

On some of the other points: distributed Erlang cheats a lot to be fast. Ideally you'd have something like partisan for 'regular' disterl, and maybe track communication networks between processes; in the case of a particularly hot path between two processes, spawn up a dedicated connection between them instead of using the global disterl port. You could use process_flag or something like that to indicate whether a process was allowed to opt in to such a network optimization.
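
Purely hypothetical sketch of the tracking half of that idea (the table name, threshold, and promotion hook are all invented):

```erlang
%% Hypothetical sketch of tracking hot process pairs; the table name,
%% threshold, and promotion hook are invented for illustration.
-define(HOT_THRESHOLD, 10000).

init_tracking() ->
    ets:new(pair_counters, [named_table, public, set]).

note_message(From, To) ->
    case ets:update_counter(pair_counters, {From, To}, 1, {{From, To}, 0}) of
        ?HOT_THRESHOLD ->
            %% Here you'd spawn a dedicated connection for this pair
            %% instead of keeping it on the shared disterl port.
            promote_to_dedicated_connection(From, To);
        _ ->
            ok
    end.

promote_to_dedicated_connection(_From, _To) ->
    ok. %% placeholder
```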

I would argue that message ordering beyond the ordering between two processes is not strongly relied upon in most Erlang code and can be sacrificed to gain network performance in many cases. IIRC the message ordering in distributed Erlang isn't that strongly guaranteed anyway.

Vagabond avatar Apr 13 '17 18:04 Vagabond

My read, taking Christopher's points in order:

  • it sounds like it was a bottleneck, hence the teleport:term_to_iolist code to lessen GC pressure.
  • IMO binary_to_term is well tested, fast, and doesn't collapse the scheduler much anymore.
  • I don't think that this is critical in any way to the functioning of the library, we should be able to port stuff over without much of an issue.
  • I imagine that we'd need it for our use case; I can think of several things that it'd make simpler. I think your options there boil down to the equivalent of one_for_one (i.e. if any connection is still alive, the remote node is still connected) or one_for_all (i.e. if any connection dies, all are terminated and the node is considered disconnected); see the sketch after this list.
  • Andrew says that while he mentioned one in the blog post, the transform never got written, but that it'd be easy enough to do.
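
Sketched out, the two policies in the multi-connection bullet could look like this (alive/1 is a stand-in for whatever per-connection health check partisan would actually use):

```erlang
%% Sketch of the two liveness policies above; alive/1 stands in for
%% whatever per-connection health check partisan would actually use.
node_connected(Connections, one_for_one) ->
    %% The node counts as connected while any connection is alive.
    lists:any(fun alive/1, Connections);
node_connected(Connections, one_for_all) ->
    %% One dead connection means tear them all down and treat the node
    %% as disconnected.
    lists:all(fun alive/1, Connections).

alive(ConnPid) when is_pid(ConnPid) ->
    erlang:is_process_alive(ConnPid). %% placeholder check
```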

I think that pulling stuff over into partisan (or adding teleport as a dep and just using the things that we need) is reasonable; it's just a matter of what order to do things in. The first thing I'd do is work on a design for multi-connection (since that's the biggest change semantically, I think), then work on a parse transform, then pull over the performance stuff as an optimization if needed. Does this seem reasonable?

Misc:

  • Benoit has a branch here with some improvements and recent use: https://gitlab.com/benoitc/teleport/

evanmcc avatar Apr 13 '17 18:04 evanmcc

Correction: it looks like Benoit's code is more 'inspired by' than 'based on'.

evanmcc avatar Apr 13 '17 18:04 evanmcc

I think I sharded packets across the connection pool from node to node using the source-destination tuple as the shard key, so process-to-process messaging order was preserved. Total messaging order is obviously an entirely different kettle of fish. I also punted on reply delivery.

Is this guaranteed across disconnects/reconnects, though? A simple hashing function (even chash in some cases given particular join behavior) doesn't appear to be enough to capture this concern.

cmeiklejohn avatar Apr 13 '17 18:04 cmeiklejohn

On some of the other points: distributed Erlang cheats a lot to be fast. Ideally you'd have something like partisan for 'regular' disterl, and maybe track communication networks between processes; in the case of a particularly hot path between two processes, spawn up a dedicated connection between them instead of using the global disterl port. You could use process_flag or something like that to indicate whether a process was allowed to opt in to such a network optimization.

We originally had the default connection manager do full mesh with disterl, but then I think it was eventually replaced with TCP. We could bring that back, though.

cmeiklejohn avatar Apr 13 '17 18:04 cmeiklejohn

I would argue that message ordering beyond the ordering between two processes is not strongly relied upon in most Erlang code and can be sacrificed to gain network performance in many cases. IIRC the message ordering in distributed Erlang isn't that strongly guaranteed anyway.

Agreed; we want to have and support applications where FIFO is not required. However, this highlights a particularly interesting point: existing applications that rely on this behavior (such as many internal Ericsson modules) would not be able to take advantage of our work.

cmeiklejohn avatar Apr 13 '17 18:04 cmeiklejohn

I think that pulling stuff over into partisan (or adding teleport as a dep and just using the things that we need) is reasonable; it's just a matter of what order to do things in. The first thing I'd do is work on a design for multi-connection (since that's the biggest change semantically, I think), then work on a parse transform, then pull over the performance stuff as an optimization if needed. Does this seem reasonable?

Sounds great to me.

I'd say that first we want to focus on getting SSL/TLS working, and then go multi-connection, if that makes sense with everyone else.

cmeiklejohn avatar Apr 13 '17 18:04 cmeiklejohn

Starting to actually work on this for vonnegut, I noticed that there are a few things that we need that partisan doesn't currently provide:

  1. remote calls and casts (and various other gen_* messages)
  2. remote monitoring for the above
  3. via support for the above.

Basic versions of 1 and 3 are probably pretty easy to add, but 2 is a fair bit of added complexity. Is this something that you're interested in, or had you only envisioned partisan working with message-oriented protocols?
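
Roughly what I mean for 1, layered on a plain unidirectional send (Send here is a stand-in fun, not partisan's actual API):

```erlang
%% Sketch of call/cast on top of a plain unidirectional send. Send is a
%% stand-in fun for whatever partisan exposes for forwarding a message
%% to a named process on another node; it is not partisan's actual API.
remote_cast(Send, Node, ServerRef, Request) ->
    Send(Node, ServerRef, {'$remote_cast', Request}),
    ok.

remote_call(Send, Node, ServerRef, Request, Timeout) ->
    Ref = erlang:make_ref(),
    Send(Node, ServerRef, {'$remote_call', {self(), Ref}, Request}),
    %% The receiving side has to route {Ref, Reply} back to the caller,
    %% so replies need the same transport in the other direction.
    receive
        {Ref, Reply} -> {ok, Reply}
    after Timeout ->
        {error, timeout}
    end.
```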

evanmcc avatar Apr 19 '17 23:04 evanmcc

We're interested in all three of the proposed things and are willing to provide support where possible. Maybe we can open an issue for each of these items and put together a plan for integrating them. Would that be an acceptable approach?

cmeiklejohn avatar Apr 21 '17 23:04 cmeiklejohn

cool.

since the existing code uses the ServerSpec unchanged, I am not sure that 3 is actually needed.

I have unidirectional versions of send, cast, and call working (up from cast). I'll open issues for the other two, and try to see whether 3 is actually needed before opening one there.

evanmcc avatar Apr 21 '17 23:04 evanmcc

since the existing code uses the ServerSpec unchanged, I am not sure that 3 is actually needed.

What do you mean?

cmeiklejohn avatar Apr 22 '17 00:04 cmeiklejohn

Via support is just a special tuple: {via, MyReg, [some, args]}, so as long as that is faithfully serialized and deserialized, it should just work.

see the definition of ServerRef here: http://erlang.org/doc/man/gen_server.html#call-3
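
A quick example of what "just works" would mean (my_registry and my_server are placeholder module names; my_registry stands in for any module exporting the via callbacks register_name/2, unregister_name/1, whereis_name/1 and send/2):

```erlang
%% Placeholder example: my_registry stands in for any module exporting
%% register_name/2, unregister_name/1, whereis_name/1 and send/2, and
%% my_server for any gen_server callback module. If the via tuple
%% round-trips through serialization intact, these calls work as usual.
start_named(Name) ->
    gen_server:start_link({via, my_registry, Name}, my_server, [], []).

call_named(Name, Request) ->
    gen_server:call({via, my_registry, Name}, Request).
```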

evanmcc avatar Apr 22 '17 15:04 evanmcc