Distributed.jl Define and implement a routable, extensible Julia "message"

Starting a discussion to cleanly specify a Julia message. This will help in

swappable transports at a "message" level. For example, 0MQ or MPI. See https://github.com/JuliaParallel/MPI.jl/issues/60
messages that can be routed via intermediate nodes - important when we support other topologies.
implementation of reliable/guaranteed message delivery system
user defined messages - the subsystem should be able to transport and hand-off user defined message for processing by user code.
recovery from errors while deserializing messages. Well defined message boundaries should help.

A Julia message header could include

version
from_pid
to_pid
length in bytes
message_name - symbol used to identify the handler

Jul 07 '15 03:07 amitmurthy

Can you help me understand the motivation here - how does a 'message' differ from just sending a normal Julia type via RemoteRef?

Jul 07 '15 13:07 malmaud

This is more about clean separation of layers at a lower level in the stack. Nothing changes at the user level, though we may choose to expose the message layer to the user too. My visualization of the stack looks somewhat like this:

 ---------------------------------------------------
|  Topology aware smart iterators, workflow DAGs    | - 5
 ---------------------------------------------------
|   Higher level API - @spawn, @parallel, pmap etc  | - 4
 ---------------------------------------------------
| Remote function call API | Data transfer API      | - 3
 ---------------------------------------------------
|                   Message API                     | - 2
----------------------------------------------------
| Byte streams (TCP/IP) |  MPI  |  0MQ  | Others    | - 1
 ---------------------------------------------------

Starting from the top:

5 is what folks in @alanedelman 's research group that I spoken with have stated as the goal at one time or another.

4 is what folks typically use to get parallelism in their code.

In 3, the API we have is one of remote function execution via the remotecall* functions, and data transfer via put!, take, etc., which all internally use remotecall today. I think there is scope to optimize the data transfer APIs to set/get data directly, bypassing the need for closure serialization / function calling as currently done.

Currently 1 & 2 are sort of combined and not user accessible. We should expose an an API along the lines of

send(pid::Int, msg_type::Symbol, msg::Array{UInt8, 1})

to just transport the block of bytes, msg to worker pid where it handled depending on msg_type. Handlers could even be user defined. Exposing something at this level just provides users with greater flexibility if desired - most folks don't need this, but for those who do, it is possible. Also, becomes easier to support multiple transports for the transfer of msg - we don't care how it is transferred to worker pid - could be via MPI or 0MQ or the TCP transport in Base.

Jul 07 '15 15:07 amitmurthy

+:100: – I really like this vision. I'm also not certain that our current RemoteRef model is right for 3.

Jul 11 '15 13:07 StefanKarpinski

This is perfect. I am using this as the Julia parallel roadmap when anyone asks.

Jul 17 '15 03:07 ViralBShah

Not really relevant to Base anymore, since this would be Distributed.jl related, or other parallelism/messaging packages

Feb 10 '24 22:02 vtjnash