Arcon Actor
Discussion thread regarding dynamic actors that can be spawned within a streaming pipeline.
```rust
fn handle_element(
    &mut self,
    element: ArconElement<Self::IN>,
    mut ctx: OperatorContext<Self, impl Backend, impl ComponentDefinition>,
) -> OperatorResult<()> {
    let id = ctx.spawn_actor(...);
    let actor_ref = ctx.get_actor(id);
    Ok(())
}
```
Actual use cases and requirements
Before implementing anything, we should look at use cases and requirements (stateful/stateless, consistency).
Should the Arcon Actor be a Kompact component or some abstraction around it?
This concerns whether the Actor is just a pure Kompact component, or whether we build an abstraction around it such that users provide a get/set for their ArconActor, while the ComponentDefinition and the Message = Ask<..., ...> logic is implemented through a macro, etc.
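For concreteness, here is a hypothetical sketch of what such an abstraction might look like (all names are invented for illustration; the real macro would generate the ComponentDefinition and the Ask wiring around this trait):

```rust
// Hypothetical user-facing trait: the user only writes state + a handler,
// and a macro would wrap it into a Kompact component with
// Message = Ask<Request, Response>.
trait ArconActor {
    type Request;
    type Response;
    fn handle(&mut self, req: Self::Request) -> Self::Response;
}

// What a user would write:
struct Counter {
    count: u64,
}

impl ArconActor for Counter {
    type Request = u64;
    type Response = u64;
    fn handle(&mut self, inc: u64) -> u64 {
        self.count += inc;
        self.count
    }
}

fn main() {
    // Without the (hypothetical) macro, we can still exercise the handler directly.
    let mut c = Counter { count: 0 };
    assert_eq!(c.handle(5), 5);
    assert_eq!(c.handle(3), 8);
    println!("counter = {}", c.handle(0));
}
```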
Ok, I haven't been in any of your meetings recently, so I don't know what else you've been looking at, but here are my thoughts on the subject.
I think you'd definitely want some kind of abstraction API. You may consider allowing pure Kompact Actors for "power users" or the like, but by themselves they expose way too much access to your system to be useful for building simple processing.
Off the top of my head, here's stuff I would think should be exposed and what should be provided instead:
Exposed
- Messaging, but only to legal Arcon paths. I don't think exposing arbitrary network messaging is necessary or a good idea in general.
- A state API for both ephemeral and persisted state. Arcon actors should support consistent state snapshots in some way, but they might gain performance by also keeping some ephemeral state around that can easily be recomputed from a snapshot if necessary. Stateless actors on the other hand are basically trivial. There is no reason to even allow definitions of them. You can just use a normal stream processing element with a mailbox and access to an arbitrary set of actor references.
- Blocking/Non-blocking futures. These really ease writing some types of asynchronous code and would add a lot to the Arcon Actor API imo. Especially since I expect these actors to be heavily ask-based, since otherwise people would have likely just written a custom streaming processor.
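As a hypothetical illustration of the persisted/ephemeral split in the state point above (all names invented): the persisted part is what a consistent snapshot must capture, while the ephemeral part is a derived cache that recovery simply recomputes.

```rust
struct Aggregator {
    persisted: Vec<u64>, // must be captured by consistent snapshots
    cached_sum: u64,     // ephemeral: derived, cheap to recompute after recovery
}

impl Aggregator {
    fn new() -> Self {
        Aggregator { persisted: Vec::new(), cached_sum: 0 }
    }

    fn record(&mut self, n: u64) {
        self.persisted.push(n);
        self.cached_sum += n;
    }

    // Called after restoring `persisted` from a snapshot.
    fn rebuild_cache(&mut self) {
        self.cached_sum = self.persisted.iter().sum();
    }
}

fn main() {
    let mut agg = Aggregator::new();
    agg.record(3);
    agg.record(4);
    // Simulate recovery: the ephemeral cache is lost, then rebuilt
    // from the persisted state alone.
    agg.cached_sum = 0;
    agg.rebuild_cache();
    assert_eq!(agg.cached_sum, 7);
    println!("rebuilt sum = {}", agg.cached_sum);
}
```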
Provided
- Backpressure (maybe something in the style of Akka Streams?)
- Serialisation. While eager serialisation and lazy deserialisation are important in Kompact for performance, I think at the Arcon level it would do well to be abstracted away. Everything in Arcon uses the same serialisation mechanisms, so we can easily generate the necessary code given the set of messages expected by the Actor.
Overall, I'm thinking that if you end up with something powerful enough to do what Ray does (e.g., reinforcement learning, parameter servers, ring AllReduce), you've probably found a pretty good API that'll suffice for 90% of all use cases without having to drop down into Kompact.
Seif brought it up. On the language level we would like to hide the message passing and have concurrent/remote objects. Operators can create these objects and invoke methods on them that return futures. The main reason was to support short-running tasks, but also to allow some kind of custom sources. I will try to sketch an example.
Sure, calling ask with a message as argument on an opaque reference and getting a future back is pretty much equivalent to a concurrent/remote object invocation.
I still think message-passing > RPC as a style but at this level of abstraction the two are pretty darn similar...
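A minimal sketch of that equivalence (all names invented): once the transport is hidden, "invoking a method on a remote object" and "sending an enum message and matching on it" are the same thing.

```rust
enum Request {
    Get,
    Add(i64),
}

struct RemoteObject {
    value: i64,
}

impl RemoteObject {
    // Message-passing style: one receive function dispatching on the message...
    fn receive(&mut self, req: Request) -> i64 {
        match req {
            Request::Get => self.value,
            Request::Add(x) => {
                self.value += x;
                self.value
            }
        }
    }

    // ...is equivalent to RPC style: one method per message variant.
    fn add(&mut self, x: i64) -> i64 {
        self.receive(Request::Add(x))
    }
}

fn main() {
    let mut obj = RemoteObject { value: 0 };
    assert_eq!(obj.receive(Request::Add(5)), 5); // message-passing call
    assert_eq!(obj.add(2), 7);                   // "RPC" call, same effect
    assert_eq!(obj.receive(Request::Get), 7);
    println!("value = {}", obj.receive(Request::Get));
}
```

In a real system, `ask` would wrap `Request` in an envelope with a reply channel and hand back a future for the `i64`; the dispatch logic stays exactly the same.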
What do you think about exposing or providing lazy and eager futures?
What's your definition of lazy and eager futures?
Eager: start every computation as soon as all its dependencies are available. Lazy: start a computation only once its result is required.
Those?
Yes, true. Does a future get executed even if you never end up awaiting it? I think I need to read through the Kompact tutorial to see how it works 😄
Ah, well...it's a bit so and so.
So Rust futures themselves are lazy, i.e. nothing happens unless you drive them to completion somehow. But Kompact's ask does eagerly send the request before even creating the future, so that part is eager. This may seem like a weird mix when put this way, but it kinda makes sense if you think about it from a latency point of view. You wanna get started on that latency-sensitive operation as soon as possible. There really is nothing to gain from not doing it immediately. We aren't going to optimise anything when holding on to the (hypothetical) "task-graph" any longer. And the ask itself is generally side-effecting, so even if we don't await the future ever, we probably still want the side-effect to happen.
Which, I suppose, gets us closer to the answer to your question: Imo, lazy futures make sense when they allow you to somehow improve the execution based on what happens in between creating them and requiring the result. Optimising a query, dropping not-needed results, etc etc. If that is something you can do with the API you are offering futures with, they are sensible as far as I'm concerned. If you do not benefit in any way (or simply not sufficiently to make it worthwhile) from delaying the start of the execution, then you should start it eagerly as soon as possible to maximise your potential for parallelism and minimise your latency.
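A quick plain-Rust illustration of the point that Rust futures themselves are lazy: creating the future runs nothing, and the body only executes on the first poll. The hand-rolled no-op waker and single poll below just stand in for a real executor; this is a sketch, not how you'd drive futures in practice.

```rust
use std::cell::Cell;
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing; sufficient for polling a future that never parks.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Returns (did the body run before polling?, did it run after polling?)
fn demo() -> (bool, bool) {
    let ran = Cell::new(false);
    let fut = async {
        ran.set(true);
    };
    let before = ran.get(); // still false: creating the future ran nothing

    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    // No awaits inside, so the first poll runs the body to completion.
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(())));
    (before, ran.get())
}

fn main() {
    assert_eq!(demo(), (false, true));
    println!("future body ran only when polled");
}
```

Kompact's `ask` sits on top of this: the request message is sent eagerly before the future exists, but the future that delivers the reply is still an ordinary lazy Rust future.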
I think a good first move might be to try and implement a rough version of one of the use-case algorithms. How difficult do you think it would be for us to implement Allreduce in Kompact?
> I think a good first move might be to try and implement a rough version of one of the use-case algorithms.
That is always a good approach in my experience when you are unsure of what you really need.
> How difficult do you think it would be for us to implement Allreduce in Kompact?
I mean, I've never implemented it, so I only have a high-level understanding of the algorithm, but I think it should be pretty straightforward. You can use named actor references to deal with the issue of cyclic dependency bootstrapping. The only other challenge I can imagine right now is backpressure, which I recommend you just ignore for now; if it becomes an issue in the end, we can have a chat and decide how we want to do that.
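To make the message pattern concrete, here is a rough single-threaded simulation of ring AllReduce in plain Rust (no Kompact, no networking; names invented): a reduce-scatter phase where each node forwards and accumulates one chunk per step, then an allgather phase that circulates the finished chunks. The actor version would turn each entry of `sends` into a message to the ring successor.

```rust
// `data[i]` is node i's vector, one scalar per chunk, with n nodes and n chunks.
fn ring_allreduce(mut data: Vec<Vec<i64>>) -> Vec<Vec<i64>> {
    let n = data.len();
    assert!(n > 1 && data.iter().all(|v| v.len() == n));

    // Reduce-scatter: after n-1 steps, node i owns the full sum of chunk (i+1) % n.
    for s in 0..n - 1 {
        // Snapshot all sends first to simulate the steps happening in lockstep.
        let sends: Vec<(usize, usize, i64)> = (0..n)
            .map(|i| {
                let c = (i + n - s) % n;     // chunk node i forwards this step
                ((i + 1) % n, c, data[i][c]) // to its ring successor
            })
            .collect();
        for (dst, c, v) in sends {
            data[dst][c] += v; // receiver accumulates
        }
    }

    // Allgather: circulate the completed chunks so every node ends with all sums.
    for s in 0..n - 1 {
        let sends: Vec<(usize, usize, i64)> = (0..n)
            .map(|i| {
                let c = (i + 1 + n - s) % n; // the chunk node i has fully reduced
                ((i + 1) % n, c, data[i][c])
            })
            .collect();
        for (dst, c, v) in sends {
            data[dst][c] = v; // receiver overwrites with the finished sum
        }
    }
    data
}

fn main() {
    let input = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];
    let out = ring_allreduce(input);
    // Every node ends up with the element-wise sum [12, 15, 18].
    for node in &out {
        assert_eq!(node, &vec![12, 15, 18]);
    }
    println!("{:?}", out);
}
```

Each node only ever talks to its ring successor, so with named actor references the cyclic topology is the only bootstrap problem, matching the point above.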