
POC to evaluate the proposals for a new core design around plugins and actors

Open didier-wenzek opened this issue 3 years ago • 9 comments

There are several work-in-progress proposals to rebuild thin-edge on new foundations using plugins:

The key criterion for committing to one of these proposals, or to a combination of them, is whether one can smoothly rebuild the current features of thin-edge, progressively moving parts into the new design.

To have concrete criteria to compare the proposals, we will build a POC for each. The point is to demo:

  • How the current thin-edge can be reshaped internally using rust components.
  • How a thin-edge executable can be built as an assemblage of components that have been implemented independently.
  • How the dependencies between the plugins, their instantiation, configuration and connections are addressed.
  • How the main internal communication patterns are addressed:
    • pub/sub for telemetry data,
    • request/response for operations,
    • message gathering from various sources,
    • request dispatch to appropriate actors.
  • How external communications are addressed, notably over MQTT.

The focus is on the internal plugin API:

  • The business logic of the components involved by the POCs is out of scope.
  • The actual actions can be mocked up with print statements.
  • The external component API over MQTT.

Here are the proposed components for the POC (retained because representative of what we have today).

 ┌─────────────┐               ┌──────────────────────────────────────────────┐                 ┌──────────────────────────────────────────────┐
 │             │ MqttMessage   │                                              │    SMRequest    │                                              │
 │ MQTT        ├───────────────► C8Y                                          ├─────────────────►  SM                                          │
 │             │               │                                              │                 │                                              │
 │             ◄───────────────┤                                              ◄─────────────────┤                                              │
 │             │  MqttMessage  │                                              │    SMResponse   │                                              │
 │             │               │                                              │                 │                                              │
 └─────────────┘               └───────▲───────────────────────────────▲──────┘                 └────┬───▲────────────────────────────┬───▲────┘
                                       │                               │                             │   │                            │   │
                                       │                               │                             │   │                            │   │
                           Measurement │                   Measurement │                   SMRequest │   │SMResponse         SMRequest│   │ SMResponse
                                       │                               │                             │   │                            │   │
                               ┌───────┴─────┐                 ┌───────┴─────┐                  ┌────▼───┴────┐                  ┌────▼───┴────┐
                               │             │                 │             │                  │             │                  │             │
                               │ Collectd    │                 │ ThinEdgeJSON│                  │  Apt        │                  │ Apama       │
                               │             │                 │             │                  │             │                  │             │
                               │             │                 │             │                  │             │                  │             │
                               │             │                 │             │                  │             │                  │             │
                               │             │                 │             │                  │             │                  │             │       
                               └───────▲─────┘                 └───────▲─────┘                  └─────────────┘                  └─────────────┘       
                                       │                               │
                                       │ MqttMessage                   │ MqttMessage                              
                                       │                               │
                               ┌───────┴─────┐                 ┌───────┴─────┐
                               │             │                 │             │
                               │ MQTT        │                 │ MQTT        │
                               │             │                 │             │
                               │             │                 │             │
                               │             │                 │             │     
                               │             │                 │             │
                               └─────────────┘                 └─────────────┘

  • The C8Y plugin implements all the Cumulocity specific features.
    • It connects to Cumulocity using an MQTT plugin instance.
    • It gathers measurements from various sources - here a collectd plugin and a ThinEdgeJson plugin.
    • It translates measurements into the Cumulocity-specific format and sends these messages via MQTT.
    • It receives operation requests from Cumulocity and translates these into SMRequest messages forwarded to the SM plugin.
    • It expects operation responses from the SM plugin and forwards the translated responses to Cumulocity.
  • The Collectd plugin ingests data produced by collectd.
    • It consumes MQTT messages on the collect topic.
    • It produces measurement messages that can be consumed by any telemetry aware component - here only C8Y.
  • The ThinEdgeJson plugin ingests telemetry data published by external thin-edge MQTT components.
    • It consumes MQTT messages on the tedge topic.
    • It produces measurement messages - consumed here by the C8Y plugin.
  • The SM plugin manages software management requests.
    • It consumes SMRequest messages - independently of their origin.
    • It produces SMResponse messages, taking care to send them to the former requester.
    • It delegates the requests to a set of SM plugins - here the Apt and the Apama plugin.
    • It expects SMResponse messages from the plugins actually processing the requests.
    • The flow of a request depends on the kind of request.
      • An install or remove request is sent directly to the specific sm plugin named in the request.
      • A request to list the installed packages is sent to all the sm plugins and their responses are combined to form the global response.
  • The Apt and Apama plugins are two components with the same interface to "interact with a package manager" but with different targets.
    • They are free on how they interact with the actual package manager.
    • What matters is that they accept SMRequest returning SMResponse.
  • An instance of the MQTT plugin manages an MQTT connection.
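The request flow described for the SM plugin can be sketched as follows. This is only an illustration under assumptions: the variants of SMRequest and SMResponse, the PackageManager trait, the Mock plugin and the mock_manager helper are hypothetical names, not the types of any of the proposals.

```rust
use std::collections::BTreeMap;

// Hypothetical message types, named after the SMRequest / SMResponse of the text.
#[derive(Debug, Clone, PartialEq)]
enum SMRequest {
    List,
    Install { manager: String, package: String },
    Remove { manager: String, package: String },
}

#[derive(Debug, Clone, PartialEq)]
enum SMResponse {
    Packages(Vec<String>),
    Done,
    Error(String),
}

// Assumed common interface of all package-manager plugins (Apt, Apama, ...).
trait PackageManager {
    fn handle(&mut self, request: &SMRequest) -> SMResponse;
}

// Mock plugin: it only records package names in memory.
struct Mock {
    installed: Vec<String>,
}

impl PackageManager for Mock {
    fn handle(&mut self, request: &SMRequest) -> SMResponse {
        match request {
            SMRequest::List => SMResponse::Packages(self.installed.clone()),
            SMRequest::Install { package, .. } => {
                self.installed.push(package.clone());
                SMResponse::Done
            }
            SMRequest::Remove { package, .. } => {
                self.installed.retain(|p| p != package);
                SMResponse::Done
            }
        }
    }
}

// The SM component owns the package-manager plugins, keyed by name.
struct SoftwareManager {
    plugins: BTreeMap<String, Box<dyn PackageManager>>,
}

impl SoftwareManager {
    fn dispatch(&mut self, request: SMRequest) -> SMResponse {
        match &request {
            // A list request is sent to every plugin and the answers are merged.
            SMRequest::List => {
                let mut all = Vec::new();
                for plugin in self.plugins.values_mut() {
                    if let SMResponse::Packages(mut packages) = plugin.handle(&request) {
                        all.append(&mut packages);
                    }
                }
                SMResponse::Packages(all)
            }
            // Install and remove go only to the plugin named in the request.
            SMRequest::Install { manager, .. } | SMRequest::Remove { manager, .. } => {
                match self.plugins.get_mut(manager) {
                    Some(plugin) => plugin.handle(&request),
                    None => SMResponse::Error(format!("unknown package manager: {manager}")),
                }
            }
        }
    }
}

// Builds an SM component with two mocked package managers.
fn mock_manager() -> SoftwareManager {
    let mut plugins: BTreeMap<String, Box<dyn PackageManager>> = BTreeMap::new();
    plugins.insert("apt".to_string(), Box::new(Mock { installed: vec![] }));
    plugins.insert("apama".to_string(), Box::new(Mock { installed: vec![] }));
    SoftwareManager { plugins }
}

fn main() {
    let mut sm = mock_manager();
    let install = SMRequest::Install { manager: "apt".into(), package: "vim".into() };
    println!("{:?}", sm.dispatch(install));
    println!("{:?}", sm.dispatch(SMRequest::List));
}
```

The requester only sees the SoftwareManager; which plugin served the request stays an internal detail.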

A key goal of the new design is to be able to connect components that have been implemented independently while using statically typed messages. This can be achieved using light dependencies around message type definitions, with a crate per plugin.

  • The low-level MQTT plugin depends on no other plugin and defines the MqttMessage type.
  • The SM plugin depends on no other plugin and defines the SMRequest and SMResponse types.
  • The Apt and Apama plugins depend on the SM plugin using the SMRequest and SMResponse types.
  • The Measurement type must be defined somewhere. I see that as an open question. I propose here to have a plugin_telemetry crate just to define this type.
  • The Collectd and ThinEdgeJson crates both depend on the mqtt and telemetry crates, but nothing else.
  • The C8Y plugin depends only on the mqtt, telemetry and sm plugins. It is unaware of apt and apama, or of any other software manager that can be implemented independently. It is also unaware of any measurement sources.
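A minimal sketch of these light dependencies, with one Rust module standing in for each crate. The field names, the topic names and the payload formats are assumptions made up for the illustration; they are not the real collectd or Cumulocity formats.

```rust
// Each module stands in for one crate of the dependency graph.
mod plugin_mqtt {
    // Depends on no other plugin.
    #[derive(Debug, Clone, PartialEq)]
    pub struct MqttMessage {
        pub topic: String,
        pub payload: String,
    }
}

mod plugin_telemetry {
    // Only defines the shared measurement type (the fields are assumptions).
    #[derive(Debug, Clone, PartialEq)]
    pub struct Measurement {
        pub name: String,
        pub value: f64,
    }
}

mod plugin_collectd {
    // Depends on the mqtt and telemetry "crates", and nothing else.
    use super::plugin_mqtt::MqttMessage;
    use super::plugin_telemetry::Measurement;

    // Made-up input format: topic "collectd/<name>", payload "<value>".
    pub fn translate(message: &MqttMessage) -> Option<Measurement> {
        let name = message.topic.strip_prefix("collectd/")?;
        let value = message.payload.trim().parse::<f64>().ok()?;
        Some(Measurement { name: name.to_string(), value })
    }
}

mod plugin_c8y {
    // Knows the Measurement type, but not which plugin produced it.
    use super::plugin_mqtt::MqttMessage;
    use super::plugin_telemetry::Measurement;

    pub fn to_cloud(measurement: &Measurement) -> MqttMessage {
        MqttMessage {
            topic: "c8y/measurements".to_string(), // placeholder topic
            payload: format!("{}={}", measurement.name, measurement.value),
        }
    }
}

fn main() {
    let input = plugin_mqtt::MqttMessage {
        topic: "collectd/cpu".into(),
        payload: "12.5".into(),
    };
    if let Some(measurement) = plugin_collectd::translate(&input) {
        println!("{:?}", plugin_c8y::to_cloud(&measurement));
    }
}
```

Neither plugin_collectd nor plugin_c8y names the other: they only meet through the shared Measurement and MqttMessage types.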

One of the main benefits of this proposal to move toward plugins is to clarify the dependencies. Here is the nice expected result from this POC.

                 ┌───────────────────────┐     ┌───────────────────────┐    ┌───────────────────────┐
                 │"plugin_c8y"           │     │"plugin_sm"            │    │"plugin_apt"           │
                 │                       ├─────►                       ◄────┤                       │
                 │                       │     │ SMRequest             │    │                       │
                 │                       │     │                       │    │                       │
                 │                       │     │ SMResponse            │    │                       │
                 └───┬───────────────┬───┘     └────────────────▲──────┘    └───────────────────────┘
                     │               │                          │
                     │               │                          │           ┌───────────────────────┐
                     │               │                          │           │"plugin_apama"         │
                     │               │                          └───────────┤                       │
                     │               │                                      │                       │
                     │               │                                      │                       │
                     │               │                                      │                       │
  ┌──────────────────▼────┐     ┌────▼──────────────────┐                   └───────────────────────┘
  │"plugin_mqtt"          │     │"plugin_telemetry"     │
  │                       │     │                       │
  │ MqttMessage           │     │ Measurement           │
  │                       │     │                       │
  │                       │     │                       │
  └───▲────────▲──────────┘     └────────────▲───▲──────┘
      │        │                             │   │
      │   ┌────┼─────────────────────────────┘   │
      │   │    └─────────────────────┐           │
      │   │                          │           │
  ┌───┴───┴───────────────┐     ┌────┴───────────┴──────┐
  │"plugin_thinedge_json" │     │"plugin_collectd"      │
  │                       │     │                       │
  │                       │     │                       │
  │                       │     │                       │
  │                       │     │                       │ 
  └───────────────────────┘     └───────────────────────┘

didier-wenzek, May 13 '22 17:05

Hi!

Our POC implementation is ready.

This is the POC based on the interfaces introduced via #979 plus the "core" implementation and related parts (not yet in a PR).

Contents

Here is the tip of the branch that contains everything.

  • Several Plugins
    • c8y
    • collectd
    • mqtt
    • "SM" via apt
    • "thin_edge_json" Plugins
  • Independent instantiation and lifecycle of all Plugins
  • Generic wiring, but typesafe messaging between the Plugins
  • Configurable wiring via configuration

The above plugins are mockups - except the mqtt one and I believe the thin_edge_json one is already final as well.

Testing

To test the PR:

git fetch https://github.com/matthiasbeyer/thin-edge.io/ feature/add_tedge_api/showcase
git checkout FETCH_HEAD
cargo build -p tedge-cli --features sm,mqtt,c8y,collectd,thin_edge_json
./target/debug/tedge-cli -l info run ./tedge/example-config-showcase.toml

You can change that "info" in the last line to "debug" to see more output.

You can stop the process using Ctrl-C.

Please note that if you don't have an MQTT broker running, the application will not start. The current behaviour is that all plugins need to initialize successfully, which the MQTT plugin will not do without a broker. Pressing Ctrl-C will cancel this process and you will see an error because the MQTT plugin was not able to connect to an MQTT broker.

If you have your MQTT broker running, you can now

  • Connect to it on localhost:1883
  • Send the following JSON payload to it: {"type":"List"}. This triggers an SM request to list the installed software.

(Of course the JSON format here is just a mockup and nothing final)

Walkthrough

Here go some points to describe the fulfilled requirements:

  • How the thin-edge.io executable is built from an assemblage of components that have been implemented independently
    • If you look at the history of the showcase branch, you'll see that most of the plugins here were implemented in parallel (use: git log --graph --oneline 5e358be9aeebd6ffa23dbb2f782049906880a231..FETCH_HEAD to get a nice graph)
    • If you look at the implementation of the main() function, you'll see that all PluginBuilders are registered at the application. There is no connection setup done between them at this point!
    • The configuration file defines
      • Which plugins should be instantiated
      • Which plugin sends messages to which other plugin
      • The configuration of the individual plugins
  • The Plugin lifecycle is as follows:
    • instantiation with user-defined configuration
    • "start"ing: Giving the Plugin the opportunity to do setup steps
    • "handle"ing messages: For each incoming message, the respective handle is called
    • "shutdown" of the plugins: Giving the plugins the opportunity to clean up resources
  • external communication via mqtt
    • Is implemented in the mqtt plugin, which does not do any parsing but simply defines the message types that can be used to interface with the MQTT broker
  • Internal communication patterns
    • Currently implemented is a request/response pattern; a pub/sub model could be implemented in a similar fashion.
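The four lifecycle steps listed above can be pictured with a small synchronous sketch. The traits below are hypothetical stand-ins written for this illustration; the real tedge_api traits are async and their signatures differ.

```rust
// Hypothetical stand-in for a plugin: start, handle messages, shut down.
trait Plugin {
    type Message;
    fn start(&mut self) {}
    fn handle(&mut self, message: Self::Message);
    fn shutdown(&mut self) {}
}

// Hypothetical stand-in for a builder: creates a plugin from its configuration.
trait PluginBuilder {
    type Plugin: Plugin;
    fn kind(&self) -> &'static str;
    fn instantiate(&self, config: &str) -> Self::Plugin;
}

// Mock plugin that records what happens to it.
struct Logger {
    prefix: String,
    log: Vec<String>,
}

struct LoggerBuilder;

impl PluginBuilder for LoggerBuilder {
    type Plugin = Logger;

    fn kind(&self) -> &'static str {
        "logger"
    }

    // Here the "configuration" is just the prefix used for each log line.
    fn instantiate(&self, config: &str) -> Logger {
        Logger { prefix: config.to_string(), log: Vec::new() }
    }
}

impl Plugin for Logger {
    type Message = String;

    fn start(&mut self) {
        self.log.push(format!("{}: started", self.prefix));
    }

    fn handle(&mut self, message: String) {
        self.log.push(format!("{}: {}", self.prefix, message));
    }

    fn shutdown(&mut self) {
        self.log.push(format!("{}: stopped", self.prefix));
    }
}

// Drives one plugin through its whole lifecycle, as a runtime would.
fn run<B: PluginBuilder>(
    builder: &B,
    config: &str,
    messages: Vec<<B::Plugin as Plugin>::Message>,
) -> B::Plugin {
    let mut plugin = builder.instantiate(config); // 1. instantiation
    plugin.start(); // 2. start
    for message in messages {
        plugin.handle(message); // 3. handle each incoming message
    }
    plugin.shutdown(); // 4. shutdown
    plugin
}

fn main() {
    let builder = LoggerBuilder;
    let plugin = run(&builder, "logger", vec!["hello".to_string()]);
    println!("{} -> {:?}", builder.kind(), plugin.log);
}
```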

You are also welcome to have a look at the individual plugin implementations, although they are of course mostly mockups:

Currently not in the showcase

The following is currently not implemented in the showcase, mostly because it is not really interesting for showing the overall scheme:

  • Bridging of different JSON payloads to other plugins, either via thin_edge_json or via other JSON payload formats
  • Any other plugin that was already implemented but is not relevant for this showcase (no "sysinfo", no "filter", no "inotify", etc.)
  • A second "sm" plugin besides "plugin_sm_apt", because it would look exactly the same (as it is all just a mock) except for its name
  • Responses from the software-management plugin ("plugin_sm_apt") upon software management requests

The grand scheme

Of course the diff you're looking at (5e358be9aeebd6ffa23dbb2f782049906880a231..FETCH_HEAD) is only the part implementing the showcase. The whole core implementation is a bit more involved and has now been ongoing for about three months (because #979 is a requirement).

The core implementation is not yet in a PR. This PR will of course feature an in-depth explanation on how things work and how things were implemented once it is opened!

The core implementation PR will then of course not contain stuff from this showcase!

matthiasbeyer, May 18 '22 08:05

Our POC implementation is ready.

Thank you.

Testing

Things work as described.

Walkthrough

How a thin-edge executable can be built as an assemblage of components that have been implemented independently.

  • Definitely a good fit here.
    • The main.rs is a long sequence of plugin registrations, each plugin being provided by external crates.
    • All the plugins are activated or not depending on the runtime config. This opens the path to a batteries-included tedge executable.
    • The main.rs and Cargo.toml can easily be updated to include only a subset of the plugins at compile time. This opens the path to executables optimized for a use case.
  • Questions.
    • Can ^C handling be moved inside the core or in a plugin?
    • For me the target is being able to build an application as an assemblage of plugins without deep Rust expertise. This expertise should be moved into the framework and the plugins.

How the dependencies between the plugins, their instantiation, configuration and connections are addressed.

  • I have mixed feelings about the configuration file.
    • Pros:
      • all the plugins in one file
      • a systematic approach
      • a nice doc subcommand that displays what is expected for a plugin.
    • Cons:
      • Not so easy to make the relationship with the main.rs.
      • Why are some names given as strings, such as “collectd”, and others as TOML identifiers, as in plugins.collector?
      • It’s not obvious to see the graph of plugins.
  • I wonder if it makes sense to define the plugin connections dynamically, as most of them are somehow imposed by the plugin types.
  • Unrelated to the POC.
    • I would prefer to have two configuration files to separate what is related to running context (ip addresses, credentials, …) from what defines the application (plugin graph).

How the main internal communication patterns are addressed.

  • This is the point I have the most concerns with.
  • The picture is really not easy to grasp. What kind of communication can be implemented? How? What are the limitations?
  • Until recently, the message types were bound to a request type. This no longer seems to be the case. Is that right?

How external communications are addressed, notably over MQTT.

  • It's working. However, this also stresses somehow that writing a plugin is not easy.
  • Why paho_mqtt while there is already an mqtt_channel crate? If not usable it would be good to know why.

Currently not in the showcase

I agree that it makes no sense to have full-fledged features in the showcase, except for these points related to plugin inter-communication:

  • It matters to have two sm plugin instances (it can be from the fake plugin). The point is to demonstrate message dispatch.
  • It matters to handle the responses of the software-management plugin, even if these are just ping/pong responses. I'm a bit surprised that you did nothing here because everything seems to be in place with ReplySenderFor.

The grand scheme

The core implementation is not yet in a PR. This PR will of course feature an in-depth explanation on how things work and how things were implemented once it is opened!

We have first to agree on a solution.

didier-wenzek, May 19 '22 17:05

Thank you for the interesting feedback! Could you maybe expand a bit more on these aspects?

However, this also stresses somehow that writing a plugin is not easy.

I wonder if it makes sense to define the plugin connections dynamically, as most of them are somehow imposed by the plugin types.

TheNeikos, May 20 '22 06:05

Could you maybe expand a bit more on these aspects?

However, this also stresses somehow that writing a plugin is not easy.

I agree that being easy is subjective. What I'm missing is a mental model, a pattern, a systematic way to understand the design of the plugins. I don't say there is no such pattern, only that I don't see it. Looking at the code of the different plugins, I can roughly understand how each of them works, but I would have a hard time fixing something. Some ramp-up time would help for sure.

I wonder if it makes sense to define the plugin connections dynamically, as most of them are somehow imposed by the plugin types.

For instance, the c8y plugin expects a connection to an mqtt plugin instance to connect to the cloud, and a connection to the software management plugin to process software updates. Without these connections the c8y plugin is useless. It will even be broken if connected to plugins of the wrong type. Some plugins can have less strict connections. For instance, a logger plugin could consume and log all the messages published by others. With such loose constraints, a dynamic connection might make sense. But when there are type and semantic expectations between peers, the wiring can be dynamic only if it is controlled by the program, not the config.
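To make the point about wiring controlled by the program concrete, here is a minimal sketch in which the c8y component cannot be built without its two typed peers. It uses std::sync::mpsc channels as a stand-in for plugin connections; the C8y struct, its methods and the wire function are hypothetical and belong to neither POC.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical message types, reduced to a text payload.
struct MqttMessage(String);
struct SMRequest(String);

// The c8y component requires one connection of each type to exist at all.
struct C8y {
    cloud: Sender<MqttMessage>,
    software: Sender<SMRequest>,
}

impl C8y {
    fn new(cloud: Sender<MqttMessage>, software: Sender<SMRequest>) -> Self {
        C8y { cloud, software }
    }

    // Forwards an operation to the software manager and reports to the cloud.
    fn on_cloud_operation(&self, operation: &str) {
        let _ = self.software.send(SMRequest(operation.to_string()));
        let _ = self.cloud.send(MqttMessage(format!("executing {operation}")));
    }
}

// The wiring is done by the program: the channel types are inferred from
// C8y::new, and swapping the two arguments would be rejected at compile time.
fn wire() -> (C8y, Receiver<MqttMessage>, Receiver<SMRequest>) {
    let (mqtt_tx, mqtt_rx) = channel();
    let (sm_tx, sm_rx) = channel();
    (C8y::new(mqtt_tx, sm_tx), mqtt_rx, sm_rx)
}

fn main() {
    let (c8y, mqtt_rx, sm_rx) = wire();
    c8y.on_cloud_operation("list");
    println!("to sm: {}", sm_rx.recv().unwrap().0);
    println!("to cloud: {}", mqtt_rx.recv().unwrap().0);
}
```

A configuration file could still choose which instances get connected, but the type check happens before the program runs.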

didier-wenzek, May 20 '22 20:05

Here is a proposal using actors but not actix.

Content

This proposal of a tedge_actors crate addresses aspects complementary to those of the tedge_api.

  • There is no attempt to address the configuration. I see no reason preventing tedge_actors from being combined with a config along the lines of the tedge_api crate.
  • The actors and their connections are statically created. Can this be done dynamically as for tedge_api?
  • The runtime is reduced to its bare minimum. Compared to what the tedge_api proposes, many key features are missing, such as stopping the actors or catching panicking actors.

The focus is on the definition of actors.

  • The Actor trait enforces a systematic pattern: a single Input type, a single Output type, a single input queue of messages, a single Recipient for the output.
    • Complex input & output types are implemented using enum types. Macros would help to manage sub message types and handlers.
    • Complex connections between actors are implemented by variants of the Recipient trait. One key point still missing from the work-in-progress POC is examples of such connectors.
  • Two traits, the Reactor and the Producer, are used to define the main activities of an actor:
    • Producing spontaneous messages,
    • Reacting to received messages.
  • The implementation of an Actor is unaware of all the surrounding parts and mechanisms.
    • A mailbox is provided along with the actor state when packed into an ActorInstance.
    • Once created, the ActorInstances can be connected at will using their mailbox addresses.
    • These ActorInstances are inactive, i.e. with no background tasks.
    • Once properly connected, the ActorInstances are started, returning ActiveActor handlers.
    • The ActiveActor handlers give control over the running actors (stop, wait for termination, ...).
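The pattern described above (one input type, one output type, a mailbox, a recipient) can be sketched with plain threads and std::sync::mpsc channels. This is a simplified illustration only: the trait and struct below reuse the names Actor and ActorInstance from the text, but their methods and signatures are assumptions, not the actual tedge_actors API.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

// Simplified actor: a single input type, a single output type.
trait Actor: Send + 'static {
    type Input: Send + 'static;
    type Output: Send + 'static;
    fn react(&mut self, input: Self::Input) -> Option<Self::Output>;
}

// An inactive instance: the actor state plus its mailbox, no background task yet.
struct ActorInstance<A: Actor> {
    actor: A,
    address: Sender<A::Input>,
    mailbox: Receiver<A::Input>,
}

impl<A: Actor> ActorInstance<A> {
    fn new(actor: A) -> Self {
        let (address, mailbox) = channel();
        ActorInstance { actor, address, mailbox }
    }

    // The address other actors use to send messages to this one.
    fn address(&self) -> Sender<A::Input> {
        self.address.clone()
    }

    // Starts the actor; its outputs are sent to the given recipient.
    fn run(self, recipient: Sender<A::Output>) -> JoinHandle<()> {
        let ActorInstance { mut actor, address, mailbox } = self;
        drop(address); // the loop ends once all external addresses are dropped
        thread::spawn(move || {
            for input in mailbox {
                if let Some(output) = actor.react(input) {
                    if recipient.send(output).is_err() {
                        break; // nobody listens anymore
                    }
                }
            }
        })
    }
}

// Example actor: parses text into numbers and ignores invalid input.
struct Parser;

impl Actor for Parser {
    type Input = String;
    type Output = f64;
    fn react(&mut self, input: String) -> Option<f64> {
        input.trim().parse().ok()
    }
}

fn demo() -> Vec<f64> {
    let instance = ActorInstance::new(Parser);
    let to_parser = instance.address();
    let (recipient, outputs) = channel();
    let handle = instance.run(recipient);
    for text in ["12.0", "not a number", "7.5"] {
        to_parser.send(text.to_string()).unwrap();
    }
    drop(to_parser); // closing the last address stops the actor
    handle.join().unwrap();
    outputs.iter().collect()
}

fn main() {
    println!("{:?}", demo());
}
```

The Parser itself contains no channel, thread or shutdown logic; all of that lives in the surrounding instance.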

Testing

git checkout -b didier-wenzek-rfc/tedge_api main
git pull https://github.com/didier-wenzek/thin-edge.io.git rfc/tedge_api
cargo run -p tedge_poc

Measurements can then be sent over MQTT:

tedge mqtt pub 'tedge/measurements' '{"temperature": 12.0 }'

and the results observed over MQTT:

tedge mqtt sub '#'

POC Status

Next Steps

  • Address the key missing point of the tedge_actors POC: request/response as well as 1-n and n-1 connections
  • What are the pros and cons of tedge_api and tedge_actors?
  • How can each proposal be improved with ideas from the other?

didier-wenzek, May 20 '22 21:05

Could you maybe expand a bit more on these aspects?

However, this also stresses somehow that writing a plugin is not easy.

I'm not happy with my first response. Sure, being easy or not is subjective. But I should come up with more concrete feedback.

I see 3 layers in the design of an API for plugin/component/actor.

  1. Assembling components into an executable should be a straightforward task - selecting plugins, creating and connecting static structs/objects/values, possibly with some glue code such as Into or From translators from one type of messages to another.
  2. Implementing a component might be more involved but should ideally focus on the feature logic. One of the goals of the plugin API is to ease the interaction of independent streams of events and requests acting on some state. Hence, interaction concerns (e.g select!) should be addressed by the runtime, not the plugins. Similarly, state ownership and immutability should be addressed by the runtime. I acknowledge that a plugin that has to handle an external event source (say a TCP connection) might have to manage internal mutability and interactions between this external source and the requests received from the API. I tried to address this in the tedge_actors proposal with two different traits for the two major behaviors of an actor (reacting to messages or producing spontaneous messages) - but this proposal is not battle-tested yet.
  3. Complexity needs to be somewhere. It's okay to have the runtime overly complex if this helps to remove complexity from the other layers. The runtime of the tedge_api is by far more complex than the tedge_actors runtime but I don't see that as a major concern. What matters though is that the runtime can be improved without having to rewrite all the plugins. A key test for the tedge_actors API, for instance, will be to add termination control of the plugins without changing the latter.
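As an illustration of the glue code mentioned in the first layer, a From translator plus one generic forwarding function is enough to connect two components with different message types. The types, the topic name and the forward helper below are hypothetical and only serve the example.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical message types of two independent components.
#[derive(Debug, PartialEq)]
enum SMResponse {
    Installed(String),
    Failed(String),
}

#[derive(Debug, PartialEq)]
struct MqttMessage {
    topic: String,
    payload: String,
}

// The only glue code: a translation from one message type to the other.
impl From<SMResponse> for MqttMessage {
    fn from(response: SMResponse) -> Self {
        let payload = match response {
            SMResponse::Installed(package) => format!("installed {package}"),
            SMResponse::Failed(package) => format!("failed {package}"),
        };
        MqttMessage { topic: "sm/responses".to_string(), payload } // placeholder topic
    }
}

// Generic connector: forwards every message of the source, translated with Into.
fn forward<I, O: From<I>>(source: Receiver<I>, target: &Sender<O>) {
    for message in source {
        let _ = target.send(message.into());
    }
}

fn demo() -> Vec<MqttMessage> {
    let (sm_tx, sm_rx) = channel::<SMResponse>();
    let (mqtt_tx, mqtt_rx) = channel::<MqttMessage>();
    sm_tx.send(SMResponse::Installed("foo".into())).unwrap();
    sm_tx.send(SMResponse::Failed("bar".into())).unwrap();
    drop(sm_tx); // no more responses: lets forward() return
    forward(sm_rx, &mqtt_tx);
    drop(mqtt_tx);
    mqtt_rx.iter().collect()
}

fn main() {
    for message in demo() {
        println!("{} <- {}", message.topic, message.payload);
    }
}
```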

I hope this second answer is more helpful.

didier-wenzek, May 21 '22 13:05

After re-reading the messages in this thread, I have to add some more notes on our proposal.

But first, I want to address some of your questions:

Can ^c handling be moved inside the core or in a plugin?

Technically it definitely can. We could even think about a slightly more elaborate API which allows plugins to declare that they want to be notified on Ctrl-C and tell the core themselves how the signal should be handled. Definitely a point to think about, but (IMHO) out of scope for the first step.

For me the target is being able to build an application as an assemblage of plugins without deep Rust expertise. This expertise should be moved into the framework and the plugins.

Yes and no. It depends on what you mean by "deep Rust expertise". I believe that in all approaches we have seen so far, a plugin author needs to be decently proficient in the same things: generics, async Rust, traits. Without a basic understanding of these three concepts, writing a plugin is not feasible in any of the POCs we've seen so far - and I believe there won't be one that takes away these requirements!

I have mixed feelings about the configuration file.

  • Cons:
    • Not so easy to make the relationship with the main.rs.

What do you mean by that?

Why are some names given as strings, such as “collectd”, and others as TOML identifiers, as in plugins.collector?

That's how TOML works. One (in this case "collectd") is a string, the other is a table key (in this case "collector"). The former ("collectd") is the type of the plugin, the latter ("collector") is the name of the instance.

It’s not obvious to see the graph of plugins.

Yes. @TheNeikos and I already talked about that a lot. Unfortunately, that's a limitation of TOML. We might have some ideas here to provide a graphical config editor for our POC that we might implement during a Hackathon mid-June at our company.

How the main internal communication patterns are addressed. The picture is really not easy to grasp. What kind of communication can be implemented? How? What are the limitations?

So right now we have point-to-point communication. That means 1:1, 1:N and N:1, or in short: N:M ;-) ! That's the very baseline and (so far) has been sufficient for everything we've played with. Of course, you might want more patterns, which is an absolutely valid request.

This baseline can be used to implement a more pub/sub style pattern, I like to believe. Request-Response is already included in the baseline via the reply functionality.

Until recently, the message types were bound to a request type. This no longer seems to be the case. Is that right?

I'm not sure what you mean by this.

If you mean the associated type Reply on our Message trait: We were able to lift that requirement after your feedback and move reply functionality out of the Message trait itself, making the request-reply pattern more explicit with ReplySenderFor<_>. We can of course elaborate how that works, if you wish.

Why paho_mqtt while there is already an mqtt_channel crate? If not usable it would be good to know why.

The implementation of the MQTT plugin in our POC is already a few weeks old. IIRC I took the "paho_mqtt" crate because I didn't like the interface of the "rumqttc" crate at all.

For the "mqtt_channel" crate: I had a quick look at it, but found "paho_mqtt" much simpler to use and at the time, I wanted to implement things quickly. :-) Of course we have to decide on one implementation/library, but IMHO this is also out of scope for the POC - or rather just a detail that doesn't matter in the grand scheme of things, if you understand what I mean. Rewriting the MQTT plugin to use rumqttc or mqtt_channel is a matter of one day of effort - nothing to worry about right now, I guess.

It matters to have two sm plugin instances (it can be from the fake plugin). The point is to demonstrate message dispatch.

All I would do for another SM plugin would be cp -r plugins/plugin_sm_apt plugins/plugin_sm_other and rename "Apt" to "Other"! I can do that, of course, but I think it just increases the code size and does not help with "reviewability".

It matters to handle the responses of the software-management plugin, even if these are just ping/pong responses. I'm a bit surprised that you did nothing here because everything seems to be in place with ReplySenderFor.

Yes, you're absolutely right!

I added some code that shows how reply handling is done. In this commit I added code in the "sm_apt" plugin that simply replies with some "InstallSucceeded" message if there is an install request. In this commit I added code in the "mqtt_sm" plugin that takes that response and sends it (serialized as JSON) back to the MQTT plugin, which then publishes the message on the broker.

You can redo

git fetch https://github.com/matthiasbeyer/thin-edge.io/ feature/add_tedge_api/showcase
git checkout FETCH_HEAD
cargo build -p tedge-cli --features sm,mqtt,c8y,collectd,thin_edge_json
./target/debug/tedge-cli -l info run ./tedge/example-config-showcase.toml

and then publish {"type":"Install","package_name":"foo"} on "smrequests". You'll see a reply on "somerandomtopic" indicating that the package "foo" was installed.


So much for your questions; now some things I want to add:


Just to state this explicitly (possibly repeating myself here, sorry): If someone wants to implement a new plugin, they have to:

  • Write two types
    • A "Builder" type. This is normally only struct MyBuilder; (yes, not even members!)
    • A "Plugin" type. This is a normal struct MyPlugin { ...maybe some members... }
  • Implement two traits:
    • tedge_api::PluginBuilder (with #[async_trait]) on their Builder with 3 required functions and one optional one
    • tedge_api::Plugin (with #[async_trait]) on their Plugin. This trait has no required methods (in the showcase it still has some; in our current tedge_api all of them are optional)

And that's a new plugin. Now they copy some lines in tedge/src/main.rs to include their plugin in the application and they're done.

Depending on what they want their plugin to do, they have to do one of the following things (or both):

  • If they want to emit messages the plugin receives from the outside world (think MQTT, HTTP,... whatever), they need some way to receive these messages from the outside world. For this, they would implement some "mainloop" that they start in their Plugin::start() function.
  • If they want to receive messages from other plugins, they would implement tedge_api::Handle<T> on their Plugin, where T is the type of the message the plugin should be able to receive

That's all. And there can be multiple Handle<T> implementations if the plugin is able to receive messages of different types.
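The idea of one Handle<T> implementation per received message type can be pictured with the following synchronous look-alike. It is a sketch under assumptions: the real tedge_api::Handle trait is async and its signature differs; the Stats plugin, the message types and the deliver helper are made up for the illustration.

```rust
// Hypothetical, synchronous look-alike of the Handle<T> idea.
trait Handle<T> {
    fn handle(&mut self, message: T);
}

// Two unrelated message types.
struct Measurement(f64);
struct Shutdown;

// A plugin that accepts both of them.
#[derive(Default)]
struct Stats {
    count: usize,
    sum: f64,
    stopped: bool,
}

impl Handle<Measurement> for Stats {
    fn handle(&mut self, message: Measurement) {
        self.count += 1;
        self.sum += message.0;
    }
}

impl Handle<Shutdown> for Stats {
    fn handle(&mut self, _message: Shutdown) {
        self.stopped = true;
    }
}

// A sender only needs to know that the receiver handles its message type.
// Delivering a type without a Handle implementation does not compile.
fn deliver<T, P: Handle<T>>(plugin: &mut P, message: T) {
    plugin.handle(message);
}

fn demo() -> Stats {
    let mut stats = Stats::default();
    deliver(&mut stats, Measurement(1.5));
    deliver(&mut stats, Measurement(2.5));
    deliver(&mut stats, Shutdown);
    stats
}

fn main() {
    let stats = demo();
    println!("count={} sum={} stopped={}", stats.count, stats.sum, stats.stopped);
}
```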

But that's literally all a plugin author has to do. I think this plays exactly into the ideas you wrote down with

Implementing a component might be more involved but should ideally focus on the feature logic. One of the goals of the plugin API is to ease the interaction of independent streams of events and requests acting on some state. Hence, interaction concerns (e.g select!) should be addressed by the runtime, not the plugins. [...]

as it boils down to three tasks that the plugin author has to do (from a high level):

  • Implement the interface how the plugin is created
  • Implement the interface how the plugin is started/stopped
  • Implement the interface on how the plugin can receive messages

And then they can start implementing their business logic.


What matters though is that the runtime can be improved without having to rewrite all the plugins.

Yes, this is definitely a valid point. Stability guarantees should be worked out in a separate issue though, because it is a rather complex topic!

I'd just like to note that since we've worked quite a bit (four months of two persons fulltime by now) on the API design and its implementation, we are certain that we have reached a decently stable design so far. Some details are still in flux, but nothing that is of major concern right now.

Still, if we need to change the internal communication API while the project is still in 0.x.y state, I don't think that's a major concern! As soon as we're in 1.x.y state, breaking the communication API is of course not allowed anymore. Even more reason to spend extra time on a decent approach!


As written above in some answer to your questions, 1:1, 1:N and N:1 style messaging is in the POC already!

1:1 messaging is of course easy, but 1:N is also simple: If a plugin wants to send to multiple other plugins, all it needs to do is save the addresses of these other plugins and send to all of these addresses. That's nothing more than for addr in addresses { addr.send(message) }-style programming, of course - as one would expect.

For N:1-style messaging: As all message handlers are called asynchronously and concurrently, having multiple plugins send to one plugin is already included in the base of the architecture.
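
As an illustration of both patterns, here is a small sketch. It is an assumption-laden stand-in: it uses std::sync::mpsc channels and threads instead of the plugin addresses and async handlers of tedge_api, and the function names (broadcast, fan_out_demo, gather) are made up for this example.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// 1:N: keep the addresses of the other plugins and send to each of them.
fn broadcast(addresses: &[Sender<String>], message: &str) {
    for addr in addresses {
        // send() only fails if the receiver is gone; ignored in this sketch.
        let _ = addr.send(message.to_string());
    }
}

fn fan_out_demo() -> Vec<String> {
    let (tx_a, rx_a) = channel();
    let (tx_b, rx_b) = channel();
    broadcast(&[tx_a, tx_b], "restart");
    // Both receivers got their own copy of the message.
    vec![rx_a.recv().unwrap(), rx_b.recv().unwrap()]
}

// N:1: several senders feed one receiver through cloned Sender handles.
fn gather(n: usize) -> Vec<String> {
    let (tx, rx) = channel();
    let workers: Vec<_> = (0..n)
        .map(|i| {
            let tx = tx.clone();
            thread::spawn(move || tx.send(format!("from plugin {}", i)).unwrap())
        })
        .collect();
    // Drop the original sender so the receiver stops once all workers are done.
    drop(tx);
    for worker in workers {
        worker.join().unwrap();
    }
    let mut received: Vec<String> = rx.iter().collect();
    // Arrival order is not deterministic, so sort for a stable result.
    received.sort();
    received
}

fn main() {
    println!("{:?}", fan_out_demo());
    println!("{:?}", gather(3));
}
```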


NB: We're currently evaluating and defining our response to your proposal of what such an interface could look like.

matthiasbeyer avatar May 23 '22 07:05 matthiasbeyer

Review of Actors POC

Goal of this review:

Thin-Edge.io has the opportunity right now to pivot into a different direction from before. To make sure that this new direction fits the goals of the intended users we have initiated a 'call for proposals' between both SoftwareAG and IFM (who are the main contributors right now). I believe that you've made your points clear in the OP of this issue, but I think some points are worth re-iterating:

  • User experience is paramount, not just from an end-user and daily driver perspective but also from a developer point of view. Particularly this means that a user (be it dev or end-user) will not have complete freedom to do whatever they want (without hurdles at least), but that thin-edge.io provides the tools and framework with which they can do whatever is necessary for an edge device in a straightforward fashion.
  • Compromises happen at the boundaries. If some parts are fundamentally hard to do (async programming being the most difficult aspect IMO here), they should not be hidden behind abstractions; rather, the user-developer should be guided towards how their problems can be solved correctly. This can also take the form of an auxiliary crate that implements common patterns which the user can use, or simply better documentation.

These two points can be considered equivalent.

With that said, let's dive in.


didier-wenzek:rfc/tedge_api implements a custom 'actor' library composed of these different parts:

  • An Actor trait with 5 associated types, a sync constructor as well as an async start method

    • This makes the trait non-object-safe, which means that it cannot be put behind a dyn pointer. Is this intentional? If yes, how would one solve the issue of having dynamic instances of this actor? (Read: more than one, defined at runtime?)
    • Similar issues with the ActorInstance, due to the generic bound how could an end-user configure this?
    • Actors cannot reply to messages they have received.
      • They can send new messages to another actor, which might not be the same though (so it's not really a reply?)
  • Mailboxes seem in an odd place

    • They are 'unbounded', which surely is just an implementation detail, but they cannot be used in a productive environment due to the lack of back-pressure and... being unbounded.
    • Should one make them bounded, how are developers safe from potential dead-locks? The pattern currently is not 'each message is handled in its own execution context', but that a single state has to 'react' to each message one after the other, this can lead to issues in a bounded context.
      • Multiple incoming messages can be interwoven, making it unclear how a Reactor is supposed to handle them. Since either messages need to carry state or the Reactor somehow needs to keep track of what messages it receives from whom. (With no current visible mechanism to do so)
    • As each Producer and Reactor is its own future and gets spawned individually, how will one share state with the other?
      • Reactor takes a &mut on its react method, this means that a single type cannot both be Producer and Reactor due to mutability in an async context. (One could imagine that RwLock<T> might implement both for T but then they cannot run concurrently but must be serialized somehow.) This means that a complicated actor would need to be split up between multiple types.
    • How would a streaming send/reply be added onto this abstraction? It seems unclear how it would interact with the current Producer/Reactor types.
      • A clear possibility is that streams are simply 'merged' into the given sender, but this would throw away the information the stream itself represents (aka logical coherence), so does not seem particularly realistic.
  • Cancellation overall was not addressed, but it is quite important, how would a user (read, developer) in this pattern be sure that they have a chance of shutting down gracefully?

    • I imagine a token passed into Actor::start might do the trick. However no central 'we are shutting down' method exists.
  • Actor and Reactor are in a 1-1 relationship. This means that actors can only ever have 'one' logical dataflow. The resulting pattern forces one to thus use an enum to handle multiple potential incoming and outgoing messages. However, no clear relationship between the different message types in the enum exists (and none that is enforceable by the compiler).

    • This also means that anyone wanting to receive messages from that actor will have to either: 1. handle all messages in the enum correctly, or 2. defensively check which message they did receive and ignore it or error. This would also break a core assumption of interoperability, since 2. should never happen.
    • One can take the assumption that this was done so that each Actor 'does one thing'. However, this also means that the number of edges (i.e. connections) between Actors, as well as the number of Actors, increases, which the user would have to somehow configure. Either explicitly, which increases the cognitive load, or implicitly, which also increases the cognitive load but can also be unintuitive in error cases.
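
To illustrate the enum concern from the last point, here is a small sketch. The SmMessage type and both handler functions are hypothetical and not taken from the POC; they only show the two options a receiver has when one enum carries several unrelated message kinds.

```rust
// Hypothetical single message type of an actor with one logical dataflow.
#[derive(Debug)]
enum SmMessage {
    Install { package: String },
    Remove { package: String },
    ListRequest,
}

// Option 1: the receiver handles every variant of the enum.
fn handle_all(msg: &SmMessage) -> String {
    match msg {
        SmMessage::Install { package } => format!("installing {}", package),
        SmMessage::Remove { package } => format!("removing {}", package),
        SmMessage::ListRequest => "listing packages".to_string(),
    }
}

// Option 2: the receiver only cares about one variant and has to
// defensively reject the others at runtime. The compiler cannot rule
// out that such a message is sent to it.
fn handle_install_only(msg: &SmMessage) -> Result<String, String> {
    match msg {
        SmMessage::Install { package } => Ok(format!("installing {}", package)),
        other => Err(format!("unexpected message: {:?}", other)),
    }
}

fn main() {
    let install = SmMessage::Install { package: "foo".to_string() };
    let remove = SmMessage::Remove { package: "foo".to_string() };
    println!("{}", handle_all(&install));
    println!("{}", handle_all(&remove));
    println!("{:?}", handle_install_only(&SmMessage::ListRequest));
}
```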

I think it's hard to formulate a conclusion. There are definitely some things that can be taken away with regard to simplicity, but it does leave open several questions that are IMO fundamental.

I think due to the fact that it is so 'simple', it is not conclusive on its suitability.

I would love to hear from you @didier-wenzek, what you thought is relevant in this POC that we might have missed for as to why you would favor this over an already (mostly) complete implementation.

TheNeikos avatar May 23 '22 09:05 TheNeikos

@TheNeikos I will be as direct as you are, starting with the controversial aspects.

I think due to the fact that it is so 'simple', it is not conclusive on its suitability.

On my side, I think due to the fact that tedge_api is so out of control for the original team and so disconnected from what has already been done, that the associated POC is not conclusive on the ability to have a migration plan.

However, I still hope that we can reach a point of agreement.

I would love to hear from you @didier-wenzek, what you thought is relevant in this POC that we might have missed for as to why you would favor this over an already (mostly) complete implementation.

I think you missed two points:

  • I will never say yes to a complete redesign of thin-edge without a minimum of control.
    • When you say "our proposal has a mostly complete implementation", "We've worked quite a bit (four months of two persons fulltime by now)", "The whole core implementation is a bit more involved and has now been ongoing for about three months" you are not reassuring me. On the contrary.
    • Because during that time you asked mostly no questions on thin-edge, our needs, our pain-points, our plans. So how can you be sure that this is what the project needs? During that time, the proposed API evolved, things have been added, others removed, a runtime is secretly under implementation. All this without any awareness from our side of what are your plans, your design issues, the decisions taken, the drawbacks.
    • Sure, you also wrote "This PR will of course feature an in-depth explanation on how things work and how things were implemented once it is opened!". Okay, but the point is not for us to buy a turnkey design.
  • I will never say yes to a complete redesign without being sure that we can make it progressively re-using large parts of what has already been written.
    • This is why I asked for this POC.
    • But, when I see that you redefine what a measurement is, that you use a different mqtt crate, that you see as useless to have 2 software plugins, that pub/sub will be done later, that you avoid to use thin-edge crates (with the notable exception of thin-edge-json) ... when I see all that, I can make the mental effort to fill the gaps. But, you really don't help me to take a decision.

This is why I started to work on my own proposal.

  • At first, my goal was just to explore the design issues implementing actors without actix, and to understand what would be the benefits of using actix or tedge_api.
  • This has also been the opportunity to see how things would be simplified by ignoring a requirement you see as key but I don't. What if actors can only be created in main.rs and not dynamically from a runtime config? Sure, dynamic instantiation might be useful but not at any cost. And my intuition is that most of the complexity of tedge_api comes from this requirement. This is also why you rejected the idea of using actix.
  • Then, I decided to push the exploration a bit deeper. I believe that dynamic creation of actors from a configuration file is not useful to re-implement thin-edge. But is this actually true? Hence, a POC for tedge_actor.
  • This took me around 5 or 6 afternoons, and I'm fully aware of many key missing points - I even stressed them myself in the PR and in this issue: namely 1-n and n-1 connections, request/response, cancellation. I will spend 1 or 2 afternoons to implement what I have in mind.
  • I'm highlighting the time spent because I do think that the effort on a POC must be aligned with the level of agreement. Spending some days on a task without agreement is okay; spending 4x2 months on a PR without agreement nor collaboration can only lead to frustration.

There are definitely some things that can be taken away with regard to simplicity, but it does leave open several questions that are IMO fundamental.

  • On purpose, the model of computation is that "an actor reacts to messages one after the other, updating its state". I still don't understand the computation model proposed by tedge_api, despite many questions and answers on the PR. What does "each message is handled in its own execution context" mean? What is the context of a message?
  • It was intentional to focus on static instantiation, but not to block dynamic instantiation. I can remove the Sized constraint on the Actor type.
  • I'm fully aware that the channels must be bounded and that there is a risk of deadlock with a cycle of bounded channels. To be honest, I have no idea on how to address the issue with cycles yet.
  • The main point that is missing in the current state of the POC is related to the communication patterns (1-n, n-1, request/response). I will add this first to remove confusion on how messages can be interwoven and dispatched.
  • Yes, cancellation handling is missing. I will try to sketch what I have in mind.
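
The risk with a cycle of bounded channels can be shown in a few lines. This is only a sketch under assumptions: it uses std::sync::mpsc::sync_channel with a capacity of 1 instead of the POC's mailboxes, and it uses try_send to observe the situation in which a blocking send would wait forever.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Two actors A and B connected in a cycle: A sends to B and B sends to A.
// Returns whether A and B would each block on their next send.
fn cycle_is_stuck() -> (bool, bool) {
    let (to_b, inbox_b) = sync_channel::<u32>(1);
    let (to_a, inbox_a) = sync_channel::<u32>(1);

    // Both mailboxes are filled up to their capacity.
    to_b.send(1).unwrap();
    to_a.send(2).unwrap();

    // Each actor now wants to send before reading its own inbox.
    // With a blocking send() both would wait on each other forever;
    // try_send() reports the full mailbox instead of blocking.
    let a_blocked = matches!(to_b.try_send(3), Err(TrySendError::Full(_)));
    let b_blocked = matches!(to_a.try_send(4), Err(TrySendError::Full(_)));

    // The receivers stay alive until here, so the channels are full, not closed.
    drop((inbox_a, inbox_b));
    (a_blocked, b_blocked)
}

fn main() {
    println!("{:?}", cycle_is_stuck());
}
```

Neither actor can make progress here unless one of them reads its inbox before sending, which is exactly the ordering a "react to one message after the other" model does not guarantee.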

didier-wenzek avatar May 23 '22 20:05 didier-wenzek

Finally, a POC has been implemented using a different strategy. Instead of gluing together various partially-implemented actors, we implemented a small number of functional actors, with the aim to go deeper exploring the concrete issues and the ways to solve them.

didier-wenzek avatar Feb 13 '23 16:02 didier-wenzek