lunatic New plugin system

Lunatic already has a plugin system, but it is not well documented or particularly useful in its current state. It started out as a way to dynamically change host functions with WebAssembly modules, but ended up being a tool to modify WebAssembly modules before instantiating them.

I want to use this issue to discuss the pros and cons of different approaches to a plugin system and figure out what the best path forward would be. In my grand vision, plugins would be an easy way to extend the VM without needing to recompile it, just by adding Wasm modules to it. First of all, this means extending the host functions. It could also be a playground for adding new functionality to the VM; we can test it as a plugin behind a feature flag and later provide a native implementation. This could potentially also improve development speed, because plugins can be dynamically loaded and added to environments inside one VM instance. Testing plugins or comparing performance between competing implementations becomes trivial: just spawn different environments, attach different plugins to them and run the same code. You could implement plugins in any language that compiles to Wasm, so you would not be limited to Rust when extending the VM.

Another use case could be replacing existing functionality with custom implementations. One example of this would be to provide a custom networking layer for the distributed lunatic implementation.

We can go even further here and allow people to load native shared libraries that are exposed as host functions (an idea of @jtenner). Or even allow plugins to modify the loaded Wasm code before JIT compiling it, an approach that our heap profiler plugin is taking.

The most important questions I would like answered here are:

[ ] How do we provide a good user experience around plugins? If creating plugins is too complicated and the complexity around them is hard to manage, nobody is going to be using them.
[ ] What kind of plugin system would give us the best trade-off between user experience and performance?
[ ] Should we allow native code plugins (which could potentially crash the whole VM)?

1. Plugins that provide host functions

Originally I wanted lunatic to be just a set of core APIs (networking, filesystem, processes, ...) and every other functionality would be implemented as a plugin in WebAssembly on top of it. Around the same time Wasmtime got support for module-linking and I saw a great opportunity to actually accomplish this. The idea was simple, you could add new host functions to processes by defining them inside WebAssembly plugins. These host functions would provide higher level abstractions on top of the core APIs or even shadow/change the core APIs.

Lunatic would provide a set of "standard" plugins. WASI could be implemented that way, re-exposing lunatic's core filesystem API to guest code through a WASI compatible interface. You could also provide your own plugins by passing them to the lunatic binary: lunatic --plugin my-wasi.wasm .... That way you could provide a WASI implementation that actually uses lunatic's networking API and proxies your filesystem read/writes to a network storage.

It would give us an opportunity to keep the core API simple and small, but provide an elegant way to extend it without needing to step out of the WebAssembly space. The developer experience around creating these plugins would be simple, you just import existing host functions and use them inside new host functions that you export.

There are a few drawbacks to this approach. The module-linking proposal requires each module to be instantiated separately. This means that for each process we are now creating not one WebAssembly instance, but n (where n is the number of plugins we are loading). Process spawning speed is really important for most lunatic use cases, and having it progressively slow down as we extend functionality with plugins was not something I could easily accept. One solution here could be to make the module-linking implementation lazy and only instantiate a plugin if a particular process actually uses a host function it provides, but the WebAssembly runtime we are using (Wasmtime) doesn't provide such a feature.
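To make the lazy idea concrete, here is a minimal sketch of per-process plugin slots that are only instantiated on first use. It is a toy model under stated assumptions, not something Wasmtime offers: `PluginInstance`, `LazyPlugin` and `Process` are hypothetical names, and "instantiation" is just constructing a struct.

```rust
/// Hypothetical stand-in for an instantiated plugin module.
/// In a real VM this would wrap a Wasm instance; here it only records its name.
#[derive(Debug)]
pub struct PluginInstance {
    pub name: String,
}

/// A plugin slot that is only instantiated when a process first uses it.
pub struct LazyPlugin {
    name: String,
    instance: Option<PluginInstance>,
}

impl LazyPlugin {
    pub fn new(name: &str) -> Self {
        LazyPlugin { name: name.to_string(), instance: None }
    }

    /// Returns the instance, paying the instantiation cost on the first call only.
    pub fn get(&mut self) -> &PluginInstance {
        let name = &self.name;
        self.instance
            .get_or_insert_with(|| PluginInstance { name: name.clone() })
    }

    pub fn is_instantiated(&self) -> bool {
        self.instance.is_some()
    }
}

/// A process with a set of attached plugins; spawning creates no instances up front.
pub struct Process {
    plugins: Vec<LazyPlugin>,
}

impl Process {
    pub fn spawn(plugin_names: &[&str]) -> Self {
        Process {
            plugins: plugin_names.iter().map(|n| LazyPlugin::new(n)).collect(),
        }
    }

    /// Simulates calling a host function provided by the plugin at `index`.
    pub fn call_plugin(&mut self, index: usize) -> &str {
        &self.plugins[index].get().name
    }

    pub fn instantiated_count(&self) -> usize {
        self.plugins.iter().filter(|p| p.is_instantiated()).count()
    }
}
```

With this shape, spawning a process with three plugins attached costs nothing extra until one of them is called, and repeated calls reuse the same instance.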

A bigger issue is the performance overhead when proxying calls. Let's say someone provides a wasi-socket plugin that builds on top of our core networking API to provide a WASI compatible networking interface. Now writing to the network would require us to first copy all the data from the guest process heap into the wasi-socket plugin instance. If wasi-socket was building on top of other plugins it would take even more copies, because each layer can only talk to the next one and has to move the data in between. An alternative to creating all of these copies would be a much more complex implementation of APIs that allow you to export different memories from different plugins, together with a system of core APIs that also take memory "indexes". However, such a system would be much more complicated to work with. You would need to keep track of memory references and slices inside of plugins, on top of the logic you are implementing.
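The trade-off can be sketched with a toy model. The first function counts the bytes duplicated when a write passes through a chain of plugin layers, each with its own memory. The second part shows what a memory "index" scheme might look like, where only a small descriptor travels between layers. All names here (`bytes_copied_when_proxying`, `MemorySlice`, `MemoryRegistry`) are hypothetical and plain `Vec<u8>` buffers stand in for Wasm linear memories.

```rust
/// Counts the bytes duplicated when a payload is proxied through
/// `plugin_layers` plugins that each own a separate memory.
pub fn bytes_copied_when_proxying(payload: &[u8], plugin_layers: usize) -> usize {
    let mut buffer = payload.to_vec(); // stands in for the guest heap contents
    let mut copied = 0;
    for _ in 0..plugin_layers {
        // Each layer can only see its own memory, so the data is copied into it.
        let layer_memory = buffer.clone();
        copied += layer_memory.len();
        buffer = layer_memory;
    }
    copied
}

/// A reference to a region of some registered memory. Passing this around
/// moves three integers instead of the data itself.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MemorySlice {
    pub memory_index: usize,
    pub offset: usize,
    pub len: usize,
}

/// Toy registry of memories that a core API could resolve slices against.
pub struct MemoryRegistry {
    memories: Vec<Vec<u8>>,
}

impl MemoryRegistry {
    pub fn new() -> Self {
        MemoryRegistry { memories: Vec::new() }
    }

    /// Registers a memory and returns its index.
    pub fn register(&mut self, memory: Vec<u8>) -> usize {
        self.memories.push(memory);
        self.memories.len() - 1
    }

    /// Resolves a slice without copying; returns `None` if it is out of bounds.
    pub fn resolve(&self, slice: MemorySlice) -> Option<&[u8]> {
        let memory = self.memories.get(slice.memory_index)?;
        let end = slice.offset.checked_add(slice.len)?;
        memory.get(slice.offset..end)
    }
}
```

In this model a 1 KiB write through three layers copies 3 KiB, while the index-based variant copies nothing but pushes bounds tracking onto every plugin author, which is the extra complexity described above.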

The interface-types proposal is not much help here, as it also assumes copying. From what I understand reading the spec, the Wasm runtime could potentially optimize out these copies, but this would only work between guest code and native host functions; the copies can't be optimized out between Wasm modules linked together (our case). There are some additions to interface types that could make it work (streams), but it's hard to tell at this stage how exactly everything will fall into place or when this might land.

Lunatic's design shines in I/O heavy workloads and introducing additional overhead is a no-go here. The only way forward would be our custom implementation with memory indexes and then eventually move to interface-types streams once they are ready.

2. Plugins that modify the WebAssembly bytecode

There are some use cases where you want plugins to modify the loaded WebAssembly bytecode. Our heap profiler plugin works that way. It examines the module, looks for functions with names such as malloc, alloc, or similar, and inserts additional call instructions into them that hook up to some host functions that count the memory usage.
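As a rough illustration of this kind of instrumentation, the sketch below works on a toy function list rather than real Wasm bytecode. The `Instr` and `Function` types, the list of allocator names, and the hook name passed in are all assumptions made for the example; the actual heap profiler plugin is not shown here.

```rust
/// Toy instruction set; a real plugin would operate on actual Wasm instructions.
#[derive(Clone, Debug, PartialEq)]
pub enum Instr {
    Call(String),
    Other(&'static str),
}

/// Toy function: a name plus a body of instructions.
#[derive(Clone, Debug)]
pub struct Function {
    pub name: String,
    pub body: Vec<Instr>,
}

/// Assumed list of names that are treated as allocator entry points.
const ALLOCATOR_NAMES: [&str; 4] = ["malloc", "alloc", "calloc", "realloc"];

/// Name-based heuristic: does this function look like an allocator?
pub fn looks_like_allocator(name: &str) -> bool {
    ALLOCATOR_NAMES.contains(&name)
}

/// Prepends a call to `hook` to every function that looks like an allocator
/// and returns how many functions were patched.
pub fn instrument(functions: &mut [Function], hook: &str) -> usize {
    let mut patched = 0;
    for f in functions
        .iter_mut()
        .filter(|f| looks_like_allocator(&f.name))
    {
        f.body.insert(0, Instr::Call(hook.to_string()));
        patched += 1;
    }
    patched
}
```

A return value of 0 from `instrument` means the module exposes none of the expected names, in which case the profiler would silently count nothing.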

One big issue with this approach is that it's really hard to modify WebAssembly modules correctly. Depending on which language you are compiling to WebAssembly from, the assumptions you are making about the generated bytecode may not hold (e.g. the language doesn't have functions with the names malloc, alloc, ...).

The first implementation of this system in lunatic just gave the whole binary module to the plugin and took the modified one back. This required each plugin to ship with a whole WebAssembly parser inside and to always re-parse the module. Also, the order in which the plugins are loaded becomes significant in this case and can introduce weird issues.

The current implementation parses the module once and exposes hooks to plugins to query the module for functions, modify them or add new ones. I think that this is an OK way forward: it keeps the plugin modules small and we can expose only hooks that are "safe" to use (won't produce incorrect bytecode). It limits the number of ways in which you can modify the module, but this could be a good thing. I spent a lot of time writing Wasm code modification in the first versions of lunatic and it's super easy to produce broken modules.

Even though it's super hard to write correct plugins that modify the Wasm bytecode, I feel like there are always going to be cases where we need it. One good example would be providing polyfills. For some time lunatic used reference types in host functions to manage resources, but many languages don't support reference types on the guest side (Rust, C, ...). What we would do is check during loading of the module whether there was a mismatch between signatures on the guest and host and, if so, polyfill the guest with wrappers that save the reference types into a local table and return the index (i32) inside of the table to the guest code, which knows how to work with i32 values. If we ever want to introduce host functions that use reference types or interface types again, we will probably need to provide some polyfills for languages that don't understand these "higher level" concepts.
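The table at the heart of such a polyfill can be sketched in a few lines. This is an illustrative model only: `ResourceTable` is a hypothetical name, it lives in ordinary Rust rather than inside generated Wasm wrappers, and reusing freed slots is a design choice of the sketch, not something the original polyfill is known to do.

```rust
/// Maps opaque host resources to i32 handles that guest code can store
/// and pass back, in place of reference types it cannot represent.
pub struct ResourceTable<T> {
    slots: Vec<Option<T>>,
}

impl<T> ResourceTable<T> {
    pub fn new() -> Self {
        ResourceTable { slots: Vec::new() }
    }

    /// Stores a resource and returns its index. Freed slots are reused first.
    pub fn insert(&mut self, resource: T) -> i32 {
        if let Some(free) = self.slots.iter().position(|s| s.is_none()) {
            self.slots[free] = Some(resource);
            free as i32
        } else {
            self.slots.push(Some(resource));
            (self.slots.len() - 1) as i32
        }
    }

    /// Looks up a resource by handle; negative or stale handles yield `None`.
    pub fn get(&self, handle: i32) -> Option<&T> {
        if handle < 0 {
            return None;
        }
        self.slots.get(handle as usize)?.as_ref()
    }

    /// Removes a resource, leaving its slot free for later reuse.
    pub fn remove(&mut self, handle: i32) -> Option<T> {
        if handle < 0 {
            return None;
        }
        self.slots.get_mut(handle as usize)?.take()
    }
}
```

A wrapper around a host function that returns a reference would call `insert` and hand the resulting i32 to the guest; a wrapper around one that takes a reference would call `get` with the i32 the guest passes in.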

3. Native plugins

This idea is quite simple, but I'm not sure how it would be implemented or if it would even make sense to have it. You would be able to add a native shared library as a plugin (e.g. lunatic --plugin dangerous_stuff.so/dylib/dll ...). Of course these plugins would be OS and CPU architecture specific, but they would give you the power to call any native functions directly from Wasm modules. You would simply use them like every other host function, just import them by name.

I assume that eventually someone is going to want to hand-write some assembly, use a specific piece of hardware or just call a library that can't be compiled to WebAssembly yet. This would be the easiest way to give them access to it. On the other hand, this breaks all security for the current VM instance and all processes inside of it, as the native code can do anything it wants. Once we get distributed lunatic this can be worked around by running a node just for native plugins that would not have access to the memory space of other instances of the VM, basically limiting any damage the native plugin could do to a small set of selected processes.

Conclusion

These 3 points should cover almost all use cases. We just need to figure out what's the right balance of power and complexity we want to actually expose to plugin writers. How can we make plugins performant and a joy to write?

I would also love to hear feedback from the community, what do you think? Are there maybe alternative approaches that would be interesting?

Oct 20 '21 16:10 bkolobara

I'd evaluate these three ideas by asking which one can give lunatic the traction it needs to attract more developers. WebAssembly standardization is a slow process, and that also impacts lunatic development.

Therefore I'd try to start with native plugins, even though there are security questions, simply because we can see that other similar runtimes decided to go that way. To exaggerate a bit, native plugins will eventually become obsolete, but right now they can be a good feature to convince developers that they can actually start designing and developing solutions built around this amazing runtime.

I don't want to diminish the importance of the other two types of plugins, which are far more important in the mid/long term. However, they are more tied to things we don't control, namely WebAssembly standardization and its implementation in different languages.

Oct 20 '21 18:10 mydnicq

I think around the safety of native functions, the story would probably be something similar to Rust's unsafe - the application programmer has to accept that using native functions might cause crashes, and that all code written inside a native function surrenders the isolation capabilities that the VM would otherwise provide. This is similar (as far as I understand) to how Erlang's native-implemented functions work.

My feeling is that the cost of moving native-implemented functions to a separate node is quite high (every function call requires a network request), and maybe therefore not worth the cost. Maybe we could provide some basic isolation by running plugins in a thread pool (which would also stop them from blocking the scheduler), which would restart the threads if they were to crash.

Jan 25 '22 10:01 teymour-aldridge

My feeling is that the cost of moving native-implemented functions to a separate node is quite high (every function call requires a network request), and maybe therefore not worth the cost.

The native functions would definitely be provided directly to the Wasm modules running on the same node, so there is no need for remote calls. I was talking here about the possibility of organising your architecture so that you dedicate some nodes to unsafe plugins; processes that don't need access to the native functions can then be scheduled on non-native (safe) nodes.

To illustrate it with an example. You load a DB native library on one node and spawn a process managing DB connections on it. If you want some extra safety, you can now start another node without the plugin and let the processes running on it talk to the DB manager. If something goes horribly wrong and the plugin takes the first node down, the rest of the system is still going to work and can serve a cache of the DB or something.

If we don't force the plugin on all nodes in distributed lunatic we get the possibility to mark nodes as safe/"unsafe", but the unsafe ones would still be able to schedule lunatic processes. Just the safe ones would not be able to schedule processes requiring unsafe native functions.
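The scheduling rule described here can be sketched as a simple filter over nodes. This is only a model of the idea: `Node`, `node` and `eligible_nodes` are hypothetical names, and nothing like this is claimed to exist in distributed lunatic.

```rust
use std::collections::HashSet;

/// A node in the cluster together with the native plugins loaded on it.
pub struct Node {
    pub name: String,
    pub native_plugins: HashSet<String>,
}

impl Node {
    /// A node counts as "safe" when it has no native plugins loaded.
    pub fn is_safe(&self) -> bool {
        self.native_plugins.is_empty()
    }

    /// A node can run a process if it provides every native plugin the
    /// process requires. Processes with no requirements run anywhere.
    pub fn can_run(&self, required: &[&str]) -> bool {
        required.iter().all(|p| self.native_plugins.contains(*p))
    }
}

/// Convenience constructor for the example.
pub fn node(name: &str, plugins: &[&str]) -> Node {
    Node {
        name: name.to_string(),
        native_plugins: plugins.iter().map(|p| p.to_string()).collect(),
    }
}

/// Names of all nodes on which a process with the given requirements may be scheduled.
pub fn eligible_nodes<'a>(nodes: &'a [Node], required: &[&str]) -> Vec<&'a str> {
    nodes
        .iter()
        .filter(|n| n.can_run(required))
        .map(|n| n.name.as_str())
        .collect()
}
```

In the DB example above, the connection manager would declare the DB plugin as a requirement and land only on the unsafe node, while ordinary processes remain eligible for both kinds of node.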

Jan 25 '22 11:01 bkolobara

After some brief experimenting, I was able to get a native dylib loaded into the lunatic VM quite easily.

Originally I wanted lunatic to be just a set of core APIs (networking, filesystem, processes, ...) and every other functionality would be implemented as a plugin in WebAssembly on top of it.

Could we take this to the extreme, and have everything be plugins, similar to how Bevy works? So if you don't even need networking or filesystem access, you can pass --no-default-plugins and just provide the things you need with --plugins process,registry.

There are a few drawback to this approach. The module-linking proposal requires each module to be instantiated separately. This means that for each process we are now creating not one WebAssembly instance, but n (where n is the number of plugins we are loading).

This is only the case if plugins are wasm modules right? If we go with native plugins, then plugins can just export some basic functions which are used in the VM. These functions could include:

fn register(linker: &mut Linker<T>) -> Result<()>; for adding host functions.
fn config(config: &mut Config); for modifying the wasmtime engine config. An alternative to this could be passing flags to the command line.
… nothing else comes to mind. I think register and config cover a huge majority of plugin cases, such as adding host functions for talking to a database, etc.
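A minimal sketch of how the two entry points could fit together is shown below. To keep it self-contained, `Linker` and `Config` are simplified stand-ins defined in the snippet, not wasmtime's real types, and the signatures are reduced accordingly (no `Linker<T>` state parameter, `String` errors). `NativePlugin`, `ExamplePlugin`, `load_plugins`, the `example::double` import name and the `max_memory_pages` setting are all made up for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for wasmtime's `Linker` (NOT the real type): maps import
/// names to host functions with one fixed toy signature.
pub struct Linker {
    host_functions: HashMap<String, fn(i64) -> i64>,
}

impl Linker {
    pub fn new() -> Self {
        Linker { host_functions: HashMap::new() }
    }

    /// Registers a host function; defining the same name twice is an error.
    pub fn define(&mut self, name: &str, func: fn(i64) -> i64) -> Result<(), String> {
        if self.host_functions.contains_key(name) {
            return Err(format!("host function `{}` is already defined", name));
        }
        self.host_functions.insert(name.to_string(), func);
        Ok(())
    }

    pub fn call(&self, name: &str, arg: i64) -> Option<i64> {
        self.host_functions.get(name).map(|f| f(arg))
    }
}

/// Stand-in for wasmtime's `Config`, reduced to a single made-up setting.
#[derive(Default)]
pub struct Config {
    pub max_memory_pages: Option<u32>,
}

/// The two hooks a native plugin would expose to the VM.
pub trait NativePlugin {
    fn register(&self, linker: &mut Linker) -> Result<(), String>;
    /// Optional: most plugins would not need to touch the engine config.
    fn config(&self, _config: &mut Config) {}
}

fn double(x: i64) -> i64 {
    x * 2
}

/// Hypothetical plugin that adds one host function and tweaks one setting.
pub struct ExamplePlugin;

impl NativePlugin for ExamplePlugin {
    fn register(&self, linker: &mut Linker) -> Result<(), String> {
        linker.define("example::double", double)
    }

    fn config(&self, config: &mut Config) {
        config.max_memory_pages = Some(16);
    }
}

/// Runs every plugin's `config` and `register` hooks in order.
pub fn load_plugins(plugins: &[&dyn NativePlugin]) -> Result<(Linker, Config), String> {
    let mut config = Config::default();
    let mut linker = Linker::new();
    for plugin in plugins {
        plugin.config(&mut config);
        plugin.register(&mut linker)?;
    }
    Ok((linker, config))
}
```

One design question the sketch surfaces is what should happen when two plugins define the same host function. Here it is treated as an error, though shadowing, as discussed for the Wasm plugins earlier in the thread, would be another option.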

I think around the safety of native functions, the story would probably be something similar to Rust's unsafe - the application programmer has to accept that using native functions might cause crashes.

💯 agree! We could even wrap plugins with std::panic::catch_unwind to handle panics in some way.
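A guard along those lines might look like the sketch below. `call_guarded` is a hypothetical helper name. Note that `std::panic::catch_unwind` only catches unwinding Rust panics, so it would not help with aborts or with crashes inside native code such as segfaults.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs a plugin-provided closure and converts a panic into an error
/// message instead of letting it unwind into the VM.
pub fn call_guarded<F, R>(plugin_name: &str, f: F) -> Result<R, String>
where
    F: FnOnce() -> R,
{
    // `AssertUnwindSafe` asserts that any state the closure touched is
    // still usable after a panic; the caller is responsible for that.
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        // Panic payloads are usually a `&str` or a `String`.
        let reason = if let Some(s) = payload.downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        };
        format!("plugin `{}` panicked: {}", plugin_name, reason)
    })
}
```

The VM could then decide per plugin whether a caught panic should fail just the calling process or unload the plugin entirely.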

Nov 14 '22 06:11 tqwewe

Thanks for experimenting with this! It looks awesome so far.

This is only the case if plugins are wasm modules right? If we go with native plugins, then plugins can just export some basic functions which are used in the VM.

Yes.

Nov 14 '22 09:11 bkolobara
