Memory Sharing Discussion
Introduction and Context
There have been a number of conversations and questions raised about the possibility of sharing subsets of memory between multiple WebAssembly instances and the host environment. While it is possible today to share memory between modules by importing and exporting linear memory, this doesn't address the need to map memory backed by some form of host-controlled resource into one or more WebAssembly instances.
Based on the systems we are targeting, it is becoming evident that there will be some form of hardware or host system support for shared memory, and this can and probably should be used. Accepting this hardware support may change how we control and provide access to shared memory.
I've tried my best to summarize a discussion that has occurred via email and to open the discussion to a wider audience. Please do feel free to correct me if I'm missing anything.
Problem Statement
There is a need for low-latency, high-throughput, non-sequential access to data shared between the host environment's hardware, multiple host OS processes and multiple WebAssembly modules.
To address this requirement it is proposed to add a memory sharing mechanism to WebAssembly, one which will allow the host system and multiple WebAssembly modules to access the same host memory, which could optionally be backed by some form of host-supplied resource.
In short: optionally hardware / resource backed memory which can be shared between multiple host operating system processes and between multiple WebAssembly instances hosted in one or more operating system processes. The 4 key requirements are:
- A. Share multiple subsets of memory between Wasm modules
- B. Share multiple subsets of physical memory between hardware / host and Wasm instances
- C. Memory access patterns within the subsets of memory are random in nature
- D. Have zero copy access to this memory
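As a rough illustration of requirements A, B and D, the sketch below uses Python's mmap module as a stand-in for a host-supplied resource. The names host_region, sensor_view and actuator_view are hypothetical, and a real runtime would map pages into each instance's linear memory rather than hand out Python memoryviews.

```python
import mmap

PAGE = mmap.PAGESIZE

# Anonymous mapping standing in for a host-controlled resource (assumption).
host_region = mmap.mmap(-1, 2 * PAGE)

# Two disjoint subsets of the same mapping; slicing a memoryview does not copy.
whole = memoryview(host_region)
sensor_view = whole[0:PAGE]            # subset handed to a hypothetical "instance 1"
actuator_view = whole[PAGE:2 * PAGE]   # subset handed to a hypothetical "instance 2"

# Stores through either view land directly in the underlying mapping.
sensor_view[10] = 0x2A
actuator_view[0] = 0x07
```

Because both views alias the same mapping, nothing is copied when data moves between the host and either consumer; only the access rights and offsets differ.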
Use Case
Industrial and embedded systems often operate multiple threads, all running in tight loops which sample data from host hardware and host software and share that data between a number of WebAssembly modules before writing back to the host system. Both the volume of data being read / written and the location of load / store events within the shared memory region change over the run time of the application. The resulting access patterns are effectively random and impossible to predict.
Supported Device Types
The following device types are targeted to support this type of functionality:
- Devices with MMU
- Ability to use mmap or similar POSIX-like functionality from the host system
Unsupported Devices
Devices which do not possess an MMU can still implement a form of shared memory, at least logically, even if the resulting access is slower. This is demonstrated today when Linux runs on devices without an MMU. These Linux devices support a version of mmap which is backed by file-like primitives and causes a hidden copy to occur. This implies that logically similar functionality can be implemented on a wide range of device types, albeit with much slower performance.
Memory Sharing Scope
This discussion is only focused on the host / WebAssembly mechanics of sharing access to the same physical memory between hardware / OS processes and WebAssembly code.
Out of Scope
The following considerations are derivative problems which may arise once the ability to share memory has been established. Addressing these issues requires that some form of shared memory must already exist. Therefore, for the purposes of this discussion, the following items are deemed out of scope for this issue:
- Synchronization of memory access; The multithreading proposal addresses this by providing the atomic operations required to implement synchronization primitives such as semaphores - this however is dependent on shared access to the same memory space. Additionally, the extern_ref data type allows for the safe passing of references to OS supplied host implementations of such primitives too. It is therefore not necessary to address this item, and it is deemed out of scope for this issue.
- Data representation and memory layout; Differences in machine word size and endianness between the host and Wasm affect how the data present in a shared memory space should be interpreted in situ. There are a myriad of solutions to this problem, including the approaches defined by the component model. Indeed, as noted above, this problem only arises if memory can be shared in the first place. Therefore it is not necessary to address this item, and it is deemed out of scope for this issue.
- Heap Management of Shared Memory; Allocating and freeing ranges within shared memory are derivative problems and are out of scope for this discussion.
Conversation Update and Suggested Solutions
There has been an email exchange between a number of people, which I will try to summarize below. In total there have been about 7 proposed approaches. Once posted, I'll share a link to this issue with everyone involved in the email conversation so they can add context where I might have missed it. Please accept this write up as a "best effort", rather than an exhaustive report.
Option 1: Expose mmap
It is possible to expose the mmap function in a controlled way to a WebAssembly application, as long as invoking mmap places the memory inside linear memory and the runtime makes sure to remap it to the same offset when operations like memory.grow occur. There have been experiments and PoCs produced by WAMR which show this can work.
There are issues with this approach, of course. The mmap function would need to be updated to replace the file descriptor with another identifier for a hardware resource, perhaps an extern_ref. Additionally, the read, write and execute access requests would need to be reduced to just read and write.
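The bookkeeping described above can be modelled in a few lines. This is a toy simulation, not WAMR's implementation: LinearMemory, map_resource and the PROT_* values are hypothetical names, a Python buffer object stands in for the resource identifier, and address redirection stands in for a real page mapping.

```python
import mmap

WASM_PAGE = 65536  # WebAssembly page size in bytes
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4  # hypothetical flag values


class LinearMemory:
    """Toy linear memory with host resources mapped at fixed offsets."""

    def __init__(self, pages):
        self.data = bytearray(pages * WASM_PAGE)
        self.mappings = {}  # linear offset -> (view of host resource, prot)

    def map_resource(self, offset, resource, prot):
        # Only read and write are accepted; execute requests are rejected.
        if prot & PROT_EXEC:
            raise PermissionError("execute access is not supported")
        view = memoryview(resource)
        if offset + len(view) > len(self.data):
            raise ValueError("mapping does not fit inside linear memory")
        self.mappings[offset] = (view, prot)

    def _lookup(self, addr):
        for offset, (view, prot) in self.mappings.items():
            if offset <= addr < offset + len(view):
                return view, addr - offset, prot
        return None

    def load8(self, addr):
        hit = self._lookup(addr)
        if hit is None:
            return self.data[addr]
        view, rel, _ = hit
        return view[rel]

    def store8(self, addr, value):
        hit = self._lookup(addr)
        if hit is None:
            self.data[addr] = value
            return
        view, rel, prot = hit
        if not prot & PROT_WRITE:
            raise PermissionError("region is mapped read-only")
        view[rel] = value

    def grow(self, delta_pages):
        # The backing store may move when memory grows; the mapping table is
        # kept so every mapped region stays at the same linear offset.
        old_pages = len(self.data) // WASM_PAGE
        grown = bytearray((old_pages + delta_pages) * WASM_PAGE)
        grown[:len(self.data)] = self.data
        self.data = grown
        return old_pages
```

For example, after `mem.map_resource(8192, mmap.mmap(-1, 4096), PROT_READ | PROT_WRITE)`, a store to address 8197 lands in the host mapping, and the same address still resolves to it after `mem.grow(1)`.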
Could this functionality be exposed as a builtin?
Providing some form of access to mmap addresses all 4 of the requirements (A-D) in the problem statement above.
Option 2: Use Multi-memory and Map a 2nd Memory to mmap
This was suggested by Siemens as an alternative design that avoids exposing mmap directly. We could use a second memory and mmap it at Wasm instantiation, before the Wasm code executes. This becomes a configuration statement, which we can place inside a custom section in the Wasm module.
This has the advantage of reducing the management overhead of a dynamic mmap, but it introduces an issue with the speed of memory access, at least today. At the moment there is no way to directly reference and access the second memory from C, so we hand wrote some getter / setter functions to address the second memory. This works but is going to introduce a performance impact of about 300% when compared to a direct memory access.
If there were changes to how the second memory could be accessed, and we could do it without these getters and setters, this would be a good alternative. However, without this it looks like this approach addresses A, B, and C, but effectively has a copy, so it doesn't address point D in our requirements.
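To make the getter / setter indirection concrete, here is a minimal model of it. The function names mem1_load_i32 and mem1_store_i32 are hypothetical and are not the accessors that were actually written; two bytearrays stand in for memory 0 and the mmap-backed memory 1.

```python
import struct

main_memory = bytearray(65536)    # memory 0: ordinary linear memory
second_memory = bytearray(65536)  # memory 1: would be mmap-backed in this option


def mem1_store_i32(addr, value):
    """Hypothetical setter: every store to memory 1 goes through a call."""
    struct.pack_into("<i", second_memory, addr, value)


def mem1_load_i32(addr):
    """Hypothetical getter: every load from memory 1 goes through a call."""
    return struct.unpack_from("<i", second_memory, addr)[0]


# Using a value from memory 1 means fetching it through the getter and placing
# it in memory 0 first, which is the effective copy noted under requirement D.
mem1_store_i32(16, -12345)
struct.pack_into("<i", main_memory, 0, mem1_load_i32(16))
```

The little-endian format string mirrors WebAssembly's byte order for linear memory. The sketch does not attempt to reproduce the measured overhead; it only shows where the extra call and copy come from.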
Option 3: Share lower memory region between module and host
This option has been documented here: https://github.com/WebAssembly/memory-control/blob/main/proposals/memory-control/Overview.md. It addresses the need to share memory between the host and a WebAssembly instance. Using a specific part of linear memory helps performance: on load / store calls the runtime can cheaply determine whether it should access the shared memory or local linear memory.
Given that we're considering using hardware / host resources to implement memory mapping into a Wasm instance's linear memory, do we still need to do this?
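The performance argument rests on the boundary check being a single comparison. The following sketch shows that idea only; it is not taken from the linked proposal, and SHARED_LIMIT, resolve, load8 and store8 are hypothetical names.

```python
SHARED_LIMIT = 0x10000  # hypothetical: the lowest 64 KiB of addresses are shared

shared = bytearray(SHARED_LIMIT)  # stand-in for memory shared with the host
local = bytearray(4 * 0x10000)    # stand-in for the instance's private memory


def resolve(addr):
    """Pick the backing store with one comparison against the boundary."""
    if addr < SHARED_LIMIT:
        return shared, addr
    return local, addr - SHARED_LIMIT


def store8(addr, value):
    backing, offset = resolve(addr)
    backing[offset] = value


def load8(addr):
    backing, offset = resolve(addr)
    return backing[offset]
```

A region placed at the top of the address space would use the mirrored comparison (`addr >= boundary`), at the same cost per access.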
Option 4: Share upper memory region between modules
This proposal has been implemented and documented here: https://github.com/bytecodealliance/wasm-micro-runtime/issues/3546. Just like option 3 above, using a specific part of linear memory helps performance, because on load / store calls the runtime can cheaply determine whether it should access the shared memory or local linear memory. This was designed to address the need to share memory between modules, but perhaps not between physical host memory and a module. Could this approach allow Wasm instances to share multiple areas of memory? For example, instances 1 and 2 share memory, and instances 2 and 3 share memory?
As mentioned above, if we can use the host system to map memory regions into linear memory, do we still need to do this?
Option 5: Presenting Shared Memory as a Component Model Construct
In this option the shared memory is represented by a component model stream or flat data area. Its usage is marshalled by the component model constructs, and the data layout and synchronization primitives can be implicitly provided. Behind the scenes the CM will use mmap to construct the shared memory regions. It is probable that runtimes will need to be extended to provide the housekeeping and tracking for managing the shared memory regions.
Option 6: A Memory Reference
There have been previous suggestions of a memoryref as a way of passing dynamic memories between Wasm modules: https://github.com/WebAssembly/multi-memory/issues/19#issuecomment-813916746. Based on the discussion and Andrea's comments, this looks like quite an involved effort, but potentially one that might help address the use of annotations to access multi-memories.
I would like to point towards my proof of concept implementation which exercises attaching shared memory in wasmtime - using the component model to pass the memory buffer between two fully insulated wasm modules.
While the WIT file describes a way (flags argument) to make sure that information is never duplicated and that race conditions are excluded, this host side implementation doesn't implement that, yet. How to wait for a buffer to become available for reading/writing could be part of a separate proposal.
For now I rely on the host to tell how much linear memory is needed (rounding up to fulfill page alignment etc.) and on the guest to provide this amount in its linear memory space. The host is still free to ignore this memory location and will likely return a different address due to page alignment. Unfortunately the used address leaks the alignment of physical host pages to the guest.
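The rounding involved can be sketched as follows. This is an illustration of the arithmetic, not the proof of concept's code: the helper names are hypothetical, and it assumes the base of linear memory is itself host page aligned, so that an aligned guest offset corresponds to an aligned host address.

```python
import mmap

HOST_PAGE = mmap.PAGESIZE  # host page size, a power of two


def round_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def required_guest_bytes(payload_len):
    """Bytes the guest should reserve so an aligned window always fits."""
    # One extra page of slack covers any misalignment of the guest's address.
    return round_up(payload_len, HOST_PAGE) + HOST_PAGE


def aligned_window(guest_addr, payload_len):
    """Return (address, length) of the page-aligned window in the reservation."""
    return round_up(guest_addr, HOST_PAGE), round_up(payload_len, HOST_PAGE)
```

The address returned by aligned_window generally differs from the one the guest supplied, which is exactly how the host's page alignment becomes observable to the guest.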
MPU based solutions could request a zero size and always return the virtual address (subtracting the physical start of the linear memory) of the shared physical buffer, rendering the linear memory non-contiguous but still insulated by hiding alien memory from user mode using the MPU.
Also because of single-write exclusive or multiple-read attachment constraints the host could fall back to copying the "shared" memory in and out of linear memory if necessary. This is transparent to any guest - as a write always needs to detach before any reader can see the contents.
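A minimal model of that attachment rule, using the copy fallback throughout, might look like this. SharedBuffer and its methods are hypothetical names, and the guest identifiers are plain strings for illustration.

```python
class SharedBuffer:
    """Toy attach/detach policy: one writer, or any number of readers."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.writer = None
        self.readers = set()

    def attach(self, guest, write=False):
        if write:
            if self.writer is not None or self.readers:
                raise BlockingIOError("buffer is attached elsewhere")
            self.writer = guest
        else:
            if self.writer is not None:
                raise BlockingIOError("buffer is attached for writing")
            self.readers.add(guest)
        # Copy fallback: the guest receives a private copy, not a mapping.
        return bytearray(self.data)

    def detach(self, guest, local_copy=None):
        if guest == self.writer:
            # The writer's changes are published only when it detaches.
            if local_copy is not None:
                self.data[:] = local_copy
            self.writer = None
        else:
            self.readers.discard(guest)
```

Since readers can only attach after the writer has detached, a guest cannot tell whether it was given a true mapping or a copy that is written back on detach.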
Here are some use cases and practices that WAMR has encountered so far. I hope you find them helpful.
- Case A: The product is a proxy, and WebAssembly modules act as filters on it. The host receives messages and uses WebAssembly plugins to process them. The host employs a WebAssembly allocator to create space in the linear memory when a message arrives and to store incoming data. The WebAssembly plugins will then use these messages and also utilize native libraries (provided by the host) for message processing.
- Case B: The product is an embedded device running WebAssembly applications. These applications need to access the entire memory of the host. The WebAssembly application aims to extend its linear memory to span the entire memory range. However, WebAssembly applications are not the only ones running; there may be applications written in other languages. Sometimes, WebAssembly applications need to work alongside these other language applications. WebAssembly applications also have their own native libraries, which require full memory access as well.
- Case C: This is also about an embedded device running WebAssembly applications. The major difference in this scenario is the presence of multiple RAMs. These need to be accessed using different address spaces in the C language. Both the WebAssembly modules and their native libraries need to access the host's memory. One type of access targets a fixed area that is set up by the system during booting. The other type of access targets random areas of memory.
In all the cases mentioned:
- Typically, the host generates the content, while wasm applications and their native libraries consume it. Sometimes, the native libraries may also write back to the host, but the host is always the one that starts the process.
- A portion of the linear memory is utilized by the host. This means the runtime must create a larger linear memory space than what is defined in the .wasm module. Compared to using wasm-lld options to customize the memory size in the .wasm module, controlling it at runtime is more adaptable and can manage varying workloads executed by wasm.
- Occasionally, the host provides a fixed range of memory (which is separate from the linear memory) and wants wasm to consider it as an extra part of the linear memory. This allows the wasm application and its native libraries to access it.
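The sizing decision in the second point reduces to simple page arithmetic. The helper below is a hypothetical sketch, not WAMR code; it assumes the host-reserved area is placed directly after the pages declared by the module.

```python
WASM_PAGE = 65536  # WebAssembly page size in bytes


def instantiate_layout(declared_pages, host_reserved_bytes):
    """Return (total_pages, host_base) for a memory enlarged at runtime."""
    host_pages = -(-host_reserved_bytes // WASM_PAGE)  # ceiling division
    host_base = declared_pages * WASM_PAGE             # first byte after the module's pages
    return declared_pages + host_pages, host_base
```

For instance, a module declaring 2 pages with 100 bytes reserved for the host ends up with 3 pages, and the host's area starts at offset 131072.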
As a result,
- It is complex to have the WebAssembly (wasm) developer handle the task of allocating memory for the host and then passing the address back to the host. It's even more challenging if the host has a fixed memory range and the wasm developer must attach it to a specific part of the linear memory (though this could perhaps be packaged into some kind of library that is always linked in).
- The operation memory.grow is always impacted.
- It is advisable to avoid using a fixed-size, pre-allocated area as a buffer for data exchange between the host and wasm. Determining the size needed to reserve can be difficult.