Integrating multiple memories and 64-bit addresses with other extensions
Seeing the shared memory atomics proposal and the SIMD proposal, it struck me that it will be painful if they are not orthogonal to the planned feature to allow accesses to multiple memories. I think the multiple memories feature is relatively low priority, but it seems it's worth fleshing out how it will work to avoid pain integrating it with earlier extensions.
The most general form of accessing multiple memories would require allowing values that are references to memory objects, which has a dependency on GC references.
A high bang-for-the-buck form of the feature would just add a flag to memory_immediate that indicates there's an extra field with an index into the module's memory index-space.
A way to implement the more general form of the feature without creating a dependency on it from any other extensions that add memory access operators would be to add an operator that modifies the following memory operator. However, it adds a new variety of state to the WASM virtual machine, so I'm not convinced it's the way to go.
This also applies to adding support for 64-bit addresses. Are we going to end up with (32-bit | 64-bit address) x (default memory | immediate memory index | memory operand) versions of all the memory operators?
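To make the modifier-operator idea concrete, here is a minimal sketch of a decoder pass that folds a prefix into the memory operator that follows it. The operator name `with_memory` and the tuple representation of the instruction stream are assumptions for illustration only; nothing here is an actual WebAssembly encoding. The `pending` variable is the "new variety of state" the comment above is worried about.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical operator names; none of these are real binary encodings.
MEMORY_OPS = {"i32.load", "i32.store"}
PREFIX_OP = "with_memory"  # hypothetical modifier operator


@dataclass
class MemAccess:
    op: str
    memory_index: int  # 0 is the default memory


def resolve_memory_prefixes(ops: List[Tuple[str, Optional[int]]]) -> List[object]:
    """Fold each `with_memory N` prefix into the memory operator after it."""
    out: List[object] = []
    pending: Optional[int] = None  # extra decoder state introduced by the prefix
    for name, imm in ops:
        if name == PREFIX_OP:
            if pending is not None:
                raise ValueError("two prefixes in a row")
            pending = imm
        elif name in MEMORY_OPS:
            # No prefix means the default memory (index 0).
            out.append(MemAccess(name, pending if pending is not None else 0))
            pending = None
        else:
            if pending is not None:
                raise ValueError(f"prefix must precede a memory operator, got {name}")
            out.append(name)
    if pending is not None:
        raise ValueError("dangling prefix at end of code")
    return out
```

The upside is that SIMD or atomic memory operators would only need to be added to `MEMORY_OPS`; the downside is that every consumer of the instruction stream has to track `pending`.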
I'd explore two possibilities for multiple memories:
- Specified as an immediate on a memory access. This is what LLVM has in its IR.
- Specified as a variable value which affects memory accesses. This is closer to a segment modifier.
This indeed needs to mesh well with atomics and SIMD, but the only interaction is that those memory accesses also need to work with multiple memories. Presumably atomics and SIMD will work the same as existing memory accesses, so I think there's little to no risk of failure.
On 64-bit: I think we'd discussed having wasm32 and wasm64 as separate binaries which can't interop (even through dynamic linking). We'd also discussed doubling all memory access opcodes to support 64-bit accesses. I don't think we'd settled on one approach.
It doesn't seem possible to block interop between wasm32 and wasm64 as long as both interop with JavaScript.
Agreed on wanting both immediate linear memory index and, later, first-class Memory GC reference types. To avoid adding a whole duplicate class of opcodes, what I've been assuming is that the memory_immediate immediate of all the existing memory ops would indicate one of three cases: default (what we have now), immediate index (in which case the opcode has an additional memory-index varu32 immediate), or reference (in which case the operator's signature gains a Memory operand).
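As a rough sketch of how one memory_immediate could select among those three cases, the decoder below reads a flags field, an optional memory index, and the offset. The specific flag bits (`HAS_MEM_INDEX`, `HAS_MEM_REF`) and the placement of the index between flags and offset are assumptions made for this example; the thread does not settle on an encoding.

```python
from typing import NamedTuple, Optional, Tuple

# Hypothetical flag bits, chosen only for illustration.
ALIGN_MASK = 0x0F      # low bits: log2(alignment)
HAS_MEM_INDEX = 0x40   # assumed: a memory-index varuint32 follows the flags
HAS_MEM_REF = 0x80     # assumed: the operator also pops a Memory reference


class MemoryImmediate(NamedTuple):
    align_log2: int
    offset: int
    form: str                    # "default", "immediate_index" or "reference"
    memory_index: Optional[int]  # only set for "immediate_index"


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7
        if shift >= 35:
            raise ValueError("varuint32 is longer than 5 bytes")


def decode_memory_immediate(buf: bytes, pos: int = 0) -> Tuple[MemoryImmediate, int]:
    flags, pos = read_varuint32(buf, pos)
    if (flags & HAS_MEM_INDEX) and (flags & HAS_MEM_REF):
        raise ValueError("index form and reference form are mutually exclusive")
    form = "default"
    memory_index = None
    if flags & HAS_MEM_INDEX:
        form = "immediate_index"
        memory_index, pos = read_varuint32(buf, pos)
    elif flags & HAS_MEM_REF:
        form = "reference"
    offset, pos = read_varuint32(buf, pos)
    return MemoryImmediate(flags & ALIGN_MASK, offset, form, memory_index), pos
```

With this shape, the default form costs no extra bytes, and SIMD or atomic memory ops that reuse memory_immediate would pick up all three forms without new opcodes.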
I'd like to avoid prescribing the order in which these are done. If it turns out variable index is superior in every way then it would be undesirable to do immediate first.
Sure, the ordering wasn't the point of my comment.
> reference (in which case the operator's signature gains a Memory operand).
If we did this, that would be the first instance where you can't just look at the opcode to determine the operand signature of the operator. Maybe worthwhile, but it's a cost to consider.
> I'd like to avoid prescribing the order in which these are done. If it turns out variable index is superior in every way then it would be undesirable to do immediate first.
I don't think @lukewagner was referring to an operand-indexed form, just immediate-indexed or operand-referenced forms. An operand-indexed form would be somewhere between the operand-referenced form and the immediate-indexed form. Maybe that would be useful without requiring the full power of Memory refs on the operand stack, but it's painful to expose the module's memory index-space to some value that's possibly being passed around between different modules.
The immediate form has a significant advantage in both code size and the ease of generating good code from it, so the question is just whether that form is useful to any of the actual applications for multiple memories.
> If we did this, that would be the first instance where you can't just look at the opcode to determine the operand signature of the operator.
Actually, that's not true. The call and call_indirect operators' operand signatures depend on immediates.
> > reference (in which case the operator's signature gains a Memory operand).
>
> If we did this, that would be the first instance where you can't just look at the opcode to determine the operand signature of the operator. Maybe worthwhile, but it's a cost to consider.
I don't understand. Can you clarify with pseudo-code what you mean? I saw your point about call / call_indirect, but what I'm proposing doesn't affect any signature.
> The immediate form has a significant advantage in both code size and the ease of generating good code from it, so the question is just whether that form is useful to any of the actual applications for multiple memories.
Agreed, but I want us to ask a wider question: is the operand-indexed form more useful?
> I don't understand. Can you clarify with pseudo-code what you mean? I saw your point about call / call_indirect, but what I'm proposing doesn't affect any signature.
@lukewagner listed three possible forms for memory accesses:
- default form
- immediate index form: memory_immediate would be extended to include the index of a memory in the module's memory index space.
- reference form: memory_immediate would be extended to include a flag that, when set, causes the operator to pop an additional value of type ref<Memory> from the operand stack.
My point was that the reference form would change the values popped from the operand stack depending on the operator's immediates. I mistakenly thought that would be the first case where that happens, but I was forgetting calls and branches.
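A small sketch of that point, assuming a hypothetical function table and the form names used above: in both cases the popped operands are a function of the immediates, not of the opcode alone. The order of the ref<Memory> operand relative to the address is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical function signatures keyed by function index: (params, results).
FUNC_TYPES: Dict[int, Tuple[List[str], List[str]]] = {
    0: (["i32", "i32"], ["i32"]),
    1: ([], []),
}


def call_operands(func_index: int) -> List[str]:
    """Operands popped by `call`, decided by its function-index immediate."""
    params, _results = FUNC_TYPES[func_index]
    return list(params)


def load_operands(form: str, address_type: str = "i32") -> List[str]:
    """Operands popped by a load, decided by the form in memory_immediate.

    Listed bottom of stack first; placing the memory reference below the
    address is an assumption of this sketch.
    """
    if form in ("default", "immediate_index"):
        return [address_type]
    if form == "reference":
        return ["ref<Memory>", address_type]
    raise ValueError(f"unknown form: {form}")
```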
Ah gotcha, thanks. Even if it were the first time, we could make it a separate op instead to reduce your concern?
We have already planned ahead for allowing multiple table indices. In particular, some of the late changes to the binary format made sure that every instruction related to memory has suitable extension points for referencing memory indices other than 0 (the index they currently use implicitly).
First-class memories is a more substantial extension that depends on GC types. It would either require new opcodes, or hacking the interpretation of the index field of the existing one in some way.
An intermediate solution that could be possible without GC types would be memories as table elements. That is, introduce a new element type memory and allow tables to index (multiple) memories. You'd probably still need new instructions or some hack to the existing ones to access it.
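A toy model of memories as table elements, to show what such an access might look like at runtime. The class names and the `i32_load_indirect` operator are hypothetical; this only illustrates the extra indirection and the two trap conditions (bad table index, out-of-bounds address), not a proposed instruction.

```python
import struct
from typing import List, Optional


class Memory:
    """A linear memory modelled as a fixed-size byte buffer."""

    def __init__(self, size_bytes: int) -> None:
        self.data = bytearray(size_bytes)

    def _check(self, addr: int, width: int) -> None:
        if addr < 0 or addr + width > len(self.data):
            raise IndexError("out of bounds memory access")

    def load_i32(self, addr: int) -> int:
        self._check(addr, 4)
        return struct.unpack_from("<i", self.data, addr)[0]

    def store_i32(self, addr: int, value: int) -> None:
        self._check(addr, 4)
        struct.pack_into("<i", self.data, addr, value)


class MemoryTable:
    """A table whose element type is `memory` (hypothetical)."""

    def __init__(self, elements: List[Optional[Memory]]) -> None:
        self.elements = elements

    def get(self, index: int) -> Memory:
        if index < 0 or index >= len(self.elements):
            raise IndexError("table index out of bounds")
        mem = self.elements[index]
        if mem is None:
            raise ValueError("uninitialized table element")
        return mem


def i32_load_indirect(table: MemoryTable, elem_index: int, addr: int) -> int:
    """Hypothetical operator: pick a memory via a table index, then load."""
    return table.get(elem_index).load_i32(addr)
```

Here the table index is an ordinary runtime value, so the memory can vary per access without memory references ever appearing on the operand stack.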
> Ah gotcha, thanks. Even if it were the first time, we could make it a separate op instead to reduce your concern?
Personally, I'd weigh the cost of this kind of immediate-dependent operator operands as being lower than the cost of having multiple variants of all the memory access operators.
I'd consider it an acceptable solution to say that adding various flags to the memory_immediate causes the memory operator in question to change to one of the immediate index or reference forms, and that the SIMD/atomic extension memory ops handle the same forms supported by the non-extension memory ops.
Same goes for the 64-bit memory ops: I'd prefer a 64-bit address flag on the memory type, or simply making the memory ops accept either 32-bit or 64-bit addresses, to adding 64-bit address variants of memory ops. I don't want to end up with i64.atomic.address64.memRef.rmw32_s.cmpxchg. :)
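A sketch of the "flag on the memory type" option, assuming a hypothetical `address64` field: the validator derives the expected address type from the memory being accessed, so the load opcode itself does not need a 64-bit variant.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryType:
    min_pages: int
    address64: bool = False  # hypothetical flag on the memory type


def address_type(memories: List[MemoryType], memory_index: int = 0) -> str:
    """Address type a memory operator expects for the given memory."""
    return "i64" if memories[memory_index].address64 else "i32"


def validate_load(memories: List[MemoryType], memory_index: int, operand_type: str) -> None:
    """Reject a load whose address operand does not match the memory's type."""
    expected = address_type(memories, memory_index)
    if operand_type != expected:
        raise TypeError(f"expected {expected} address, got {operand_type}")
```

The same lookup composes with the immediate index form, since the memory index is known at validation time; the reference form would need the address width to be part of the reference's type.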
> An intermediate solution that could be possible without GC types would be memories as table elements. That is, introduce a new element type memory and allow tables to index (multiple) memories. You'd probably still need new instructions or some hack to the existing ones to access it.
I like that approach in general as a way to add operators for manipulating memories, tables, globals, modules, instances, etc without making those types first class.