Working with GC types without copying
I'm having some trouble switching to WASI Preview 2.
For example, the following interface:
```wit
package wasi:random;
interface random {
  get-random-bytes: func(len: u64) -> list<u8>;
}
```
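The component-level function type is `func(len: u64) -> list<u8>`.
But its lowered core type is `(func (param i64 i32))`, which is very difficult to use. A `list` flattens to two `i32`s (pointer and length), and the canonical ABI allows only one flat result, so the list is returned through the extra `i32` parameter: a pointer to a return area in linear memory where the callee writes the list's pointer and length.
Converting the result into a core GC array such as `(array (mut u8))` requires a lot of glue code.
I'd like to propose a GC-mode canonical option that makes the lowered type something like `core func (i64) -> (array u8)`.
For complex nested types, reaching a specific piece of data requires intricate pointer arithmetic, whereas with GC arrays it only takes a few chained `array.get` instructions.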
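To make the difference concrete, here is a minimal Python simulation of both lowerings. It models linear memory as a `bytearray`, uses a hypothetical bump allocator in place of the guest's `realloc`, and replaces the random bytes with a deterministic sequence; all function names are illustrative and are not part of WASI or the canonical ABI.

```python
import struct

# Toy linear memory; the canonical ABI stores pointers/lengths as little-endian i32.
memory = bytearray(64 * 1024)
heap_top = 1024  # hypothetical start of the guest heap


def guest_alloc(align: int, size: int) -> int:
    """Hypothetical stand-in for the guest's realloc(0, 0, align, size)."""
    global heap_top
    ptr = (heap_top + align - 1) & ~(align - 1)  # round up to the alignment
    heap_top = ptr + size
    return ptr


def host_get_random_bytes(length: int, retptr: int) -> None:
    """Host side of the lowered (func (param i64 i32)): write (ptr, len) at retptr."""
    data = bytes(i % 256 for i in range(length))  # deterministic stand-in for randomness
    ptr = guest_alloc(1, length)  # the list must live in the guest's memory
    memory[ptr:ptr + length] = data
    struct.pack_into("<II", memory, retptr, ptr, length)


def guest_get_random_bytes(length: int) -> bytes:
    """Guest side: reserve an 8-byte return area, call, then chase the pointer."""
    retptr = guest_alloc(4, 8)
    host_get_random_bytes(length, retptr)
    ptr, n = struct.unpack_from("<II", memory, retptr)
    return bytes(memory[ptr:ptr + n])


def host_get_random_bytes_gc(length: int) -> bytes:
    """Proposed GC lowering (sketch): the result is the immutable byte array itself."""
    return bytes(i % 256 for i in range(length))
```

In the pointer version the guest needs an allocator, a return area and two loads before it can touch the data; in the GC version there is nothing to chase.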
I think this helps simplify the use of some external interfaces, such as:
```wit
package wasi:filesystem;
interface preopens {
  get-directories: func() -> list<tuple<descriptor, string>>;
}
```
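For the nested `get-directories` case, the sketch below (again a hypothetical Python model, not real WASI bindings) shows the offset arithmetic the linear-memory ABI implies. Each `tuple<descriptor, string>` element occupies 12 bytes with 4-byte alignment: an `i32` resource handle at offset 0 and the string's `(ptr, len)` pair at offsets 4 and 8. For simplicity the string bytes are packed right after the element array, whereas a real host would allocate each one through `realloc`.

```python
import struct

ELEM_SIZE = 12  # tuple<descriptor, string>: i32 handle + (i32 ptr, i32 len), align 4


def write_directories(memory: bytearray, base: int, entries: list[tuple[int, str]]) -> tuple[int, int]:
    """Lay out the list as the linear-memory ABI would; returns (list_ptr, list_len)."""
    list_ptr = base
    string_area = base + ELEM_SIZE * len(entries)  # strings go after the element array
    for i, (handle, path) in enumerate(entries):
        encoded = path.encode("utf-8")
        memory[string_area:string_area + len(encoded)] = encoded
        struct.pack_into("<III", memory, list_ptr + i * ELEM_SIZE, handle, string_area, len(encoded))
        string_area += len(encoded)
    return list_ptr, len(entries)


def read_directory(memory: bytearray, list_ptr: int, index: int) -> tuple[int, str]:
    """Pointer algebra needed to fetch element `index` from linear memory."""
    elem = list_ptr + index * ELEM_SIZE                         # address of the tuple
    handle = struct.unpack_from("<I", memory, elem)[0]           # field 0: descriptor handle
    s_ptr, s_len = struct.unpack_from("<II", memory, elem + 4)   # field 1: string (ptr, len)
    return handle, bytes(memory[s_ptr:s_ptr + s_len]).decode("utf-8")


def read_directory_gc(directories: list[tuple[int, str]], index: int) -> tuple[int, str]:
    """With a GC lowering the list is an array of struct refs: index, then read fields."""
    entry = directories[index]  # ~ array.get
    return entry[0], entry[1]   # ~ struct.get 0, struct.get 1
```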
Yes, agreed. It's definitely the plan of record to add a gc canonical ABI option, just like you're describing. (It's one of the original motivations for having an IDL that abstracts low-level memory representation, even.) We've mostly been waiting for (1) wasm-gc to be finalized, which it now is, and (2) an implementation of wasm-gc to show up in a runtime that also implements components (e.g., one is in progress in Wasmtime). But if you or anyone else wants to run ahead and create a PR adding the gc option to the proposal (Explainer.md, Binary.md and, most significantly, CanonicalABI.md), that would be welcome too.
Until ref-types, GC types, stringref and related features are stable, we have time to discuss how GC languages should obtain WASI data.
In fact, once GC types are taken into account, WASI types map onto core wasm types much more naturally.
In the table below, no option means the existing pointer (linear-memory) mode; the tentative `reference-type` option lowers to an immutable reference, and the tentative `mutable-reference` option lowers to a reference whose contents are mutable.

| Upper Type | Lower Type | Canonical Options | Requisite |
|---|---|---|---|
| `u32` | `i32` | | |
| `tuple<u32, u32>` | `(i32, i32)` | | |
| `tuple<u32, u32>` | `(struct (field i32) (field i32))` | `reference-type` | gc |
| `tuple<u32, u32>` | `(struct (field (mut i32)) (field (mut i32)))` | `mutable-reference` | gc |
| `record {a: u32, b: u32}` | `(i32, i32)` (flattened layout) | | |
| `record {a: u32, b: u32}` | `(struct (field $a i32) (field $b i32))` | `reference-type` | gc |
| `list<u8>` | `(i32, i32)` | | |
| `list<u8>` | `(array u8)` | `reference-type` | gc |
| `list<u8>` | `(array (mut u8))` | `mutable-reference` | gc |
| `string` | `(i32, i32)` | | |
| `string` | `stringref` | `reference-type` | gc, stringref |
| `string` | `(string.encode_utf8 stringref)` | `reference-type` + `string-encoding=utf8` | gc, stringref |
| `borrow<string>` | `string_view` | `reference-type` | gc, stringref |
| `resource` | `i32` | | |
| `resource` | `externref` | `reference-type` | ref-types |
| `flags` | `i32` × ⌈n / 32⌉ (flattened layout, n = number of flags) | | |
| `enum` | `i32` | | |
| `option<u32>` | `(ref null i32)` / `i31ref` | `reference-type` | gc |
| `option<t>` | `(ref null T)` | `reference-type` | gc |
| `result<t, e>` | ? | ? | ? |
| `variant` | ? | ? | ? |

`variant` might map to subtyping with downcasts in a GC context.
- Cross Link: https://github.com/WebAssembly/gc/issues/531
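
One way to read the `variant` idea: each case becomes a subtype of a common supertype, and the consumer downcasts (in core wasm, `ref.test`/`ref.cast`) to find out which case it holds. The Python sketch below models `result<u32, string>` that way with hypothetical `Ok`/`Err` classes, using `isinstance` in place of the downcast; it is an analogy, not a proposed encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultU32String:
    """Supertype shared by every case (cf. a GC struct type the cases extend)."""


@dataclass(frozen=True)
class Ok(ResultU32String):
    value: int


@dataclass(frozen=True)
class Err(ResultU32String):
    error: str


def describe(r: ResultU32String) -> str:
    # isinstance plays the role of ref.test / ref.cast on the case subtype.
    if isinstance(r, Ok):
        return f"ok: {r.value}"
    if isinstance(r, Err):
        return f"err: {r.error}"
    raise TypeError(f"unexpected case: {r!r}")
```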
Another benefit is that if only GC types are used, there is no need to bundle a memory allocator, which reduces code size and speeds up warm-up.
For example, rustc's `cabi_export_realloc` takes about 27,000 lines of wasm instructions (release mode), and libc's allocator is even larger.
Smaller allocators exist, but they sacrifice either speed or security.
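For reference, the canonical ABI's `realloc` has the shape `(func (param i32 i32 i32 i32) (result i32))`, taking the original pointer, original size, alignment and new size. The hypothetical Python bump allocator below is about the smallest thing that honors that contract; it stays tiny by never reclaiming memory, one example of the trade-offs small allocators make.

```python
class BumpAllocator:
    """Hypothetical minimal allocator with the canonical ABI realloc contract:
    realloc(original_ptr, original_size, alignment, new_size) -> ptr.
    It never frees or reuses memory, which is what keeps it tiny."""

    def __init__(self, memory: bytearray, heap_base: int) -> None:
        self.memory = memory
        self.top = heap_base

    def realloc(self, original_ptr: int, original_size: int, alignment: int, new_size: int) -> int:
        assert alignment > 0 and (alignment & (alignment - 1)) == 0, "alignment must be a power of two"
        ptr = (self.top + alignment - 1) & ~(alignment - 1)  # round up to the alignment
        if ptr + new_size > len(self.memory):
            raise MemoryError("out of linear memory (a real module would try memory.grow)")
        if original_ptr != 0:
            # Resize request: copy the surviving prefix; the old block is simply leaked.
            keep = min(original_size, new_size)
            self.memory[ptr:ptr + keep] = self.memory[original_ptr:original_ptr + keep]
        self.top = ptr + new_size
        return ptr
```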
The current linear-memory version of the example needs such an allocator:
```wat
(component
  ;; Define a memory allocator
  (core module $MockMemory ;; replace with an actual allocator module, such as libc
    (func $realloc (export "realloc") (param i32 i32 i32 i32) (result i32)
      (i32.const 0)
    )
    (memory $memory (export "memory") 255)
  )
  (core instance $mock_memory (instantiate $MockMemory))
  ;; import the WASI interface
  (import "wasi:random/random" (instance $wasi:random/random
    (export "get-random-bytes" (func (param "len" u64) (result (list u8))))
  ))
  ;; lower the WASI function to a core function
  (core func $wasi:random/random/get-random-bytes (canon lower
    (func $wasi:random/random "get-random-bytes")
    (memory $mock_memory "memory")
    (realloc (func $mock_memory "realloc"))
  ))
  ;; core module importing the lowered function: (param i64 len) (param i32 retptr)
  (core module $TestRandom
    (type (func (param i64 i32)))
    (import "wasi:random/random" "get-random-bytes" (func $wasi:random/random/get-random-bytes (type 0)))
  )
  ;; instantiate the core module with the lowered WASI function
  (core instance $test_random (instantiate $TestRandom
    (with "wasi:random/random" (instance (export "get-random-bytes" (func $wasi:random/random/get-random-bytes))))
  ))
)
```
With the proposed GC option, this could be simplified to:
```wat
(component
  ;; import the WASI interface
  (import "wasi:random/random" (instance $wasi:random/random
    (export "get-random-bytes" (func (param "len" u64) (result (list u8))))
  ))
  ;; lower the WASI function to a core function
  (core func $wasi:random/random/get-random-bytes (canon lower
    (func $wasi:random/random "get-random-bytes")
    reference-type ;; proposed (tentative) option; not part of the current spec
  ))
  ;; core module importing the lowered function: no memory, no realloc
  (core module $TestRandom
    (type $bytes (array i8)) ;; GC byte array (i8 is the packed 8-bit storage type)
    (type $get-random-bytes (func (param i64) (result (ref $bytes))))
    (import "wasi:random/random" "get-random-bytes" (func $wasi:random/random/get-random-bytes (type $get-random-bytes)))
  )
  ;; instantiate the core module with the lowered WASI function
  (core instance $test_random (instantiate $TestRandom
    (with "wasi:random/random" (instance (export "get-random-bytes" (func $wasi:random/random/get-random-bytes))))
  ))
)
```
Reading a field of a GC value takes a single instruction (`struct.get` or `array.get`) instead of pointer arithmetic (at least three instructions), which further reduces binary size.
Yes, really good point regarding mutability vs. immutability; we probably do want both as ABI options. A really nice benefit of immutability is that if both sides of a component-to-component call use immutable GC references, no copy needs to be made when passing a reference across the boundary. OTOH, if your language ultimately does need a mutable array of bytes, then the immutable GC option may impose an extra, unnecessary copy; thus having both options makes sense.
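The copy-avoidance argument can be modeled in a few lines. This is a hypothetical Python sketch, not spec text: `ImmutableBytes` stands in for an immutable GC array, `pass_list_u8` decides whether a reference can be shared or must be copied, and the caller is assumed to hold an immutable reference.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ImmutableBytes:
    """Stand-in for an immutable GC byte array, e.g. (ref (array i8))."""
    data: bytes


def pass_list_u8(arg: ImmutableBytes, callee_wants_mutable: bool) -> Union[ImmutableBytes, bytearray]:
    """Hypothetical cross-component lowering of a list<u8> argument.

    If the callee also asked for an immutable reference, the reference itself
    is shared: neither side can write through it, so no copy is needed. If the
    callee asked for a mutable array, it receives its own copy so its writes
    cannot be observed by the caller.
    """
    if callee_wants_mutable:
        return bytearray(arg.data)  # one copy, owned by the callee
    return arg  # zero-copy: the same reference crosses the boundary
```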
String is its own story, but a Unicode-encoded (array u8) definitely makes sense (if we treat string-encoding as orthogonal, then all three of utf8, utf16 and latin1+utf16 could be encoded into this array of u8/u16). Based on the last CG meeting, stringref is either not going to happen or not any time soon. However, we could add something stringref-y at the Component Model level in which we lower string values to a reference type (externref initially; later we could eliminate dynamic type checks with type imports) and supply canonical built-ins for operating on these strings (being quite careful to support only basic operations that have the same O(1)/O(n) cost on all host string representations, such as sequential code-point iteration or bulk-copy-into-linear-memory, and that are trivial to implement without giant Unicode tables). But (array u8) is probably the right place to start.
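To illustrate treating `string-encoding` as orthogonal to the GC lowering, the hypothetical Python function below lowers a string into a list of 8-bit or 16-bit code units, i.e. the contents of an array of u8/u16. For `latin1+utf16` it follows the canonical ABI's rule of using Latin-1 when every code point is at most U+00FF and falling back to UTF-16 otherwise; the function name and return shape are assumptions.

```python
def lower_string_to_code_units(s: str, encoding: str) -> tuple[str, list[int]]:
    """Hypothetical: return (actual_encoding, code_units) for an (array u8/u16)."""
    if encoding == "utf8":
        return "utf8", list(s.encode("utf-8"))  # u8 code units
    if encoding == "latin1+utf16" and all(ord(c) <= 0xFF for c in s):
        return "latin1", list(s.encode("latin-1"))  # u8 code units
    if encoding in ("utf16", "latin1+utf16"):
        raw = s.encode("utf-16-le")  # explicit little-endian codec, so no BOM
        return "utf16", [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    raise ValueError(f"unknown string encoding: {encoding!r}")
```

Usage: `lower_string_to_code_units("h€", "latin1+utf16")` switches to UTF-16 because `€` (U+20AC) does not fit in Latin-1.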
Considering the complexity of mutability, and upcoming features such as partial mutability, read-only and freeze, mutability may need to be expressed as a parameter of the `reference-type` option.
Taking into account proposals such as threads and shared-everything-threads, this feature could be implemented in stages.
The initial version would provide only immutable types, which do not require copying.
Mutability would be post-MVP; until then, users who need mutable data would have to write glue code that copies into the required types, at some performance cost.
Is there a version of the table posted by @oovm above that reflects the current thinking on this? I'm vaguely considering building a shim generator to map from the linear memory ABI to the (future) reference types ABI.
#525 has some updated thinking that addresses some of the issues uncovered here.