
Working with GC types without copy

Open oovm opened this issue 1 year ago • 7 comments

I'm having some trouble switching to wasi preview 2.

For example, the following interface:

package wasi:[email protected];
interface random {
    get-random-bytes: func(len: u64) -> list<u8>;
}

The function signature is func (u64) -> (list<u8>)

But its lowered type is core func (i64, i32) -> (), which is very difficult to use: the list is not returned directly but written to linear memory through the trailing i32 pointer.

Converting the result to a core type such as (array (mut u8)) requires a lot of glue code.
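
To make the cost concrete, here is a minimal Python sketch of what that glue has to do, under stated assumptions: `MEMORY` stands in for linear memory, `realloc` is a toy bump allocator, and `lowered_get_random_bytes` is a hypothetical host-side stand-in for the lowered core func (i64, i32) -> (). None of these names come from a real runtime.

```python
import struct

MEMORY = bytearray(64 * 1024)  # stand-in for one page of linear memory
_next_free = 16                # toy bump-allocator cursor; bytes 0..15 are scratch


def realloc(old_ptr, old_size, align, new_size):
    """Toy bump allocator playing the role of the guest's realloc export."""
    global _next_free
    ptr = (_next_free + align - 1) // align * align
    _next_free = ptr + new_size
    return ptr


def lowered_get_random_bytes(length, retptr):
    """Hypothetical host side of the lowered function: (i64, i32) -> ()."""
    data = bytes(i & 0xFF for i in range(length))  # deterministic, not random
    ptr = realloc(0, 0, 1, length)
    MEMORY[ptr:ptr + length] = data
    # The list result is written back as a little-endian (ptr, len) pair.
    struct.pack_into("<II", MEMORY, retptr, ptr, length)


def get_random_bytes(length):
    """Guest-side glue: call, decode (ptr, len), then copy the bytes out."""
    retptr = 0  # scratch space reserved for the result pair
    lowered_get_random_bytes(length, retptr)
    ptr, n = struct.unpack_from("<II", MEMORY, retptr)
    return bytearray(MEMORY[ptr:ptr + n])  # the copy a GC language has to make
```

The last line is the copy that a GC-aware lowering would avoid: the caller would receive the array reference directly instead of decoding a pointer pair.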


I would like to see a GC-mode canon option that makes the lowered type look like core func (i64) -> (array u8).

For complex nested types, reading a given piece of data requires intricate pointer arithmetic, whereas with arrays it only takes a few array.get instructions.

I think this helps simplify the use of some external interfaces, such as:

package wasi:[email protected];
interface preopens {
    get-directories: func() -> list<tuple<descriptor, string>>;
}
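
As an illustration of the pointer arithmetic involved, the sketch below decodes such a list from linear memory in Python. It assumes the default linear-memory layout with UTF-8 strings, where each tuple element is 12 bytes: an i32 handle followed by the string's i32 pointer and i32 byte length. The function name `read_directories` and the `memory` argument are hypothetical.

```python
import struct

ELEM_SIZE = 12  # i32 handle + i32 string pointer + i32 string length


def read_directories(memory, list_ptr, list_len):
    """Decode list<tuple<descriptor, string>> from a linear-memory image."""
    out = []
    for i in range(list_len):
        base = list_ptr + i * ELEM_SIZE  # address of element i
        handle, str_ptr, str_len = struct.unpack_from("<III", memory, base)
        # A second indirection is needed to reach the string bytes.
        name = bytes(memory[str_ptr:str_ptr + str_len]).decode("utf-8")
        out.append((handle, name))
    return out
```

With GC types, each of these offset computations would collapse into an array.get or struct.get on a typed reference.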

oovm avatar Feb 20 '24 11:02 oovm

Yes, agreed. It's definitely the plan of record to add a gc canonical ABI option, just like you're describing. (It's one of the original motivations for having an IDL that abstracts low-level memory representation, even.) We've mostly been waiting for (1) wasm-gc to be finalized, which it now is, and (2) an implementation of wasm-gc to show up in a runtime that also implements components (e.g., one is in progress in Wasmtime). But, if you or anyone else wants to run ahead and create a PR adding the gc option to the proposal (Explainer.md, Binary.md and, most significantly, CanonicalABI.md), that would be welcome too.

lukewagner avatar Feb 20 '24 16:02 lukewagner

Until ref-types, gc-types, stringref and related features are stable, we have time to discuss how GC languages should obtain WASI data.

In fact, once GC types are taken into account, WASI types correspond more closely to Wasm types.

With no option, the pointer (linear-memory) mode is used. A tentative reference-type option converts to immutable references, and a tentative mutable-reference option converts to references with mutable fields.

| Upper Type | Lower Type | Canonical Options | Requisite |
| --- | --- | --- | --- |
| u32 | i32 | | |
| tuple<u32, u32> | (i32, i32) | | |
| tuple<u32, u32> | (struct (field i32) (field i32)) | reference-type | gc |
| tuple<u32, u32> | (struct (field mut i32) (field mut i32)) | mutable-reference | gc |
| record {a: u32, b: u32} | (flatten layout) (i32, i32) | | |
| record {a: u32, b: u32} | (struct (field $a i32) (field $b i32)) | reference-type | gc |
| list<u8> | (i32, i32) | | |
| list<u8> | (array u8) | reference-type | gc |
| list<u8> | (array mut u8) | mutable-reference | gc |
| string | (i32, i32) | | |
| string | stringref | reference-type | gc, stringref |
| string | (string.encode_utf8 stringref) | reference-type + string-encoding=utf8 | gc, stringref |
| borrow<string> | string_view | reference-type | gc, stringref |
| resource | i32 | | |
| resource | externref | reference-type | ref-types |
| flags | (flatten layout) (i32 × ⌈flags / 32⌉) | | |
| enum | i32 | | |
| option<u32> | (ref null i32) / i31ref | reference-type | gc |
| option<t> | (ref null T) | reference-type | gc |
| result<t, e> | ? | ? | ? |
| variant | ? | ? | ? |

A variant may map to a subtype hierarchy with downcasts in a GC context.

  • Cross Link: https://github.com/WebAssembly/gc/issues/531
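
One way to picture the subtype-with-downcast idea is the Python analogy below, where `isinstance` plays the role that ref.test and ref.cast would play on GC struct types. This is only a sketch of the concept: the class names `Result`, `Ok` and `Err` are hypothetical, and nothing here is a proposed lowering.

```python
from dataclasses import dataclass


class Result:
    """Hypothetical supertype standing in for the variant's base struct type."""


@dataclass(frozen=True)
class Ok(Result):
    value: int  # payload of the ok case


@dataclass(frozen=True)
class Err(Result):
    message: str  # payload of the error case


def describe(r: Result) -> str:
    """Dispatch on the case by testing the dynamic type, then downcasting."""
    if isinstance(r, Ok):
        return f"ok: {r.value}"
    if isinstance(r, Err):
        return f"err: {r.message}"
    raise TypeError("unknown case")
```

Each case carries only its own payload, so no discriminant field or padding is needed; the cost moves to the dynamic type test.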

oovm avatar Feb 21 '24 03:02 oovm

Another benefit is that if only GC types are used, there is no need to bring in a memory allocator, which reduces binary size and shortens warm-up.

rustc's cabi_export_realloc takes about 27000 lines of wasm instructions (release mode), and libc is even larger.

Other smaller allocators sacrifice either speed or security.

(component
    ;; Define a memory allocator
    (core module $MockMemory ;; Replace here by an actual allocator module, such as libc
        (func $realloc (export "realloc") (param i32 i32 i32 i32) (result i32)
            (i32.const 0)
        )
        (memory $memory (export "memory") 255)
    )
    (core instance $mock_memory (instantiate $MockMemory))
    ;; import wasi function
    (import "wasi:random/[email protected]" (instance $wasi:random/[email protected]
        (export "get-random-bytes" (func (param "length" u64) (result (list u8))))
    ))
    ;; wasi function to wasm function
    (core func $wasi:random/[email protected]/get-random-bytes (canon lower
        (func $wasi:random/[email protected] "get-random-bytes")
        (memory $mock_memory "memory")
        (realloc (func $mock_memory "realloc"))
    ))
    ;; import wasm function
    (core module $TestRandom
        (type (func (param i64 i32)))
        (import "wasi:random/[email protected]" "get-random-bytes" (func $wasi:random/[email protected]/get-random-bytes (type 0)))
    )
    ;; instantiate wasm module with wasi instance
    (core instance $test_random (instantiate $TestRandom
        (with "wasi:random/[email protected]" (instance (export "get-random-bytes" (func $wasi:random/[email protected]/get-random-bytes))))
    ))
)

If using the gc type, this can be simplified to:

(component
    ;; import wasi function
    (import "wasi:random/[email protected]" (instance $wasi:random/[email protected]
        (export "get-random-bytes" (func (param "length" u64) (result (list u8))))
    ))
    ;; wasi function to wasm function
    (core func $wasi:random/[email protected]/get-random-bytes (canon lower
        (func $wasi:random/[email protected] "get-random-bytes")
        reference-type
    ))
    ;; import wasm function
    (core module $TestRandom
        (type (func (param i64) (result (array u8))))
        (import "wasi:random/[email protected]" "get-random-bytes" (func $wasi:random/[email protected]/get-random-bytes (type 0)))
    )
    ;; instantiate wasm module with wasi instance
    (core instance $test_random (instantiate $TestRandom
        (with "wasi:random/[email protected]" (instance (export "get-random-bytes" (func $wasi:random/[email protected]/get-random-bytes))))
    ))
)

Reading a field of a GC type takes a single instruction and no pointer arithmetic (which needs at least three instructions), further reducing the binary size.

oovm avatar Feb 21 '24 04:02 oovm

Yes, really good point regarding mutability vs. immutability; we probably do want both as ABI options. A really nice benefit of immutability is that if both sides of a component-to-component call use immutable GC references, no copy needs to be made when passing a reference across the boundary. OTOH, if your language ultimately does need a mutable array of bytes, then the immutable GC option may impose an extra unnecessary copy; thus having both options makes sense.
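
The trade-off can be pictured with a small Python analogy, where immutable `bytes` stands in for an immutable GC array and `bytearray` for a mutable one. The function names are hypothetical and the sketch says nothing about how a real runtime would implement either option.

```python
def pass_immutable(data: bytes) -> bytes:
    """Immutable on both sides: the same reference can simply be shared."""
    return data


def pass_mutable(data: bytes) -> bytearray:
    """The receiver wants to mutate, so it must get its own fresh copy."""
    return bytearray(data)


payload = bytes([1, 2, 3])
shared = pass_immutable(payload)  # no copy: same object as payload
owned = pass_mutable(payload)     # one copy: independent storage
owned[0] = 9                      # does not affect the sender's data
```

If the receiver only had the immutable option but needed to mutate, it would pay for the copy in `pass_mutable` on top of whatever the boundary already did, which is the extra copy mentioned above.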

Strings are their own story, but definitely a Unicode-encoded (array u8) makes sense (if we treat string-encoding as orthogonal, then all three of utf8, utf16 and latin1+utf16 could be encoded into this array of u8/u16). Based on the last CG meeting, stringref is either not going to happen or not any time soon. However, we could add something stringref-y at the Component Model level in which we lower string values to a reference type (externref initially, later we could eliminate dynamic type checks with type imports) and supply canonical built-ins for operating on these strings (being quite careful to support only basic operations that have the same O(1)/O(n) cost on all host string representations such as sequential code-point iteration or bulk-copy-into-linear-memory and are trivial to implement w/o giant Unicode tables). But (array u8) is probably the right place to start.
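
For a rough feel of what such built-ins might look like, here is a Python sketch of the two operations named above. The names `string_code_points` and `string_copy_utf8` are made up for illustration and are not proposed built-ins; a Python `str` stands in for the opaque host string reference and a `bytearray` for linear memory.

```python
def string_code_points(s):
    """Sequential code-point iteration: O(n) on any host representation."""
    for ch in s:
        yield ord(ch)


def string_copy_utf8(s, memory, ptr):
    """Bulk-copy the string into linear memory as UTF-8; return bytes written."""
    encoded = s.encode("utf-8")
    end = ptr + len(encoded)
    if end > len(memory):
        raise IndexError("destination range is out of bounds")
    memory[ptr:end] = encoded
    return len(encoded)
```

Neither operation needs random access by index or any Unicode table, which is what keeps the cost uniform across host string representations.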

lukewagner avatar Feb 21 '24 16:02 lukewagner

Considering the complexity of mutability and upcoming features such as partially mutable, readonly and freeze, mutability may need to be expressed as a parameter of reference-type.

Taking into account proposals such as threads and shared-everything-threads, this feature could be implemented in stages.

The initial version would provide only immutable types, which require no copying.

Mutability would be post-MVP; until then, users would have to give up some performance and hand-write glue code that copies into the types they need.

oovm avatar Feb 21 '24 17:02 oovm

Is there a version of the table posted by @oovm above that reflects the current thinking on this? I'm vaguely considering building a shim generator to map from the linear memory ABI to the (future) reference types ABI.

gertvv avatar Jun 08 '25 06:06 gertvv

#525 has some updated thinking that addresses some of the issues uncovered here.

lukewagner avatar Jun 09 '25 17:06 lukewagner