wasm-c-api
wasm-c-api copied to clipboard
More consistent ownership
Original comments by @alexcrichton in https://github.com/WebAssembly/wasm-c-api/pull/145#issuecomment-652444373 :
I think in terms of ownership the API right now is pretty inconsistent, and I think that @sunfishcode's suggestion could greatly improve the consistency and understandability of the API at-a-glance. For example I think it would be more understandable if vectors were only used as either return values or "here's ownership of an allocated chunk of memory". Currently vectors are sometimes used as ownership containers and sometimes used as "here's a temporary view into my data". Additionally the ownership itself is pretty:
wasm_functype_new
- takes ownership of its parameter/results listswasm_functype_{params,results}
- returns a vector, but not an owned vectorwasm_trap_trace
- returns an owned vectorwasm_module_{new,validate}
- takes a vector, but does not take ownershipwasm_module_{imports,exports}
- returns an owned vectorwasm_instance_exports
- returns an owned vectorwasm_trap_new
- vector argument, does not take ownershipwasm_trap_message
- returns a vectorwasm_{import,export}type_new
- takes ownership of its module/name argumentswasm_importtype_module
- returns a vector, does not yield ownershipwasm_{import,export}type_name
- returns a vector, does not yield ownershipGiven this state the only real way to figure out the ownership of an API seems to be to look at the documentation, from the signature alone it's pretty unclear where ownership is being transferred. For example it seems like a weird gotcha that
wasm_trap_new
doesn't take ownership of its string argument whilewasm_importtype_new
does.I think it could be more consistent to say "using a vector means transfer of ownership". That way if you ever see a
*_vec_t
(or alias withwasm_name_t
orwasm_message_t
) in the API you know that an ownership transfer is happening. Otherwise if you see a pointer/length then the standard C idiom of "don't manage the memory you don't own" kicks into play and it's clear you don't need to deallocate it, nor is it deallocated for you if you pass it to an API.
And https://github.com/WebAssembly/wasm-c-api/pull/145#issuecomment-652539703 :
For example why does
wasm_module_imports
return an owned vector butwasm_functype_params
returns a vector owned by the original type? Additionally why doeswasm_importtype_new
take ownership of its string arguments whilewasm_trap_new
not take ownership? (there's also a question of whywasm_trap_*
works with nul-terminated strings but that's orthogonal to ownership)I agree that APIs sometimes need to return two values, such as
wasm_importtype_module
. I think this can be done like inwasm_instance_new
, however, where one return value is "blessed" to be returned from the function and another is returned as an out-parameter. My point is that idiomatically and conventionally I think it would make more sense for usage of*_vec_t
to imply "ownership is being transferred", while if ownership isn't being transferred a plain old pointer/length is used (either as two parameters or one return and one out-parameter).An example of this is the libgit2 API where
git_buf
is used similarly towasm_byte_vec_t
but only when a whole vector is returned. All other APIs that take a list of values take a plain pointer/length pair. This also shows an alternative to returning lists of values as returning iterators of values. For example in Wasmtime to implementwasm_module_exports
we don't natively store exports in as a list ofwasm_extern_t
values, so an iterator would be much more natural there anyway.
For example why does wasm_module_imports return an owned vector but wasm_functype_params returns a vector owned by the original type?
Generally, the driving factors for returning ownership were whether the accessed information will still be needed by the callee, and whether it will typically be accessed repeatedly.
For type structures, the assumption is that they all are transparent, traversable, (usually) tree-shaped type descriptions with a canonical parent-owns-children policy. So constructors all take ownership and accessor to such a structure are meant for efficient reading/traversal. Passing back ownership would require costly copying subgraphs for every read node, where a vector is just a particular kind of a node. That would imply a quadratic amount copying when traversing a deeper tree. Or a complicated internal ownership regime. (Mind you, we don't have much type structure yet, but more will come and the design should be future-proof.)
Something like module_imports
OTOH conceptually creates a type structure. Depending on the implementation details of an engine, this structure may be pre-computed and cached or not. So it seemed appropriate to treat it as generative and return ownership. But I'd curious to hear from engines whether they'd be fine with having to keep this information alive with the module.
It is true that we could use iterators to access vectors, but they require additional state and are more complex and costly than just returning a pointer, when the information is stored in array form anyway. Another option would be to have indexed accessors -- that's simpler but probably still more costly.
It also seems easier (and possibly requires fewer allocations) for higher-level bindings if input/output is handled symmetrically, i.e., uses a uniform abstraction for representing vectors of things and use both ways -- they can just wrap this abstraction. But that is just a conjecture.
Following up on this to say that we're experiencing a similar issue on two fronts:
- for implementation, some of the ownership semantics are confusing
- for direct users of the Wasm C API that we've talked to, they're often struggling to use the API correctly in regards to ownership
I'd like to add that the comment in wasm.h about how own
applies to vec
is either misleading or incorrect with the current state of the world https://github.com/WebAssembly/wasm-c-api/blob/master/include/wasm.h#L51
// - `own wasm_xxx_vec_t` owns the vector as well as its elements(!)
wasm_xxx_vec_t
s are not passed by value anywhere in wasm.h
as far as I can see, so it's possible that this statement is true but irrelevant, but if we assume that it's applying to own wasm_xxx_vec_t *
then it's not correct: it seems that everywhere own wasm_xxx_vec_t *
is used as a parameter the ownership of the vec data (ptr & len) are not transferred. It's only when own wasm_xxx_vec_t *
is returned does the vec data's ownership transfer.
Speaking for myself, I see the Wasm C API as a very generic API designed to work on all different types of Wasm VMs, so the right trade off in my mind is to always prioritize universality over performance. Though usability is also an important consideration. For example, interacting with either imports or exports by name is a pretty complicated process in the current Wasm C API that requires building up your own data structures and organizing the data yourself when the same thing could be done more efficiently and directly by most hosts by providing a higher level function in the C API. I think this is a fine tradeoff to make, having a universal API could be really valuable. But it does lead me to believe that a C library built on top of the Wasm C API that provides a higher level interface would be the right choice for most users. Those higher level features could always be added into wasm.h too, as is already done with things like wasm_name_new_from_string
.
On the other hand, being too slow could lead to more fragmentation. But most use cases I've seen personally are completely dominated by the Wasm compilation and execution time. As long as the critical paths (i.e. calling functions, loading cached modules) aren't too slow, I think it's actually not important if the rest of the C API is slow or inefficient. But it does matter if it's easy to implement and use correctly (and also easy for end-users to build things with).