wasmtime
wasmtime copied to clipboard
Support for Interface Types in wasmtime API
I'd like to open this as a sort of meta-issue about supporting WebAssembly Interface Types in the wasmtime
crate and API. The current support via the crates/interface-types
crate for the purpose of this issue can be basically ignored. It doesn't fit into the wasmtime
crate API at all, it's outdated, and it's not how I think we want the API to look long-term.
The Interface Types proposal has settled a bit more on design lately, where the general idea is that there are a set of adapter types which adapter functions can use. The actual interface of a wasm module will be the adapter functions imported/exported, rather than the core wasm imports/exports.
For some more background, I've got a number of Rust projects supporting the current snapshot of the wasm interface types proposal:
-
wit-parser
- an analogue ofwasmparser
but for the interface types binary format. -
wit-text
- an analogue ofwat
, but for the interface types text format. -
wit-validator
- a crate which performs simple validation of the wasm interface types section of a module, if present -
wasm-bindgen
has now been updated to use these crates and emit wasm interface types modules compatible with the current proposal. This support hasn't been published, though, and I'd like to hold off until there's support inwasmtime
itself.
I've been thinking today about how best to integrate interface types into the wasmtime
set of crates. There's a lot of moving pieces and lot of possibilities for how this can work, but I'd like to make a concrete proposal about how this could be integrated. This is going to be a pretty big change, though, so I'd want to be sure to get buy-in from relevant folks before moving forward.
The general gist of what I'm thinking is:
- The
wasmtime
crate doesn't actually change a whole lot, but theValType
andVal
enumerations are extended with all types from the interface types proposal. This namely means signed/unsigned 8-64 bit integers as well as a string type will show up. - The exports listed by
wasmtime
modules/instances change depending on the presence of the wasm interface types section.- If not present, then everything is the same as it is today
- If present, the core module exports are ignored and the only exported items are the items listed in the wasm interface types section.
- The imports taken during instantiation are a bit tricky. Today they're a list of imports which correspond 1-to-1 with the list of imports reported from a module. I think we'll still want this 1-to-1 matching, but the list of imports reported from a module will be modified as follows:
- If no interface types section is present, everything is the same as it is today.
- If the interface types section is present, then some of the imports of the core module may be satisfied by the adapter functions. To handle this the list of imports required to instantiate the module will first be the set of core wasm imports, in order, that do not have adapters hooked up to them. Then the adapter imports will be listed.
- The C API would receive similar updates. The
wasm_val_t
type would have a few new codes forwasm_valkind_t
and the union would have a few more items packed inside of it. I haven't thought a huge amount about the ownership of strings and such, but I'm assuming that we can think of something pretty reasonable here. - As a minor change, the
wasmtime
executable would use thewit_text
crate to parse input files, which would parse any of*.wat
,*.wasm
, or*.wit
files.
After these changes are implemented then the crates/interface-types
crate would just be deleted, since it is no longer needed. The Python/Rust extensions would be updated to use the wasmtime
crate directly, and would provide all necessary mappings for strings/etc. Additionally the .NET extension could be updated with more support since this would all be present in the C API.
Some questions I've wanted to make sure we've got covered when landing all this are:
How will future interface types changes affect this design?
Given this setup the only public-facing change to the design of interface types will be the set of supported types/values. Otherwise I think everything else can be internal changes (yay!) which we take care of one way or another.
How will adapters get implemented today?
I'm thinking that we'll basically just have an "interpreter" for all adapter instructions. Eventually we can get fancier with more JIT-like support but I'd like to get the bare-bones up and running first.
What about the laziness of types?
One of the ideas in the interface types proposal is that adapter instructions are often lazy to minimize copying data between various locations. This'll all boil down to the Val::String
representation I think. For now I'm imagaining something like Val::String
holding a memory reference, pointer, and length. That way when requested you can read off the entire string, or if you want you could get raw access and peek at the bytes yourself. I'm not really 100% sure how this will work though, but I'm also assuming that it's ok to "cop out" in the interim and "just use String
" to see how far it gets us.
This is something I'm particularly curious to hear feedback on though. Especially as more and more types come into the interface types proposal, this is going to get more and more important. For example it'd be pretty awesome if we could start now defining wasi APIs in terms of interface types, but this is all likely a good ways off so perhaps a bit premature to worry about this.
What about linking modules together?
I'm not even gonna try to approach this in a first pass. The main interesting thing here is that if you link together two wasm modules using wasm interface types you can ideally create a highly-optimized adapter function in the middle of the two. I don't plan to tackle this in our initial support though, and while this should work one way or another, it'll likely be relatively slow and copy/allocation-heavy.
Is wasmtime
the right crate to put this in wrt stability?
Interface types and its proposal are pretty unstable, with lots going to change now and in the future. Is it ok to now hook up the interface types proposal into the public API of wasmtime
? That will be a continued source of instability for consumers of the API (new type variants basically). I'm hoping that new types don't come along all that often, and I'm also hoping that stability isn't necessarily paramount in the wasmtime
crate yet that this is ok.
I don't really have many other alternatives though of how to integrate interface types into the wasmtime project. So if others think that interface types are too unstable to go into the wasmtime
crate, now would be a good time to discuss that, as well as possible mitigations :)
Oh I should also mention that a bare minimum I'm expecting of any future support of interface types in wasmtime is to actually have tests. To that end the wit-*
family of crates and the wasm-interface-types
project should make it possible to do this. I'm envisioning tests like:
- A directory of
*.wit
files that can all be instantiated. - A suite of tests that instantiate
*.wit
files and the poke at the module to test behavior. - Tests about errors when using
wasmtime
and mismatching types and such. - Tests for the Python/Rust extensions which use those projects to consume
*.wit
files and then poke at the result to make sure codegen and interfaces all work out. I'm assuming the same strategy will work for .NET, although I've never written a line of C# in my life :)
Notably I do not want to hook up wasm-bindgen
into the test suite of wasmtime
. The wasm-bindgen
CLI has its own dependencies on wit-*
tools which we can't easily sync with so having these decoupled should allow us to still get all the coverage we need without spending all our time aligning dependencies.
Would the API instability affect users of the wasmtime
crate that aren't using interface types? If not, then I think it's natural to put it there. We're still in 0.x in semver,so as long as people trying interface types stuff know that the interface types parts are still in flux, that seems fine.
I don't know what all the challenges will be, but with that said, my instinct is that making Val::String
lazy now rather than later will help us flush out design issues because it'll require us to think about the lifetime of the data, which will only get more important when we add Val::ByteSequence or so in the future. But I wouldn't make that a hard requirement right now.
On the topic of tests, For wasi-common's tests I'm considering an approach where we check in the compiled wasm files (or maybe the corresponding wat files) so that we can run the tests without having all the tools installed, but then have a way for people who do have the tools installed to use them and/or regenerate the compiled files. Would that work for the wasm-bindgen parts?
I think this is a good plan! In particular, I agree with the crate organization, and in particular putting this into the wasmtime
crate. Additionally, I think the way to do this incrementally seems good, too.
As a minor change, the wasmtime executable would use the wit_text crate to parse input files, which would parse any of *.wat, *.wasm, or *.wit files.
How settled is it for ".wit" to mean ".wat + interface types"? If it's not settled yet, one option would be to just continue to use ".wat" for files that contain interface types annotations, because the annotation syntax is on track to become part of the .wat syntax. That would also avoid the ambiguity with ".wit" meaning instance type, which happens to be the "wit" that "witx" is named after.
Would the API instability affect users of the wasmtime crate that aren't using interface types?
It depends on what the crate is doing, and it also sort of depends on the idioms we expect users to have. The breakage would be of the form:
let x: wasmtime::ValType = ...;
match x {
// we'll add variants to this over time, so exhaustive matches will break
}
let y: wasmtime::Val = ...;
match y {
// we'll add variants to this over time, so exhaustive matches will break
}
We could encourage users to user helper methods though (perhaps like val.i32() -> Option<i32>
) instead of using an exhaustive match
to mitigate the issue. That's the only breakage that I can think of though, and we could also possible fix this by making the enumeration #[non_exhaustive]
to force everyone to handle the case of "eventually there wil be some type I don't support here"
my instinct is that making Val::String lazy now rather than later will help us flush out design issues
I think I agree with this, so I think that an initial implementation should push really hard on fleshing out the laziness story. Only if something truly and fundamentally is incompatible do I think it should be postponed for later.
Would that work for the wasm-bindgen parts?
Heh I would personally still not be very comfortable checking in *.wasm
files to the repository myself, but yeah that would solve the issues here. We could pretty easily check in wasm-bindgen generated *.wasm
files and have the sources in the repo as well in case anyone wanted to regenerate them.
The downside of this approach would be, though:
- If we update the wasm-interface-types support, then we'd have to always update
wasm-bindgen
first and then the update in this repo would be coupled with a bunch of binary blob updates. - We couldn't update wasm-interface-types support here without regenerating or ignoring those files temporarily.
Overall I don't actually think we'll need to use wasm-bindgen
-generated files. They would be a good stress/integration test to make sure everything fully works, but using raw text files in general I think should be sufficient for the testing that I'm thinking of.
How settled is it for ".wit" to mean ".wat + interface types"?
Not settled in the least. I randomly made a decision to use *.wit
files, and I am now starting to regret it :)
I think I like the idea of *.wat
being the extension for text files, and then if it just happens to have @interface
annotations then it just happens to have interface types support.
In general though none of this really matters internally in wasmtime
, I was never thinking of switching on file extensions, it's just a way of organizing things so far. For example I've used the extension to write tests, but that's basically it.
The general idea is that the wasmtime
command line would be able to take in the textual representation of wasm interface types, just like it can take in the textual representation of wasm files, and work with them. If we still call that *.wat
I think that'll work out great.
To elaborate on the wasm-bindgen thing and *.wasm
binary files, the main issue is actually this binary encoding of the interface types section. That's inevitably going to change a lot over time, and for wasm-bindgen
-produced files to work in wasmtime
both wasmtime
and wasm-bindgen
would have to agree on encoding, which isn't an easy thing to do unfortunately in one PR or one commit.
We could encourage users to user helper methods though (perhaps like
val.i32() -> Option<i32>
) instead of using an exhaustivematch
to mitigate the issue.
This seems like a very good thing to have and encourage the use of regardless: my intuition is that in the majority of cases you know the type contained in the Val.
My thought is that adding enum values is pretty mild in terms of API breakage; the error messages are clear, and the fixes if you don't care about the new values are straightforward. I don't think non_exhaustive
is needed at this time.
Great write up! I think this makes a lot of sense. Here's some thoughts I have regarding interface types and the impact on the C embedding API.
In addition to expanding wasm_val_t
, I think we'd need to introduce a wasm_string_t
type so that we can keep the wasm_val_t
union word-sized. I recall there being push-back on adding support for 128-bit SIMD types into the union for that reason. I assume wasm_string_t
would be a struct containing data pointer and length (in bytes).
It would be really nice if the embedding API supported a concept of a "host string encoding" enumeration that could be set when creating the engine. The data of wasm_string_t
would always be in "host encoding" so we don't have to mandate one like UTF-8. We'd probably only support host encodings of UTF-8 and UTF-16 to start.
More importantly, this would control the source encoding for the string-to-memory
(destination is always UTF-8 per proposal) and destination encoding for memory-to-string
(source is always UTF-8 per proposal) in Wasmtime. This has the potential to eliminate an intermediary allocation for hosts that don't represent strings as UTF-8 to use wasm_string_t
.
For example, strings are stored as UTF-16 in .NET. I would really like to avoid having to allocate a UTF-8 version of a string just to pass it to wasm_func_call
. What I'd really like to do is pass a pinned pointer to the UTF-16 string data directly into Wasmtime and have it directly encode it as UTF-8 into the memory for a string-to-memory
interface instruction.
Oh interesting, @peterhuene do you have links to that discussion about wasm_val_t
? The current wasm_val_t
I think is 128 bits already on 64-bit and 12 bytes on 32-bit already because of the leading byte discriminant followed by a payload which could include a 64-bit number. Adding a 16-byte payload seems like it wouldn't really be the end of the world since that'd just make it 20 instead of 16 bytes on both x86_64 and x86.
That's a good point about encodings, although I would initally sort of hope that we could preserve "utf-8 everywhere" as long as possible, but not with copies. A wasm_string_t
could perhaps literally be just a 32-bit pointer/length, making it equivalent to the 64-bit integer payload today. You'd then require accessing it with a memory/store elsewhere, so when flowing out to the C API it'd never be copied from its original location. That way for .NET and UTF-16 I think the bytes would still only need to be transcoded once?
It was this comment that reminded me of the push-back against adding a 128-bit type to the union, although I incorrectly recalled the intention to keep it word-sized (it's 64-bit currently, so only word-sized for 64-bit hosts). I was more talking about the union and not wasm_val_t
itself, which is a tagged union and thus larger than the union.
Could you walk me through how you envision wasm_string_t
to contain a 32-bit pointer, especially for 64-bit hosts? We're not talking about this being a "pointer" into a WebAssembly memory, right? I thought the intention of exposing interface types via the API would be such that the host would not have to interface with a memory at all to pass and receive strings. In fact, if the adapter instructions include a call to free the source UTF-8 string after a memory-to-string
instruction, then we can't even give an address into a memory back via this interface.
I might be missing a huge thing here, but if we keep the C API UTF-8 only, I don't see a way around a UTF-16 native host from having to allocate a temporary string to the give the API and then get it copied into a memory from the interface types implementation.
Allowing host Unicode encodings does sound like it fits really well with the contents being lazy -- transcoding can happen at the point where we would otherwise copy, so that we never transcode into a temporary buffer.
Thanks for the pointer! I'm somewhat hesitant to try to tackle the C API given that though. If the proposed C API doesn't want to tackle v128
because it's too experimental, wasm interface types likely won't be stomached well either... It's probably best anyway to get something implemented first and then add on the C layering afterwards anyway.
I was thinking basically what you think I'm thinking, where a wasm_string_t
would contain two u32
, one as a pointer relative to the base of the wasm instance's memory and another is the length. For the use-after-free I was assuming that wasm is passing a string to an import and using a deferred call to free the data, so for the duration of the invocation of the import the string is valid. While it wouldn't involve literally exporting the memory of the module it would at a technical level require someone to have access to the memory. Honestly though I haven't thought a huge amount about this, and this whole area seems hugely fraught with lots of difficult questions. Encoding is another along these lines.
I'm sort of hoping though that we can avoid getting hung up on too many details and get something bare-bones working. It surely can't be worse than what we have today, and ideally an implementation to iterate on would help us figure out better ways to implement it as well as perhaps ways to iterate the spec itself.
Let me ask a simple question then: if I am a host that needs to pass a Wasm-memory-address and length pair to an interface types definition of a function that takes a "string", how do I do that?
I can't allocate anything in the Wasm memory to give you; that requires an implementation of interface types in the host and not the runtime.
Yeah that's where my thinking breaks down really quickly :)
I haven't really fully thought through all the implications of everything. The absolute easiest thing to do is to require everything to be a bare and owned String
. Trying to get fancier than that seems like it's going to be extremely involved, and while I've got a few half baked ideas I'm not proposing anything for certain yet. (aka you're right that the u32/u32 pair wouldn't account for host-provided strings, only for wasm-to-host strings)
I'm also saying you can't return an address into the memory for wasm-to-host, either. The adapter may free the string in the WebAssembly memory and the implementation of the free function may zero the memory or something else. It has to give back at least a copy of the data out of the WebAssembly memory for the memory-to-string
instruction.
I don't want to get too caught up in this discussion such that it hinders progress on moving forward on an implementation. To wit, I'm fine with keeping the initial implementation of strings coming in and out of the host (including the Rust API) as UTF-8 provided we look forward to an implementation that would prevent hosts with different native string representations from having to create temporary allocations to make string types work (i.e. perhaps the idea of a host encoding or per-string encoding in the C API, with only support for UTF-8 initially).
The API (C++ or Rust) has major deficiency, is late-binding which causing us extend the wasm_val_t with custom types. Each time we are trying to use API's call()
method, it uses trampolines to store and pass data. There shall be a better way, than to use call()
and doing argument transformation. I was imagining that interface types will assist with creation of "smart" trampolines where there will no be a need to iterate on or create [Val]
for a call. At least for statically typed languages. (P.S. dynamic languages have internal way for polymorphism, can meet us half-way e.g. by providing serialized params blobs for universal trampolines)
Is this an appropriate issue in which to discuss using Interface Types for bindings to host functions, as well? For instance, it should be possible to export a Rust function directly to WebAssembly without writing a manual wrapper using Callable
to accept and return slices of Val
. That would make embedding wasmtime and supplying host functions to call much easier.
(Eventually, that will need to include support for opaque reference types.)
I think that falls a bit under what @yurydelendik was mentioning moreso than interface types. You should already in theory be able to do that for integers/floats and with interface types you should just also be able to do that with interface types. I think there's basically an underlying feature request of creating an Extern
with a statically typed function that has less "fluff" about slices and whatnot, but that's a separate issue from this (I think)
@alexcrichton I'm not sure how it fits into the current plans (per #1271) but I would like to volunteer to continue the work you started with https://github.com/bytecodealliance/wasmtime/pull/1013.
I spent some time going through the source to get an idea of what is involved and have already started adding some of the missing instructions (from instr.md) in a local branch (not published yet).
I might need guidance at various points (around strings, compilation, etc.) but I have time to focus on it and I've been looking for something more substantial to work on.
@darinmorrison that'd be awesome, and I'd be happy to help!
I think the TODO list on that PR covers pretty well the major two remaining bits, and I'd probably recommend tackling the first two earlier rather than later to ensure that all the various integration points are largely working before an interpreter becomes more fleshed out.
Feel free to ping me on zulip or here if you've got questions though!
@alexcrichton , based on the last comment here https://github.com/bytecodealliance/wasmtime/pull/1289 it looks like @darinmorrison is not able to continue on this work.
I can't even begin to fathom how all this works, nor do I have the capacity right now to work on things. I'm justing pinging it here so that the "interface types" work doesn't disappear. Its important :D
That's a good point about encodings, although I would initally sort of hope that we could preserve "utf-8 everywhere" as long as possible, but not with copies.
I don't think that interface types are the place to push UTF-8. Ultimately it's up to the languages, not the bytecode. No one will choose which encoding to used based on what interface types do, they'll choose based on what the languages involved do. Supporting more Unicode encodings means removing redundant copies, it's as simple as that. It won't lead to some dystopia where everyone is switching to UTF-16 because it wasn't barred up enough.
I don't think that interface types are the place to push UTF-8
Note that when it comes to Interface Types, we'll implement what gets standardized, so if the Interface Types proposal ends up only supporting UTF-8, then we'll only implement that. I.e., arguments for additional encoding should be made to the WebAssembly CG, not here.
Thanks, I left a comment on their repository saying the same thing.
Sorry for the uninformed question, but how can I enable interface types at the moment? Do I need the wasm-interface-types crate? Or is it a total no-go atm?
Unfortuantely at this time there's not a prototype or a switch to turn on. This is not implemented, so testing this out (or even playing around with it) will have to wait until someone implements this.
This is not implemented, so testing this out (or even playing around with it) will have to wait until someone implements this.
@alexcrichton Just out of curiosity, what is needs to be implemented for interface types to be supported? Is there a set of issues / roadmap / milestone we can refer to? Or is just a matter of reading through the spec and implementing the appropriate features in accordance with wasmtime's API?
I'm not aware of a milestone as-is at this time. The main blocker is still planning work to happen on this. There's a number of moving parts, though, especially in that the specification itself is not stable and is a moving target. This will likely be an effort that's much larger than simply implementing the current state of the specification in wasmtime itself.
Is it possible to use some temporary workaround? For example witx? (especially after #2659 has been merged) I want to experiment with the implementation of the plugin system using wasm, but inventing your own ABI and manipulating memory and pointers every time you want to pass a struct to a function is kinda hell 😄