wasmtime icon indicating copy to clipboard operation
wasmtime copied to clipboard

Replace WASI WITs with binary form

Open alexcrichton opened this issue 1 year ago • 7 comments

This commit replaces the textual *.wit files in the Wasmtime tree with the binary form of a WIT package. The new WIT structure now looks like:

  • crates/wasi/[email protected] - serialized version of the wasi-cli proposal at the v0.2.0 tag. Created with wasm-tools component wit --wasm

  • crates/wasi-http/[email protected] - same as the above, but for wasi-http.

  • crates/test-programs/wit - WIT files used for tests such as a custom reactor.

  • crates/test-programs/wit/deps/*.wasm - WIT bindings to generate tests with. This is a duplicate of the wasi-{cli,http}@0.2.0.wasm files from above but additionally enables testing multiple versions in the future.

The end state of this PR is that whenever WASI APIs are updated we'll need a tagged release to pull into Wasmtime to officially support.

alexcrichton avatar Feb 13 '24 18:02 alexcrichton

What is the advantage of this? It would make it harder to review changes (as git will refuse to show any diff) Also I believe distros generally require the original source for binary blobs to be included anyway and recompiled as part of the build process. https://www.debian.org/doc/debian-policy/ch-source.html#missing-sources-debian-missing-sources and https://docs.fedoraproject.org/en-US/packaging-guidelines/what-can-be-packaged/#prebuilt-binaries-or-libraries

bjorn3 avatar Feb 13 '24 18:02 bjorn3

Just thought of this - this will eliminate the rust docs generated from the wit comments on the generated code, right? Those are pretty useful and so losing them is a significant regression...

pchickey avatar Feb 13 '24 19:02 pchickey

Good questions! The main benefit of this to me is that it more heavily discourages local modifications to WIT files and reaffirms that the source of truth of the WIT files is upstream in the WASI proposals themeslves, not here in Wasmtime (as has sort of historically ad-hoc-wise been the case but we're trying to shift now). In the limit with https://github.com/bytecodealliance/wit-bindgen/pull/837 I hope to have crates/wasi/[email protected] and crates/wasi-http/[email protected] as the only artifacts in-tree and nothing else, too, to further reduce the need to have CI checks that the two directories are the same.

You're right, however, that reviewing diffs won't be easy and if sources require non-binary artifacts won't work. One possible solution to that would be to check in the wasm text format instead of the wasm binary format as that would still pretty heavily discourage modifications but it would be more easily diff-able and wouldn't be a binary. Still a bit of a "generated wad of text", however, so I'm not sure how distros would feel about that. On the diff side it's also a property that any modifications will almost surely require changes on the Wasmtime side of things, but those are definitely more difficult to review than a change in the WIT itself. I think in general I'm not sure how best we can manage depending on the WASI spec here at this time, but this felt better to me than copying all the WITs in all the places. We're also somewhat "setting the trend" as well where if other projects or embeddings want to reference WASI this is an example of how they don't have to copy WITs but can instead copy artifacts at this time (ideally a package-manager-style-solution would work but that isn't fully mature yet AFAIK).

As for documentation, that should actually be solved where the WASI proposals have a custom section with documentation inside of it so when wit-parser parses everything it has enough info to attach the documentation back to the original item.

alexcrichton avatar Feb 13 '24 21:02 alexcrichton

I've pushed up an update which is now what I think the longer-term vision for WITs-in-Wasmtime should look like. CI will fail until I publish https://github.com/bytecodealliance/wit-bindgen/pull/838 but the idea is that theres a single wasm for wasmtime-wasi and a single wasm for wasmtime-wasi-http, and then we have "stuff" for tests.

(this doesn't move the needle on the binary-vs-not problem, just wanted to point out I made an orthogonal update to that)

alexcrichton avatar Feb 13 '24 23:02 alexcrichton

OK. I'm ok with the docs regression as long as we work on it soon :)

pchickey avatar Feb 14 '24 00:02 pchickey

Oh to clarify, the docs problem I believe is solved. Docs are encoded in the wasm binaries in a custom section, so the only issue I think with this is the binary-vs-not problem

alexcrichton avatar Feb 14 '24 18:02 alexcrichton

I've added this to the next Wasmtime meeting agenda in case a resolution is not met before then.

alexcrichton avatar Feb 15 '24 23:02 alexcrichton

Yeah, @bjorn3 do you have thoughts on *.wasm vs *.wat perhaps? Or thoughts on the problem this is solving of discouraging edits and avoiding duplication?

alexcrichton avatar Feb 20 '24 18:02 alexcrichton

If you want to discourage edits, one option would be to store hashes of them separately and check that they match in a build script. That way you have to modify the source and then separately recompute the hash (or patch out the hash check, but you can leave a big comment near the hash check about why it exists). Or maybe you could gzip the wat files? That would also still allow extracting the original sources, but would still look like a binary blob if you open it in a text editor without extracting them first.

bjorn3 avatar Feb 20 '24 19:02 bjorn3

I had some thoughts I was saving for the Wasmtime agenda item you had scheduled on Feb 29 (are we still discussing it then?) but to add some here:

  • My main thought here is discoverability and beginner-friendliness: wits seem sort of like header files in the sense that it can be useful to refer to them as specifications. If they exist locally only in binary form, that benefit is mostly lost; it can still be dumped via a tool, but it's not as immediately apparent how to do that to a newcomer. That seems more important to me than the "control" aspects you mentioned (the latter which can still be enforced other ways), though admittedly that's a value judgment.

  • Solving the "single source of truth" problem by including only a derived artifact feels a little artificial to me: it's still a copy, data that came from elsewhere. One way I've seen this managed elsewhere (e.g. in large company monorepos) is to have a third_party/ directory with exact copies of external source, and optionally scripts to do updates, and maybe CI rules or at least social conventions to prevent checkin of anything but updates from upstream.

cfallin avatar Feb 20 '24 19:02 cfallin

Ah I put this on the agenda for Feb 29 to make sure it's not forgotten about, but I figure that if it's merged sooner I can take it off the agenda.

I'm not sure how best to thread this needle myself. I don't know how to balance the conerns y'all have with how these WIT files are developed. For example I understand distros don't like binary blobs but a packager can always check out a versioned tag of a WASI proposal/repo and generate the exact same wasm binary. We could check in *.wat or text files but at least personally as a reviewer I'd always skip the files anyway since any changes are going to have larger changes on the bindings/implementation which is what I'd be interested in. I agree that the contents aren't discoverable here in this repo when there's a binary form but to me that's not necessarily a bad thing because this isn't where these WIT files are developed, that's intended for the upstream repos themselves.

To me I feel that the binary distribution of WIT packages is a perfect use case here for integrating with downstream repos using WIT packages. Ideally we wouldn't even check in binary files and we'd simply have Wit.toml which would say [dependencies] 'wasi:cli' = "0.2.0" and nothing else, meaning literally nothing would be checked in to this repository. In that sense the binary files are to me another step towards this future.

As I think more about this one other thing I like about the binary solution is that it makes it much easier to write down and execute instructions about updating these files. Currently it's copying a bunch of WIT files but you have to get that from multiple upstream WASI repositories and make sure you copy them all to the right places with the right names. You also need to make sure all the repos are checked out at the right version. As a reviewer I'd have to theoretically redo all that work to confirm it's all matching. With a binary file there's a single file in a single location under GitHub Releases for a single repo to go pick out. As a reviewer I could verify it's the expected file and move on from there.


Well, in any case, I'm happy to discuss further on this PR, but I'm also ok deferring to next week's meeting too.

alexcrichton avatar Feb 22 '24 19:02 alexcrichton

Discussion from last week's Wasmtime meeting concluded that this isn't the way we'd like to go, and instead pursing https://github.com/WebAssembly/component-model/issues/313 to have a single *.wit file would be better. Additionally we'll want a script to run to update the proposals from a particular tag to ensure no local changes.

alexcrichton avatar Mar 05 '24 17:03 alexcrichton