wabt
Custom sections wasm2wat / wat2wasm
We are researching ways to support custom sections in the text format (wat) so that we can easily transform it to wasm and vice versa.
One of the ideas we came up with is to automatically base64-encode the custom section when converting from a wasm file, and automatically base64-decode it when converting from wat to wasm.
Here's the proposal:
wasm2wat
When doing wasm2wat, we can base64-encode the binary contents of each custom section.
Proposal for wat:
(module
  (section sectionName "base64:aGVsbG8=") ;; this is the base64 encoding of "hello"
  (func (export "addTwo") (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add))
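For the wasm2wat direction, a minimal sketch of producing the proposed "base64:..." string from a custom section's raw payload could look like this. It assumes standard RFC 4648 base64 with '=' padding; Base64Encode and ToProposalLiteral are hypothetical names, not part of wabt.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 (RFC 4648 alphabet, '=' padding) over raw payload bytes.
std::string Base64Encode(const std::vector<uint8_t>& bytes) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  size_t i = 0;
  // Every full group of 3 input bytes becomes 4 output characters.
  for (; i + 3 <= bytes.size(); i += 3) {
    uint32_t n = (uint32_t(bytes[i]) << 16) | (uint32_t(bytes[i + 1]) << 8) |
                 uint32_t(bytes[i + 2]);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
  }
  // A trailing group of 1 or 2 bytes is padded with '='.
  size_t rest = bytes.size() - i;
  if (rest > 0) {
    uint32_t n = uint32_t(bytes[i]) << 16;
    if (rest == 2) n |= uint32_t(bytes[i + 1]) << 8;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += (rest == 2) ? kAlphabet[(n >> 6) & 63] : '=';
    out += '=';
  }
  return out;
}

// Hypothetical helper: builds the string literal body used in the proposal.
std::string ToProposalLiteral(const std::vector<uint8_t>& payload) {
  return "base64:" + Base64Encode(payload);
}
```

With this sketch, the bytes of "hello" produce base64:aGVsbG8=, matching the comment in the example.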
wat2wasm
When encoding the text format to wasm, we could read the section value, check whether it uses an encoding we support, and if so, decode it into its binary form and append the section's binary content to the wasm.
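A rough sketch of the wat2wasm side: check for the "base64:" prefix, decode if it is present, and otherwise use the string's bytes unchanged. This assumes the input is the section string after the usual wat escapes have been processed, that base64 is the only supported encoding, and that malformed base64 should be rejected; Base64Value and DecodeSectionString are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// 6-bit value of a base64 character, or -1 if it is not in the alphabet.
int Base64Value(char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Returns the section payload, or std::nullopt on malformed base64.
std::optional<std::vector<uint8_t>> DecodeSectionString(const std::string& s) {
  const std::string kPrefix = "base64:";
  if (s.compare(0, kPrefix.size(), kPrefix) != 0) {
    // No supported encoding: the string's bytes are the payload.
    return std::vector<uint8_t>(s.begin(), s.end());
  }
  std::string body = s.substr(kPrefix.size());
  if (body.size() % 4 != 0) return std::nullopt;
  std::vector<uint8_t> out;
  for (size_t i = 0; i < body.size(); i += 4) {
    int v[4];
    int pad = 0;
    for (int j = 0; j < 4; ++j) {
      char c = body[i + j];
      if (c == '=') {
        // '=' is only valid in the last two positions of the final group.
        if (i + 4 != body.size() || j < 2) return std::nullopt;
        v[j] = 0;
        ++pad;
      } else {
        if (pad > 0) return std::nullopt;  // data after padding
        v[j] = Base64Value(c);
        if (v[j] < 0) return std::nullopt;
      }
    }
    uint32_t n = (uint32_t(v[0]) << 18) | (uint32_t(v[1]) << 12) |
                 (uint32_t(v[2]) << 6) | uint32_t(v[3]);
    out.push_back(uint8_t(n >> 16));
    if (pad < 2) out.push_back(uint8_t((n >> 8) & 0xff));
    if (pad < 1) out.push_back(uint8_t(n & 0xff));
  }
  return out;
}
```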
Are there other ways we could accomplish it?
Thoughts on this proposal?
This is one of the motivating use cases for the annotations proposal. In particular, this gist goes into some detail about how this could be supported.
Thanks for the prompt response, that makes sense.
Do you think we could support the following via annotations in wabt?
$ wasm2wat wasmwithcustomsections.wasm > wasmwithcustomsections.wat
wasmwithcustomsections.wat will have the following contents:
(module
  (@section "sectionName" "base64:aGVsbG8=") ;; this is the base64 encoding of "hello"
  (func (export "addTwo") (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add))
$ wat2wasm wasmwithcustomsections.wat -o wasmwithcustomsections.wasm
wasmwithcustomsections.wasm will have the "sectionName" custom section with the content "hello"
I guess I don't see why this is preferable to the syntax proposed in the gist above?
(module
  (@custom "sectionName" (after function) "hello")
  ...
)
The wat format already has a way to specify non-printable bytes, if necessary: "\hh", where each "h" is a hex digit.
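To make the escape rule concrete, here is a minimal sketch of turning the body of a wat string literal into bytes. It handles \hh plus the simple escapes (\t, \n, \r, \", \', \\) and deliberately rejects \u{...}, which a real parser would also need to support; HexValue and UnescapeWatString are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Value of a hex digit, or -1 if c is not one.
int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Converts a string literal body (without the surrounding quotes) to bytes.
std::optional<std::vector<uint8_t>> UnescapeWatString(const std::string& s) {
  std::vector<uint8_t> out;
  for (size_t i = 0; i < s.size(); ++i) {
    if (s[i] != '\\') {
      out.push_back(static_cast<uint8_t>(s[i]));
      continue;
    }
    if (++i >= s.size()) return std::nullopt;  // lone trailing backslash
    char c = s[i];
    switch (c) {
      case 't': out.push_back('\t'); break;
      case 'n': out.push_back('\n'); break;
      case 'r': out.push_back('\r'); break;
      case '"': case '\'': case '\\':
        out.push_back(static_cast<uint8_t>(c));
        break;
      default: {
        // Otherwise expect exactly two hex digits (\hh); \u{...} lands here
        // and is rejected because 'u' is not a hex digit.
        if (i + 1 >= s.size()) return std::nullopt;
        int hi = HexValue(c);
        int lo = HexValue(s[i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out.push_back(static_cast<uint8_t>(hi * 16 + lo));
        ++i;
        break;
      }
    }
  }
  return out;
}
```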
Oh, somehow misread the gist. The syntax makes sense.
Regarding adding custom sections support with the syntax you commented, would you be happy if we help with a PR for that? (perhaps just starting with the following since it's a bit simpler?):
(module
  (@custom "stuff" "at the end")
)
Suggestions
One small suggestion I have is renaming @custom to @customSection (or @custom-section). I know it's a bit longer to type, but it gives more context about what it does.
When I initially read @custom I thought it was referring to a custom annotation in general, not a custom section, if that makes sense.
Optional encoding support?
I think adding support for custom encodings (like base64) might keep binary data a bit more compact in the wat file. But we can start simple and revisit this use case later, if you think it might be useful.
Regarding adding custom sections support with the syntax you commented, would you be happy if we help with a PR for that?
Sure, that would be great!
(perhaps just starting with the following since it's a bit simpler?)
Seems like a good start, though we'll want to add support for positioning the custom sections soon after since without it the binary won't roundtrip properly.
One small suggestion I have is renaming @custom to @customSection
Sounds OK to me, but maybe better to discuss on the annotations repo first.
Looks like there wasn't a lot of activity on this issue lately, is there any problem holding this back, or are we simply waiting for an implementer?
@MendyBerger I think just waiting on someone to implement, yes. Feel free to send a PR!
I've started looking into the wasm2wat part. I'm not sure yet that I'll be able to do it, but I'll try.
Just to make sure that I understand correctly, if there's a custom section in the wasm binary named "foo" with the following bytes: [5, 30, 65] the output would be:
(@custom "foo" "\05\1eA")
(5 and 30 are non-printable, so they're printed as \hex escapes, and 65 is Unicode 'A'.)
Did I get it right?
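The rule described in the question can be sketched as a small encoder. It assumes printable ASCII (0x20 to 0x7e) is written as-is except for '"' and '\', and that every other byte becomes \hh with lowercase hex digits; wabt's actual writer may make different choices. EscapeWatBytes and FormatCustomAnnotation are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Escapes raw bytes as the body of a wat string literal.
std::string EscapeWatBytes(const std::vector<uint8_t>& bytes) {
  static const char kHex[] = "0123456789abcdef";
  std::string out;
  for (uint8_t b : bytes) {
    if (b == '"' || b == '\\') {
      // Quote and backslash must be escaped to keep the literal well-formed.
      out += '\\';
      out += static_cast<char>(b);
    } else if (b >= 0x20 && b < 0x7f) {
      out += static_cast<char>(b);  // printable ASCII
    } else {
      out += '\\';  // everything else as \hh
      out += kHex[b >> 4];
      out += kHex[b & 0xf];
    }
  }
  return out;
}

// Formats a custom section annotation without a placement clause.
std::string FormatCustomAnnotation(const std::string& name,
                                   const std::vector<uint8_t>& payload) {
  std::vector<uint8_t> name_bytes(name.begin(), name.end());
  return "(@custom \"" + EscapeWatBytes(name_bytes) + "\" \"" +
         EscapeWatBytes(payload) + "\")";
}
```

Under these assumptions, FormatCustomAnnotation("foo", {5, 30, 65}) yields (@custom "foo" "\05\1eA"), which matches the expected output in the question.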
Take a look at the document here: https://github.com/WebAssembly/annotations/blob/main/proposals/annotations/Overview.md#details
This seems to be up to date as far as I can tell. There has been some recent discussion about how to test this, so you may want to follow along with that too: see https://github.com/WebAssembly/annotations/issues/15 and https://github.com/WebAssembly/design/issues/1445
I'm interested in working on this. Is the annotations syntax still the best way of doing this?
I have a basic implementation for extending BinaryReaderIR to support reading custom sections. As of right now, I'm more concerned with how to actually represent the custom sections in the Module. I could extend the ModuleField type, or go with something much simpler (like a vector of some Custom struct that simply holds the section name and data).
Also, since a custom section annotation has the "location" part, I assume the logic for getting this location should take place in the WatWriter, not BinaryReaderIR, because as far as I know, this is a detail of the WAT format, not WASM as a whole.
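For the "vector of some Custom struct" option, a minimal self-contained sketch of the reading side might look like the following. It walks the sections of a binary module, copies each custom section's name and payload, and records the id of the last non-custom section seen before it so that placement information is not lost. CustomSection, ReadU32Leb and ReadCustomSections are hypothetical names, not wabt's BinaryReaderIR API, and error handling is minimal.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical IR entry: one per custom section, kept in a plain vector.
struct CustomSection {
  std::string name;
  std::vector<uint8_t> data;
  uint8_t preceding_section_id = 0;  // 0 = no known section before it
};

// Reads an unsigned LEB128 u32 at pos and advances pos (lax overflow check).
std::optional<uint32_t> ReadU32Leb(const std::vector<uint8_t>& b, size_t& pos) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (pos >= b.size()) return std::nullopt;
    uint8_t byte = b[pos++];
    result |= uint32_t(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  return std::nullopt;
}

// Collects every custom section (id 0) of a binary module.
std::optional<std::vector<CustomSection>> ReadCustomSections(
    const std::vector<uint8_t>& b) {
  // 8-byte header: "\0asm" magic followed by a 4-byte version.
  if (b.size() < 8 || b[0] != 0x00 || b[1] != 'a' || b[2] != 's' || b[3] != 'm')
    return std::nullopt;
  std::vector<CustomSection> customs;
  uint8_t last_known = 0;
  size_t pos = 8;
  while (pos < b.size()) {
    uint8_t id = b[pos++];
    auto size = ReadU32Leb(b, pos);
    if (!size || *size > b.size() - pos) return std::nullopt;
    size_t end = pos + *size;
    if (id == 0) {
      // Custom section payload: name (length-prefixed) then raw bytes.
      size_t p = pos;
      auto name_len = ReadU32Leb(b, p);
      if (!name_len || p > end || *name_len > end - p) return std::nullopt;
      CustomSection cs;
      cs.name.assign(b.begin() + p, b.begin() + p + *name_len);
      cs.data.assign(b.begin() + p + *name_len, b.begin() + end);
      cs.preceding_section_id = last_known;
      customs.push_back(std::move(cs));
    } else {
      last_known = id;
    }
    pos = end;
  }
  return customs;
}
```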
I'm interested in working on this. Is the annotations syntax still the best way of doing this?
Awesome! And yes, I think so.
I have a basic implementation for extending BinaryReaderIR to support reading custom sections. As of right now, I'm more concerned with how to actually represent the custom sections in the Module. I could extend the ModuleField type, or go with something much simpler (like a vector of some Custom struct that simply holds the section name and data).
Hmm, the latter seems maybe a bit more explicit and cleaner?
BTW: How do you plan to deal with the custom sections that the BinaryReader already "handles" (name, dylink0, dylink, reloc, target_features, linking, and code_metadata)?
Also, since a custom section annotation has the "location" part, I assume the logic for getting this location should take place in the WatWriter, not BinaryReaderIR, because as far as I know, this is a detail of the WAT format, not WASM as a whole.
I believe BinaryReader does fill in the Location too; WABT has both "Text" and "Binary" locations depending on how the module was parsed (see https://github.com/WebAssembly/wabt/blob/main/include/wabt/common.h#L200).
Hmm, the latter seems maybe a bit more explicit and cleaner?
I agree. I think I'll end up doing it this way, and if I run into problems, I'll change the implementation. One thing to note is this for-loop, which WatWriter uses in one of its top-level methods. If we were to put the custom sections in their own vector, we'd lose consistent section ordering with the original wasm module (as I'd have to write the custom sections at the beginning or end of the module). I'm not sure this is a worry, though, as it doesn't seem to be a concern for similar wabt design decisions (such as how custom name sections are handled).
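One way to keep the original ordering while still storing custom sections in their own vector: remember, for each custom section, which known section preceded it, and have the writer emit matching custom sections right after that known section. A minimal sketch, with hypothetical names and string labels standing in for actual writer calls:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: only the fields needed for ordering.
struct CustomSection {
  std::string name;
  uint8_t preceding_section_id;  // 0 = before any known section
};

// Returns the emission order for known section ids (given in binary order)
// and a separate vector of custom sections. Custom sections with the same
// predecessor keep their relative order from the vector.
std::vector<std::string> EmissionOrder(const std::vector<uint8_t>& known_ids,
                                       const std::vector<CustomSection>& customs) {
  std::vector<std::string> order;
  auto emit_customs_after = [&](uint8_t id) {
    for (const auto& c : customs)
      if (c.preceding_section_id == id) order.push_back("custom:" + c.name);
  };
  emit_customs_after(0);  // custom sections that came before every known one
  for (uint8_t id : known_ids) {
    order.push_back("section:" + std::to_string(id));
    emit_customs_after(id);
  }
  return order;
}
```

For example, known sections 1, 3 and 10 with custom sections a (before everything), b and d (after section 3) and c (after section 10) come out as a, 1, 3, b, d, 10, c.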
BTW: How do you plan to deal with the custom sections that the BinaryReader already "handles" (name, dylink0, dylink, reloc, target_features, linking, and code_metadata)?
Hmm, these pose a bit of a problem. One option is to leave those as they are and only parse a custom section generically if it is not one of the above. This is similar to what walrus does. However, that means I'd have to reconstruct those custom sections by hand into their raw byte form when writing them from WatWriter. The alternative is to keep the parsing they already get, but also include them in the more generic custom section parsing. This would mean they could potentially be stored in two places. I'm also not sure this is even possible, as I'm assuming BinaryReader doesn't have some sort of backtracking mechanism to parse the same bytes twice (for example, both as the name section and as a generic custom section in raw byte form). Edit: Upon further inspection, it looks like I could do something with state_.offset to essentially backtrack the parser.
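The second option may not need backtracking at all: the payload's start and end offsets are known as soon as the section header has been read, so the raw bytes can be copied into generic storage first and the same range then handed to the dedicated parser. A minimal sketch, where CustomSection, HandleCustomSection and the SpecializedParser callback type are hypothetical and not part of BinaryReader:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical generic storage for a custom section.
struct CustomSection {
  std::string name;
  std::vector<uint8_t> data;
};

// Hypothetical dedicated parser (e.g. for "name"): gets the payload range.
using SpecializedParser = std::function<bool(const uint8_t*, const uint8_t*)>;

// Copies the payload [payload_begin, payload_end) into `out`, then lets the
// optional dedicated parser read the same range. Nothing is re-read from the
// stream, so no backtracking is needed; the cost is storing the bytes twice.
bool HandleCustomSection(const std::vector<uint8_t>& module,
                         size_t payload_begin, size_t payload_end,
                         const std::string& name,
                         const SpecializedParser* parser,
                         std::vector<CustomSection>& out) {
  if (payload_begin > payload_end || payload_end > module.size()) return false;
  out.push_back({name, std::vector<uint8_t>(module.begin() + payload_begin,
                                            module.begin() + payload_end)});
  if (parser != nullptr) {
    return (*parser)(module.data() + payload_begin, module.data() + payload_end);
  }
  return true;
}
```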
On the subject of locations, the Location type doesn't seem to store the same data that the custom section annotation grammar specifies. In the design document, the location part looks something like (after function) or (after code), not an absolute position. If we were to put custom sections in their own vector, this wouldn't even be a concern, as the grammar specifies that we can elide these locations at the end of the module. This is another upside of the vector-of-custom-sections implementation.
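If each custom section records the id of the last known section before it, the placement clause can be derived when writing. A minimal sketch, assuming the (before first) and (after sec) forms as I understand them from the annotations Overview; the keywords "function" and "code" follow this thread's examples, while the other spellings are assumptions to check against the Overview. SectionKeyword and PlacementClause are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// Assumed placement keywords for binary section ids 1..12.
std::string SectionKeyword(uint8_t id) {
  switch (id) {
    case 1: return "type";
    case 2: return "import";
    case 3: return "function";
    case 4: return "table";
    case 5: return "memory";
    case 6: return "global";
    case 7: return "export";
    case 8: return "start";
    case 9: return "elem";
    case 10: return "code";
    case 11: return "data";
    case 12: return "datacount";
    default: return "";
  }
}

// preceding_id: last known section before the custom section (0 = none).
// last_known_id: the module's final known section (0 = none).
// A custom section after the final known section gets no clause, since the
// placement can be elided at the end of the module.
std::string PlacementClause(uint8_t preceding_id, uint8_t last_known_id) {
  if (preceding_id == last_known_id) return "";    // at the end: elide
  if (preceding_id == 0) return "(before first)";  // assumed spelling
  return "(after " + SectionKeyword(preceding_id) + ")";
}
```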
Can this be closed now that the PR has been merged?