wasm-tools icon indicating copy to clipboard operation
wasm-tools copied to clipboard

Initial support for value section parsing

Open primoly opened this issue 1 year ago • 4 comments

This draft adds support for component value sections to wasmparser. https://github.com/WebAssembly/component-model/pull/336

Questions, issues and remarks:

The validator currently does not support validation of the actual values:

The parsing of values depends on the component type space, so a value section can’t be parsed in isolation https://github.com/WebAssembly/component-model/issues/352 How should this be handled? Currently, ComponentValue only parses the type and keeps the bytes of the val. To read the actual value the val method takes a slice of ComponentTypes which must be in the order they were read from the current components ComponentTypeSectionReader.

wasmparser duplicates the types in validator and reader. Currently values.rs uses the ones from reader. Should it use types::Types from validator instead?

Validation only needs to call val, as the parser then checks if the types of values are correct.

Crates depending on wasmparser (such as wasmprinter) need to keep track of the component type space. wasmprinter already keeps some global module/component information in the state variable. wasmparser itself does that in its validator.

Parsing of flags that set labels past 64 to true currently fails. Needs support for arbitrarily large unsigned LEB128.

How should flags be stored in wasmparser? Explainer.md: list of labels of fields set to true. For wasmparser I think this will likely confuse people into thinking the u32s are just bools encoded as integers. So I instead chose Vec<bool>, which would treat it as a special record with only bool fields (it already is). More symmetrical and ergonomic.

Should Record, Flags, Variant and Enum store the label names? They are already provided in the ComponentType. Flags could then also be a Vec<&str>.

BinaryReaderErrors use the offset reader.original_position(). Might not always be correct (off by one or couple bytes).

primoly avatar May 09 '24 13:05 primoly

Thanks for the PR and the work here!

Should it use types::Types from validator instead?

I posted a comment on the spec PR about this, but otherwise for now you'll probably want to take a TypesRef<'_> here to represent that a validator's context is required.

Needs support for arbitrarily large unsigned LEB128.

This might be good to change at the spec-level perhaps too and to spec that beyond 32-bits the value is represented as N 32-bit values so we would decode 32-bit chunks here.

Should Record, Flags, Variant and Enum store the label names?

I'm a bit wary of the current representation of Val for other reasons. It's the "easiest" representation in a sense but it's also one of the less efficient representations. It might make sense to perhaps instead explore a "visitor" like interface where there's a trait ValVisitor where methods are invoked for each parsed piece. For example it'd have something like record_start with record_field and record_end. That way storage of the names would be up to the caller and while Val could be created it wouldn't be the only representation.

alexcrichton avatar May 09 '24 17:05 alexcrichton

Thank you for taking a look so rapidly and the suggestions.

I’ve made the following changes based on your response:

I’m now using types::TypesRef for reading the value. I don’t yet know how that will work in validation tho, because the closure provided to process_component_section can not, I believe, access that value.

I agree that the initial representation for Val isn’t great. I don’t have much experience with the visitor pattern, but I have attempted to implement something along those lines.

primoly avatar May 10 '24 23:05 primoly

Thank you very much for taking another look. One thing I noticed is that the visitor would not work for anything other than validation, because it can’t share mutable state with the child visitor returned by the field/element methods. For instance, I can’t do:

struct Printer {
    output: String,
}

/// ...

impl Record<Self> for &mut Printer {
    fn field(&mut self, name: &str) -> Self {
        self.output.push_str(name);
        self /// <- Does not work
    }
}

I tried to solve this in different attempts by adding lifetimes or changing from owned to borrowed values in both the impl and the trait itself, but every time I hit lifetime or ownership errors somewhere. Unless I miss something obvious, the whole visitor has to be redesigned from scratch, yet I have no idea how. Another solution would be to switch to have val return an iterator instead of taking a visitor but that involves some boxing and I’m not sure about the performance implications. All problems basically arise due to the recursive nature of fields/elements.

primoly avatar May 18 '24 08:05 primoly

The serde crates/traits might be a good inspiration to draw from for this use case? They're solving a pretty similar problem and might be good to model after.

alexcrichton avatar May 20 '24 15:05 alexcrichton