wasm-tools `wasmparser`: Reuse allocations of the `FuncValidator`

Current State & Problem

Currently, users of wasmparser have to instantiate a FuncValidator for every Wasm function in a Wasm module that they want to validate. For larger Wasm modules such as spidermonkey.wasm, that means 11518 functions. FuncValidator itself is built on top of OperatorValidator, which houses multiple Vec-based buffers that may be resized multiple times for each of those functions.

Proposal

What we could do instead is make it possible to reuse a single FuncValidator for all of those 11518 Wasm functions, thereby avoiding re-allocating heap memory over and over. In wasmi we did something similar with the wasmi translation engine and got pretty good results from it.

Requirements

This proposal requires a slight redesign of the FuncValidator API. The new requirements are:

The underlying state of the FuncValidator (namely OperatorValidator), or the FuncValidator itself, must be constructible without first being assigned to a Wasm function. A potential constructor could be fn new(resources: impl WasmModuleResources, features: WasmFeatures) -> Self.
The FuncValidator then requires an API to put it into an "assigned" state, in which it is actually assigned to a function for validation purposes, for example fn func(&mut self, index: u32, ty: u32, offset: usize) -> Result<()>.
An assigned FuncValidator operates the same way a FuncValidator does today.
The finish method on FuncValidator implicitly unassigns the FuncValidator again so that it can be used for the next function.
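
The lifecycle described by these requirements can be sketched roughly as follows. This is only an illustration under simplifying assumptions: ReusableValidator, StateError, op and capacity are hypothetical names, a single Vec<u8> stands in for the buffers of OperatorValidator, and the ty and offset parameters of the proposed func method are omitted.

```rust
/// Hypothetical stand-in for a reusable FuncValidator (not the wasmparser type).
#[derive(Default)]
struct ReusableValidator {
    /// Stand-in for the Vec-based buffers owned by OperatorValidator.
    operands: Vec<u8>,
    /// Index of the function currently being validated, if any.
    assigned: Option<u32>,
}

/// Hypothetical error type for misuse of the assigned/unassigned states.
#[derive(Debug, PartialEq)]
enum StateError {
    AlreadyAssigned,
    NotAssigned,
}

impl ReusableValidator {
    /// Constructs an unassigned validator without any function attached.
    fn new() -> Self {
        Self::default()
    }

    /// Moves the validator into the "assigned" state for function `index`.
    fn func(&mut self, index: u32) -> Result<(), StateError> {
        if self.assigned.is_some() {
            return Err(StateError::AlreadyAssigned);
        }
        // `clear` drops the old contents but keeps the heap capacity.
        self.operands.clear();
        self.assigned = Some(index);
        Ok(())
    }

    /// Stand-in for validating one operator of the assigned function.
    fn op(&mut self, value: u8) -> Result<(), StateError> {
        if self.assigned.is_none() {
            return Err(StateError::NotAssigned);
        }
        self.operands.push(value);
        Ok(())
    }

    /// Finishes validation and implicitly unassigns the validator.
    fn finish(&mut self) -> Result<u32, StateError> {
        self.assigned.take().ok_or(StateError::NotAssigned)
    }

    /// Exposes the buffer capacity so that reuse can be observed.
    fn capacity(&self) -> usize {
        self.operands.capacity()
    }
}
```

After finish, the next call to func starts from buffers that are empty but already sized by the previous function, which is where the saving would come from.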

Error Handling

This API should return an Err if an unassigned FuncValidator is used as if it were assigned, or if an assigned FuncValidator is reassigned before validation of the currently assigned function has finished. The above API proposal could also be formulated as a more strongly typed implementation in which the type system makes the aforementioned invalid state transitions impossible. This has to be explored.
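
One way the strongly typed variant might look is a typestate design in which each state is its own type and every transition consumes the previous state. The sketch below is an assumption for illustration only: Buffers, Unassigned, Assigned, op and validate_all are hypothetical names and none of them are part of wasmparser.

```rust
/// Hypothetical buffers that survive from one function to the next.
#[derive(Default)]
struct Buffers {
    operands: Vec<u8>,
}

/// A validator that is not bound to any function.
struct Unassigned {
    buffers: Buffers,
}

/// A validator that is bound to the function with the given index.
struct Assigned {
    buffers: Buffers,
    index: u32,
}

impl Unassigned {
    fn new() -> Self {
        Unassigned { buffers: Buffers::default() }
    }

    /// Consumes `self`, so an unassigned validator cannot be assigned twice.
    fn func(mut self, index: u32) -> Assigned {
        self.buffers.operands.clear(); // keeps the capacity
        Assigned { buffers: self.buffers, index }
    }
}

impl Assigned {
    /// Only an assigned validator offers operator validation.
    fn op(&mut self, value: u8) {
        self.buffers.operands.push(value);
    }

    /// Consumes `self` and returns the function index together with an
    /// unassigned validator that still owns the buffers.
    fn finish(self) -> (u32, Unassigned) {
        (self.index, Unassigned { buffers: self.buffers })
    }
}

/// Validates all bodies with one set of buffers and returns the indices
/// of the finished functions plus the final buffer capacity.
fn validate_all(bodies: &[&[u8]]) -> (Vec<u32>, usize) {
    let mut idle = Unassigned::new();
    let mut done = Vec::new();
    for (index, body) in bodies.iter().enumerate() {
        let mut active = idle.func(index as u32);
        for &byte in body.iter() {
            active.op(byte);
        }
        let (finished, next) = active.finish();
        done.push(finished);
        idle = next;
    }
    (done, idle.buffers.operands.capacity())
}
```

With this shape, calling op on an unassigned validator or calling func on an assigned one is a compile error rather than a runtime Err, at the cost of moving the validator by value on every transition.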

Parallel Validation

For parallel validation of Wasm functions, another requirement of the new API is that it should ideally be possible to create pools of reusable FuncValidators.
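
As a rough illustration of what such a pool might look like, the sketch below uses only the standard library. All names (Validator, ValidatorPool, take, give_back, validate_parallel) are hypothetical, a Vec<u8> stands in for the real buffers, and copying the body into the buffer stands in for actual validation.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a reusable validator and its buffers.
#[derive(Default)]
struct Validator {
    operands: Vec<u8>,
}

/// Hypothetical pool of idle validators shared between threads.
#[derive(Default)]
struct ValidatorPool {
    idle: Mutex<Vec<Validator>>,
}

impl ValidatorPool {
    /// Takes an idle validator, or creates a fresh one if the pool is empty.
    fn take(&self) -> Validator {
        self.idle.lock().unwrap().pop().unwrap_or_default()
    }

    /// Returns a validator to the pool, keeping its allocated capacity.
    fn give_back(&self, mut validator: Validator) {
        validator.operands.clear();
        self.idle.lock().unwrap().push(validator);
    }

    /// Number of validators currently waiting in the pool.
    fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

/// Splits `bodies` into chunks, processes each chunk on its own thread with
/// one pooled validator, and returns the number of processed functions.
fn validate_parallel(pool: &ValidatorPool, bodies: &[Vec<u8>], threads: usize) -> usize {
    let threads = threads.max(1);
    let chunk = ((bodies.len() + threads - 1) / threads).max(1);
    std::thread::scope(|scope| {
        let handles: Vec<_> = bodies
            .chunks(chunk)
            .map(|part| {
                scope.spawn(move || {
                    let mut validator = pool.take();
                    let mut count: usize = 0;
                    for body in part {
                        // Stand-in for validating one function body.
                        validator.operands.clear();
                        validator.operands.extend_from_slice(body);
                        count += 1;
                    }
                    pool.give_back(validator);
                    count
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}
```

Each thread touches the mutex only twice, once to take a validator and once to give it back, so contention should stay low compared to the validation work itself.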

Aug 14 '22 13:08 Robbepop

I think the best way to solve this would probably be some sort of opaque struct FuncValidatorAllocations { ... } which can be extracted and inserted into a validator. That way a validation thread could have a single set of the allocations which it uses to create the validator and then extracts from the validator when finished. This gets a little tricky with how function validators are created by the main validator and things like locals are allocated on creation of the validator. One possibility there would be to return state which can be used to create a function validator from the main validator instead of the function validator itself, deferring pairing with the allocation resource until it's time to run the actual validator.
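
A minimal sketch of this extract-and-insert idea, with the deferred pairing, might look as follows. Only the name FuncValidatorAllocations comes from the comment above; PendingFunc, Validator, into_validator, into_allocations, op and validate_on_thread are hypothetical names chosen for illustration, and Vec<u8> buffers stand in for the real state.

```rust
/// Opaque bundle of reusable buffers (fields are illustrative stand-ins).
#[derive(Default)]
struct FuncValidatorAllocations {
    operands: Vec<u8>,
    locals: Vec<u8>,
}

/// Hypothetical state returned by the main validator: everything needed to
/// validate one function later, but without any buffers attached yet.
struct PendingFunc {
    index: u32,
    num_locals: usize,
}

/// Hypothetical function validator that owns the allocations while it runs.
struct Validator {
    index: u32,
    allocs: FuncValidatorAllocations,
}

impl PendingFunc {
    /// Pairs the pending function with allocations only when it is time to
    /// run; the locals are set up here instead of at creation time.
    fn into_validator(self, mut allocs: FuncValidatorAllocations) -> Validator {
        allocs.operands.clear();
        allocs.locals.clear();
        allocs.locals.resize(self.num_locals, 0);
        Validator { index: self.index, allocs }
    }
}

impl Validator {
    /// Stand-in for validating one operator.
    fn op(&mut self, value: u8) {
        self.allocs.operands.push(value);
    }

    fn index(&self) -> u32 {
        self.index
    }

    /// Extracts the allocations so the next validator can reuse them.
    fn into_allocations(self) -> FuncValidatorAllocations {
        self.allocs
    }
}

/// What a single validation thread might do with its one set of allocations.
fn validate_on_thread(pending: Vec<PendingFunc>) -> (Vec<u32>, FuncValidatorAllocations) {
    let mut allocs = FuncValidatorAllocations::default();
    let mut done = Vec::new();
    for func in pending {
        let mut validator = func.into_validator(allocs);
        validator.op(0);
        done.push(validator.index());
        allocs = validator.into_allocations();
    }
    (done, allocs)
}
```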

Aug 15 '22 14:08 alexcrichton

As a reference point, the wasmi PR that introduced reusable allocations provided a roughly 10% performance boost: https://github.com/paritytech/wasmi/pull/411

Aug 19 '22 10:08 Robbepop

How about just passing a &mut FuncValidatorAllocations instead of taking ownership? That introduces an indirection, but it is less painful for the API. This is what we did in wasmi, and while it is not perfect compared to the ownership-taking variant, we still got a 10% win.

I will experiment with both versions and see which works better.
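
For comparison, the borrowing variant could be sketched like this. It is an illustration under assumptions, not the wasmi or wasmparser implementation: the FuncValidator shown here is a hypothetical stand-in with a lifetime parameter, and op, finish and validate_all are made-up names.

```rust
/// Reusable buffers owned by the caller (the field is an illustrative stand-in).
#[derive(Default)]
struct FuncValidatorAllocations {
    operands: Vec<u8>,
}

/// Hypothetical validator that borrows the allocations instead of owning them.
struct FuncValidator<'a> {
    index: u32,
    allocs: &'a mut FuncValidatorAllocations,
}

impl<'a> FuncValidator<'a> {
    fn new(index: u32, allocs: &'a mut FuncValidatorAllocations) -> Self {
        allocs.operands.clear(); // keeps the capacity from earlier functions
        FuncValidator { index, allocs }
    }

    /// Stand-in for validating one operator; every access goes through the
    /// reference, which is the extra indirection mentioned above.
    fn op(&mut self, value: u8) {
        self.allocs.operands.push(value);
    }

    /// Ends the borrow; the caller still owns the allocations afterwards.
    fn finish(self) -> u32 {
        self.index
    }
}

/// Validates all bodies while reusing the caller's allocations.
fn validate_all(bodies: &[Vec<u8>], allocs: &mut FuncValidatorAllocations) -> Vec<u32> {
    let mut done = Vec::new();
    for (index, body) in bodies.iter().enumerate() {
        // Reborrow so that `allocs` is available again in the next iteration.
        let mut validator = FuncValidator::new(index as u32, &mut *allocs);
        for &byte in body {
            validator.op(byte);
        }
        done.push(validator.finish());
    }
    done
}
```

Nothing has to be handed back at the end of a function here, which keeps the call sites simpler than in the ownership-taking variant.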

Aug 19 '22 18:08 Robbepop

Sure, yeah, that also seems reasonable to me. Wasmtime has a perhaps different use case than what wasmi may be exercising, and that is the case I'm trying to handle: validation of the module produces a FuncValidator per function, and those are then validated afterwards, concurrently on many threads. In that sense the time at which a FuncValidator is created isn't connected to where the allocations would be stored, which would instead be some sort of thread-local data structure or something like that during the parallel validation.

That being said, I'm not wed to any particular shape of API; so long as it fits the constraints, I'm happy to defer.

Aug 19 '22 18:08 alexcrichton

Is the compilation of Wasm functions lazy in Wasmtime? That is, does Wasmtime validate and/or compile a function only if it is actually used? That's something we are exploring with a newer engine for wasmi that uses register-machine-based bytecode with a heavier compilation phase. Unfortunately, this adds a slight overhead when calling functions.

Aug 19 '22 18:08 Robbepop

Nah it's just deferred to later. The module's "spine" is first validated which produces type information and a bunch of functions to validate, and then later in the compilation process all those functions are, in parallel using rayon, compiled to machine code.

Aug 19 '22 18:08 alexcrichton

Wasmtime has a perhaps different use case than what wasmi may be exercising

Yeah, definitely. In wasmi we cannot really use multi-threading, since wasmi is primarily executed as a no_std Wasm blob that has no multi-threading. So we have to make single-threaded compilation as efficient as possible.

However, I am sure we can find an efficient reusable API that benefits both the Wasmtime and wasmi use cases. :)

Aug 19 '22 20:08 Robbepop