Defining an implementation limit on initializer expression size
An initializer expression is very similar to a function body, just with a much more restricted set of allowed instructions.
For function bodies we have a limit of 7,654,321 bytes each. [spec]
I'd argue that we should therefore introduce a limit for constant initializer expressions as well, either the same one or a smaller one.
Given the generous limit, this shouldn't break anybody (wasm-gc is probably the only feature right now where initializer expressions could get somewhat large, e.g. if a global defines an object with deeply nested objects and arrays).
Context: Our init-expression fuzzer in V8 generated some deeply nested structs with non-nullable references and then generated an ~8MB initializer expression to populate a struct of that type. The fuzzer compares the result against running a function with the same body as the initializer expression, expecting both to produce an equivalent object. It ran into the function body limit after the huge initializer expression had already been decoded and accepted.
@eqrion Does SpiderMonkey have a limit for constant initializer expressions? What do you think about specifying one?
We do not have a limit for constant initializer expressions. This was previously discussed in https://github.com/WebAssembly/extended-const/issues/15 and never got resolved. I was in favor of having one at that time. I seem to recall that this was also discussed in a CG meeting without reaching consensus on having one, but I cannot find it (@tlively @rossberg do you remember this?)
One difficult part is that we don't know the size of an init expression before the end instruction is parsed. They're not prefixed with their size. This means our decoders either already need to be robust to really big init expressions and fail after validation if one was too big, or need new checks during validation to see whether the expression has grown too big. If I remember right, this was the reason we didn't add a limit, but I might be wrong.
Thinking about this now, I think we could efficiently support an implementation limit here (fusing the decoder's bytecode length check with the implementation limit check), and it still seems like a reasonable limit to me.
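To make the "fused" check concrete, here is a minimal sketch of one way it could look. It is not how V8 or SpiderMonkey implement decoding. The names `scan_init_expr`, `MAX_INIT_EXPR_SIZE` and `DecodeError` are hypothetical, the opcode set is a toy subset, and counting the `end` opcode towards the size is an assumption. The idea is to clamp the decoder's existing end-of-buffer bound to the limit once, so every bounds check the decoder already performs also enforces the limit.

```python
# Assumed limit: the same value as the function body limit.
MAX_INIT_EXPR_SIZE = 7_654_321

OP_END = 0x0B
# Toy subset: opcodes followed by exactly one LEB128 immediate.
LEB_IMMEDIATE_OPS = {0x23, 0x41, 0x42}  # global.get, i32.const, i64.const
# Toy subset: opcodes without immediates.
NO_IMMEDIATE_OPS = {0x6A, 0x6B, 0x6C}  # i32.add, i32.sub, i32.mul


class DecodeError(Exception):
    pass


def scan_init_expr(buf: bytes, start: int = 0,
                   max_size: int = MAX_INIT_EXPR_SIZE) -> int:
    """Return the offset just past the `end` opcode of the expression at `start`."""
    # One bound covers both "ran off the buffer" and "exceeded the limit".
    bound = min(len(buf), start + max_size)

    def fail() -> None:
        # Only on the failure path do we work out which bound was hit.
        if bound < len(buf):
            raise DecodeError("initializer expression exceeds size limit")
        raise DecodeError("unexpected end of input")

    pos = start
    while True:
        if pos >= bound:
            fail()
        op = buf[pos]
        pos += 1
        if op == OP_END:
            return pos
        if op in LEB_IMMEDIATE_OPS:
            # Skip the LEB128 immediate: the high bit marks continuation.
            while True:
                if pos >= bound:
                    fail()
                byte = buf[pos]
                pos += 1
                if byte & 0x80 == 0:
                    break
        elif op not in NO_IMMEDIATE_OPS:
            raise DecodeError(f"unsupported opcode 0x{op:02x}")
```

In this sketch the hot loop performs the same number of comparisons as a decoder without the limit. The extra work of distinguishing "too big" from "truncated" only happens once decoding has already failed.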
Yeah, it seems odd not to have a limit on this. I only (vaguely) remember a CG discussion about retroactively hacking an actual size prefix into the binary format, which there was no agreement on. A limit on the size without needing such a prefix seems like the nicer option, if it can be implemented without significant overhead, which sounds totally plausible.
Either way, the core spec should definitely allow for the possibility of embedders imposing a limit.
I'd be fine with supporting a limit on init expressions equivalent to the one for function bodies (7,654,321 bytes). That should be pretty conservative; we could probably go lower too.