binaryen
Isorecursive types benchmarking plan
This is the list of things that need to be done to finish evaluating the performance of the new type system compared to nominal types.
- [ ] Implement the hybrid isorecursive type system behind the `--hybrid` flag.
- [ ] Implement an option for deterministically producing minimal recursion groups for a set of nominal types, possibly emitting the same isorecursive group multiple times.
- [ ] Implement an option for producing a single large recursion group for a set of nominal types.
- [ ] Benchmark a nontrivial, unoptimized j2cl module with nominal types, a single recursion group, and minimal recursion groups.
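The "minimal recursion groups" item amounts to grouping mutually recursive types together and nothing else, which is exactly the strongly connected components of the type reference graph. As a hedged sketch (not Binaryen's actual implementation; `type_refs` and the index-based representation are assumptions for illustration), a deterministic grouping could use Tarjan's SCC algorithm, which conveniently emits groups in an order where each group refers only to earlier groups or to itself:

```python
def minimal_rec_groups(type_refs):
    """Group mutually recursive types into minimal recursion groups.

    type_refs maps each type index to the type indices it refers to.
    Returns groups in reverse topological order of the condensation,
    so each group refers only to earlier groups or to itself.
    Uses Tarjan's strongly-connected-components algorithm.
    """
    index = {}
    low = {}
    on_stack = set()
    stack = []
    groups = []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in type_refs.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of an SCC; pop the whole component.
            group = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                group.append(w)
                if w == v:
                    break
            groups.append(sorted(group))

    for v in sorted(type_refs):
        if v not in index:
            strongconnect(v)
    return groups

# Types 0 and 1 are mutually recursive; type 2 uses them.
refs = {0: [1], 1: [0], 2: [0]}
print(minimal_rec_groups(refs))  # [[0, 1], [2]]
```

Iterating over `sorted(type_refs)` makes the output deterministic for a given module, which matters for the "deterministically producing" requirement above.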
At this point I will be ok officially adopting the hybrid isorecursive system if the benchmark results look reasonable. But to prove out the full optimizability of the proposal, we will additionally need to:
- [ ] Refine the internal type system to separately track distinct instances of the same isorecursive type, unlocking nominal-like optimizations for isorecursive types.
I understand why we want to benchmark nominal vs a single recursion group - those should end up optimized in an identical way, and ideally take the same amount of time to optimize. (Speaking of which, when you say "benchmark", do you mean compile time, runtime, code size, all of the above, or maybe more things?)
But I'm not sure how we would evaluate benchmarks for minimal recursion groups. That situation would only happen when the user wants to support dynamic linking, separate compilation, etc. It is natural for there to be some overhead from that. Say it causes 50% overhead - is that good or bad? Furthermore, even if we find it has more overhead than we'd like (whatever that number is), we would have a bunch of possible ways to improve that, but all of them would require having good real-world code that actually needs dynamic linking, I think? For example, we could emit non-minimal groups that are "just enough" for the ABI between the separately-compiled modules. But without real-world code I don't see how we can measure the tradeoffs there, and we don't have that.
My thinking is that it's fine to officially adopt the hybrid isorecursive system if the performance of a single recursion group is about the same as of nominal typing. That completely supports all existing real-world use cases that we have. For dynamic linking, we have reasonable reasons to think that multiple recursion groups will help there, and we can explore that later once toolchains actually want to emit such code - that seems sufficient to me.
By "benchmark" I mean measuring time to build and canonicalize the types after parsing the type section, with the goal of understanding how the type system will affect module validation time in engines.
In the fullness of time we expect there to be dynamically linked applications that need to use minimal recursion groups. Since my goal here is to determine the suitability of isorecursive types for the spec as opposed to for any particular application, I'm fine simulating that worst case rather than waiting for real-world code that needs it.
The question of how much overhead is too much overhead is a good one, and we don't have a definite cutoff. As an upper bound, when we benchmarked equirecursive types here, we had informal consensus that doubling the time it takes to parse, validate, and run the baseline compiler on a module was too much.
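The measurement described above (timing the building and canonicalization of the types after parsing) only needs a small harness around the step under test. A hedged sketch, where `build_and_canonicalize` is a hypothetical stand-in for whatever the tool under test actually runs:

```python
import statistics
import time

def benchmark(build_and_canonicalize, types, runs=10):
    """Time repeated calls to a type-building/canonicalization step.

    `build_and_canonicalize` is a stand-in for the step under test.
    Returns the median wall-clock time in seconds over `runs` runs;
    the median is less sensitive to outlier runs than the mean.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        build_and_canonicalize(types)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Example with a trivial stand-in workload:
elapsed = benchmark(lambda ts: sorted(ts), list(range(100_000)))
print(f"median: {elapsed:.6f}s")
```

Running the same harness over the nominal, single-group, and minimal-group encodings of the same module would give directly comparable numbers for the 2x cutoff discussed above.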
> My thinking is that it's fine to officially adopt the hybrid isorecursive system if the performance of a single recursion group is about the same as of nominal typing. That completely supports all existing real-world use cases that we have.
If you're thinking about the runtime performance of the optimized code, I agree. But if you're thinking about the performance of type canonicalization, I disagree. If we set the bar that low for the MVP, then we might as well just ship nominal types. The only reason to complicate the spec with recursion groups and isorecursive canonicalization is if it will actually be useful for something in the future. By checking that the performance of isorecursive canonicalization is acceptable, we can rule out performance as a reason isorecursive types might not be useful.
By "benchmark" I mean measuring time to build and canonicalize the types after parsing the type section, with the goal of understanding how the type system will affect module validation time in engines.
Ah, thanks, but I am confused, then. I asked this in the CG meeting and Rossberg was pretty clear that this should be trivially fast during VM startup given everything is simple and linear, and no one else had any thoughts otherwise. Have I misunderstood?
Where could significant overhead come from, if this is a simple linear pass over the types?
Oh, another point: I think it's fine even if this makes startup 2x slower. The problem before was that the overhead was unavoidable. But now it is easily avoided by using a single group - and that will be the common case, most likely. People using dynamic linking will be opting in to something with additional overhead for a bunch of other reasons, like larger code size. That is, paying some VM cost is not unreasonable in return for the extra functionality of dynamic linking, is it?
Although it is linear, isorecursive canonicalization still does significantly more work than nominal "canonicalization," even when using a single group, so it is worth measuring. I agree that I would expect it to be very fast and unproblematic, though.
I don't agree that significantly slowing down baseline compilations should be an acceptable cost of using minimal recursion groups (because that's using the type system "as intended"), but hopefully it's a moot point.
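The extra work mentioned above comes from the fact that isorecursive canonicalization must compare recursion groups structurally and deduplicate them, whereas nominal "canonicalization" simply gives every declared type a fresh identity. A hedged sketch of the structural side (the tuple-based "shape" encoding is an assumption for illustration, not the real representation):

```python
def canonicalize_groups(groups):
    """Deduplicate recursion groups by structural equality.

    Each group is a sequence of type "shapes" in which references to
    types inside the same group are expressed as relative indices, so
    two groups with identical shapes define the same isorecursive
    types. Nominal canonicalization would skip this entirely: every
    declared type would just be its own identity.
    Returns, for each input group, the id of its canonical group.
    """
    canonical = {}  # shape tuple -> canonical group id
    group_ids = []
    for group in groups:
        key = tuple(group)
        if key not in canonical:
            canonical[key] = len(canonical)
        group_ids.append(canonical[key])
    return group_ids

# Two structurally identical groups map to the same canonical id:
gids = canonicalize_groups([
    [("struct", ("field", "i32"))],
    [("struct", ("field", "i32"))],
    [("func", ("param", "i32"), ("result", "rec.0"))],
])
print(gids)  # [0, 0, 1]
```

Even in the single-group case this hashing and comparison of the whole group's shape is work that nominal typing never does, which is why it is worth measuring despite being linear.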
Fair enough, always best to measure just to be safe I guess.
> I don't agree that significantly slowing down baseline compilations should be an acceptable cost of using minimal recursion groups (because that's using the type system "as intended")
Ah, maybe this is where we see things differently.
My perspective is that "as intended" means something different for a single wasm file versus one meant to be linked at runtime:
- For a single wasm file we want a single group, because that is the smallest and most efficient binary: no extra bytes to describe multiple rec groups, and also toolchain optimizations may work better.
- For a linkable wasm file we are opting in to additional functionality, so extra overhead is expected and acceptable.
Initial measurements with a single large recursion group look good: https://github.com/WebAssembly/gc/pull/243#issuecomment-1027321008
WasmGC has adopted the isorecursive type system.