
[NFC] Walk module to update names in binary parser

tlively opened this issue 1 year ago • 9 comments

The binary parser generally does not know the final names of module elements when it parses them, or even when it parses instructions that refer to them, since the name section comes at the end of a binary. The parser previously kept a list of pointers to locations where each module element's name would be used, then patched those locations after parsing the name section revealed the final names.

When the binary parser starts using IRBuilder, the parsed expressions will be constructed and managed by IRBuilder rather than by the parser itself. This means that the parser will no longer be able to collect pointers to places where module element names are used; it won't have access to the instructions at all.

Since the strategy of collecting locations to patch will no longer work, switch to a strategy of traversing the module to find and update names instead. This is generally less efficient because the locations have to be found before they can be updated, but on the other hand it only happens when preserving debug info and it is parallelizable anyway.

tlively avatar Sep 19 '24 04:09 tlively

This extra traversal of all the code worries me. Can IRBuilder not collect those pointers as it creates things?

kripken avatar Sep 19 '24 14:09 kripken

It could in principle, but that would be an expansion of its responsibilities and would make the abstraction fuzzier. I think this separation of concerns is nicer, but I can measure the performance impact.

tlively avatar Sep 19 '24 15:09 tlively

Ok, I looked into the performance overhead of this change and I think it is acceptable. In the very worst case where --debuginfo is used, but there are not actually any new names to apply, the slowdown to binary parsing is 6%. When there are names to apply, the slowdown is about 0.2%, measured with both 128 threads and 2 threads.

tlively avatar Sep 19 '24 22:09 tlively

Can you explain the difference between the two cases? I don't follow that. Isn't this extra pass over the module always done when there are names, so the overhead is fixed?

kripken avatar Sep 19 '24 23:09 kripken

Right, we currently do the walk whenever --debuginfo is used, and if there are no names to apply, it simply won't make any changes. I had been thinking that the current code path can avoid doing any work in this case, but I don't think that's right, since there are locations where we simply don't set the names before backpatching.

tlively avatar Sep 19 '24 23:09 tlively

Yeah, maybe we can avoid it somehow, but I think atm we do figure out names late for some things.

So IIUC, is 6% the relevant overhead, then?

Overall my concern is that adding an extra pass to the entire module can be pretty costly. It's not like adding another ReFinalize at the end of a function, since that function is in the cache already - this is another loading and processing of the entire module from the start, which can be slow on large modules.

kripken avatar Sep 19 '24 23:09 kripken

I guess 6% is the relevant worst-case overhead we've seen, but it's also notable that on real modules we've seen 0.2%. This shouldn't be any different in terms of caching because all the function bodies have just been parsed and should already be in caches. If we parallelize parsing at some point, the bodies will even be in the caches on the correct threads.

tlively avatar Sep 20 '24 00:09 tlively

6% seems high to me. Maybe it's because I've been working on a series of such optimizations recently (moderate speedups, some of which involved avoiding an extra pass on the IR), but I think it's worth trying to avoid this regression. Some ideas:

  1. We aren't in a streaming situation like browsers: we have the entire file up front. We could skip ahead to the name section and read it first, then go back and build the IR using those names. The skipping should be fast, since the binary format makes it simple and the number of sections is small.
  2. Or, could we subclass IRBuilder to add the logic that takes the addresses of names? We just need to be able to hook in there, and could use wasm-delegations-fields to find the fields with names, in just a few lines of code.

This shouldn't be any different in terms of caching because all the function bodies have just been parsed and should also be in caches.

Difference compared to what? Sorry, I'm not following this. My concern is that a big module simply cannot fit in cache - large wasm modules expand into hundreds of MB of IR - so doing another pass over that IR means cache misses for it all. I am guessing that is the 6%?

kripken avatar Sep 20 '24 14:09 kripken

I'll look into skipping ahead 👍

tlively avatar Sep 20 '24 17:09 tlively

Closing this since #7074 has landed.

tlively avatar Nov 23 '24 09:11 tlively