binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

[wasm-opt] Split functions that break the limit

Open surma opened this issue 6 years ago • 15 comments

All browsers (and Node) seem to implement the same size limit for functions (somewhere around the 7MB mark).

How about a pass for wasm-opt that breaks long functions up into smaller pieces? AFAICT, this should be doable by partitioning the big function and potentially passing the state of locals from one function to the next.

WDYT?

surma avatar Aug 27 '19 18:08 surma

This might be nice to add, yeah. It's not trivial, though - breaking up a function is quite hard in general (handling loops, etc., avoiding breaking on a boundary that is called across a lot, that has few locals that need copying, etc.). Older emscripten had an "outlining" pass for this, which did something similar, and it was very hard to get right.

Perhaps it's easier to convince browsers to raise the 7MB limit ;) did you have a specific size that you hit this with?

kripken avatar Aug 27 '19 20:08 kripken

Fun fact. All browsers have same limit equivalent to 7654321 bytes) v8 WebKit Gecko

MaxGraey avatar Aug 27 '19 21:08 MaxGraey

Thanks for the amazing project and sorry for bumping such an old issue.

In Chicory we do hit the JVM limits for a single method size.

Having a way to arbitrary split huge Wasm functions(e.g. the CPython interpreter loop) is extremely appealing to us. Now the questions:

  • is there any progress in this repo on the subject?
  • are there reference implementations? (e.g. the mentioned emscripten impl) and/or documentation/material around it?

andreaTP avatar Feb 19 '25 14:02 andreaTP

There isn't specific progress, though there is work on an outlining pass which might end up reducing code size. It's not intended to handle this situation, though (in particular it might not work on code with branches, locals, etc.).

Is there no JVM flag to avoid the issue for you? If not, is this due to machine-generated code perhaps, that can be adjusted (that is usually the case in emscripten bug reports)?

If not, options could be

  • If this is due to very long blocks, a simple splitting pass could be written. It would just move parts of functions out, and copy/restore locals along those calls. That's all the old emscripten code did, but it's enough.
  • If this is due to nesting, perhaps the outlining pass could be generalized for this.

kripken avatar Feb 19 '25 17:02 kripken

Thanks a lot @kripken for getting back!

Is there no JVM flag to avoid the issue for you?

Unfortunately not, this is a hard limit very deeply encoded.

If not, is this due to machine-generated code perhaps, that can be adjusted

Do you mean that it's the compiler itself that should be able to split the function in first place? Considering the example of CPython are you suggesting opening an issue in the wasi-sdk repo?

  • If this is due to very long blocks ...
  • If this is due to nesting ...

Given the data points that I have, both applies, the issue is often triggered by Go code compiled to WASM while, usually, with C or Rust code wasm-opt is already reducing the size of the methods enough to avoid the limit.

andreaTP avatar Feb 19 '25 18:02 andreaTP

Do you mean that it's the compiler itself that should be able to split the function in first place?

Often that is the case, yes, if it is autogenerated source code. Adjusting the autogenerator to emit smaller functions is often possible, at least in the bug reports emscripten gets about this.

But it sounds like you see this on CPython compiled by clang? That would mean an issue in LLVM itself then, likely with no such simple fix.

Does wasm-opt -O3 or -Oz not help here? If not, please attach the wasm file, as the shape will determine what I can recommend.

kripken avatar Feb 19 '25 19:02 kripken

Thanks again for the feedback!

Does wasm-opt -O3 not help here?

It does in most cases(e.g. with CPython), I'll update this issue as soon as I have a new example (probably ruby/prism is a good starting point).

at least in the bug reports emscripten gets about this

Do you have a link to spare so that I can see how the process goes in emscripten? 🙏

andreaTP avatar Feb 19 '25 19:02 andreaTP

Searching by the error is probably best if you want to see old issues. The error on such huge functions is typically due to the size in bytes, e.g.

https://github.com/emscripten-core/emscripten/issues/16690

or the number of locals,

https://github.com/emscripten-core/emscripten/issues/18159 https://github.com/emscripten-core/emscripten/issues/19346 https://github.com/emscripten-core/emscripten/issues/21847

kripken avatar Feb 19 '25 20:02 kripken

Ok, found 10 minutes to provide one example reproducer with Prism:

  • given this build artifact: https://github.com/ruby/prism/actions/runs/13090799608/artifacts/2521567203
  • the function pm_dump_json is producing a huge single wasm function
  • even after wasm-opt -O3 the function is too big

I'll try to provide something based on Go soon.

andreaTP avatar Feb 20 '25 16:02 andreaTP

Found an easily inspectable Go package!

  • from this archive: https://registry.npmjs.org/esbuild-wasm/-/esbuild-wasm-0.25.0.tgz
  • extract package/esbuild.wasm
  • function 3093 is the first hitting the limit

In this case wasm-opt takes a super long time to execute and doesn't reduce much the size:

11341875 Feb 20 18:10 esbuild-opt.wasm
12118710 Oct 26  1985 esbuild.wasm

andreaTP avatar Feb 20 '25 18:02 andreaTP

The prism function pm_dump_json is very large indeed, here is --func-metrics:

func: pm_dump_json
 [binary-bytes] : 55984   
 [total]        : 24795   
 [vars]         : 5       
 Binary         : 3314    
 Block          : 804     
 Break          : 929     
 Call           : 2532    
 Const          : 4473    
 GlobalGet      : 696     
 GlobalSet      : 2       
 Load           : 1770    
 LocalGet       : 7575    
 LocalSet       : 1581    
 Loop           : 36      
 Store          : 765     
 Switch         : 1       
 Unary          : 317     

Looking at it, those 804 blocks are deeply nested, with lots of code in the tails, a typical switch pattern,

(block
  (block
    (block
      ..
      code1
      br
    )
    code2
    br
  )
  code3
  br
)

Splitting code1,2,3 out requires pulling code out of nested blocks in the middle, and handling locals and branches in those places.

@tlively @ashleynh can the current Outlining pass do that? Note that the goal here is just to split up, not to find common code patterns for code size reasons.

kripken avatar Feb 20 '25 19:02 kripken

We can't outline anything that uses locals or has control flow out of the outlined region at the moment. There also isn't a great interface for telling the outliner to outline a particular sequence of expressions yet.

I think the simplest and most general thing to do here would be to turn each basic block in the huge function into a separate function and turn branches into tail calls. It would be possible to get fancier and try to optimize the partitioning of the CFG, but that's what I would start with.

tlively avatar Feb 20 '25 19:02 tlively

Ping, is there any update on the subject? 🙂 🙏

I'm especially interested if you have specific suggestions(configuration options?) to split the big functions of SQLite. Apparently wasi-sdk is already doing a decent job, as wasm-opt is struggling to identify optimization patterns, and results are silimar.

I have been thinking about the subject recently, but need to confirm ideas with you:

  • is the usage of locals the only(or the main) blocker to have generic method splitting?
  • in case the prev is true, would it be possible to convert locals into globals? Any downside I haven't considered?

andreaTP avatar Jul 09 '25 13:07 andreaTP

I'm not aware of active work on this, no. PRs would be welcome if you or anyone else wants to take a look at this, though.

For the approach I suggested where each basic block is turned into a separate function, the first thing I would try would be to make the locals read in each block into function parameters and make the locals written in each block into function results. Alternatively, if you can use WasmGC, you could put the locals into a struct and pass it around. Globals would also work, but then you're stuck with a permanent memory overhead.

tlively avatar Jul 09 '25 18:07 tlively

Thanks for getting back @tlively and thanks for the guidance. I'll keep this on my watch list, if no one comes first to it I might be able to dedicate some time in ~Q4

andreaTP avatar Jul 10 '25 09:07 andreaTP