wasmtime icon indicating copy to clipboard operation
wasmtime copied to clipboard

Cranelift: use egraphs by default

Open fitzgen opened this issue 1 year ago • 4 comments

It should be enabled by default. We should also define what the requirements we need to meet to enable it by default are.

cc @cfallin @jameysharp

fitzgen avatar Nov 02 '22 16:11 fitzgen

As far as I know there are two specific reasons we haven't done this yet:

  1. Compile speed regresses a little bit with egraphs turned on.
  2. We were encountering some fuzzing failures.

We have some ideas that might get the egraphs optimizer to parity with the current implementation and we want to play with those.

jameysharp avatar Nov 02 '22 17:11 jameysharp

Indeed; to add a little more, the overall approach I want to take is to integrate the egraph data structure more completely into the DataFlowGraph. Basically, the PrimaryMap of ValueDefs can take the place of the "eclass nodes" in cranelift-egraph currently: alongside the other kinds of defs (instruction output, blockparam, alias), we can have a "union" kind. Then the multi-definition aspect of the aegraph can be represented directly in the CLIF.

Then the only other difference is that we can take "pure" insts and not place them in blocks before elaboration. So the egraph-mode becomes:

  • Allow multiple definitions of a value in the CLIF, by introducing a "union" kind of value (write this in the CLIF text format as e.g. v3 <- v1, v2 analogously to aliases currently);
  • Apply rewrite rules as soon as values are introduced, as we do in egraph-mode today (optimizations-during-build);
  • Do not place "pure" instructions in the Layout until elaboration. Elaboration consists of (i) choosing a canonical value for each value (a value->value map), and (ii) putting Insts in the Layout, possibly duplicating where needed;
  • Modifying the hashing in GVN a bit so that we hash insts based on canonical values (via a union-find forest) rather than latest-value-in-eclass values.

There are a bunch of nice outcomes from that (and once I saw it, I realized we basically must go this way, and a separate egraph doesn't make sense anymore):

  • Any CLIF is also almost an aegraph, trivially / by construction. (The lack of placement for pure nodes is the only difference).
  • There is no copying/translation of information between egraph nodes and CLIF instructions.
  • We get all the debugging benefits of the text format and other utilities surrounding CLIF.

I want to build all of this first, and suspect that when I do, the perf regressions we're seeing should go away. The reason for holding back before then is both because of the regression but also because if the design is going to change, we should do that before we rely on it as the default path.

I'm currently 100%-coding-time on another project but will hopefully be able to come back to this sometime soon-ish; @jameysharp and I have talked about collaborating on its implementation as well.

cfallin avatar Nov 02 '22 17:11 cfallin

Would it be possible to replace Layout with a different type while the function is in aegraph form, but keep DataFlowGraph in both cases? This would extend to more alternative representations like VSDG/sea of nodes or something with structured control flow (for eg wasm or spirv generation) with minimal (or no) changes to the core ir while at the same time minimizing code duplication.

bjorn3 avatar Nov 02 '22 17:11 bjorn3

Yep, we could go further in that direction as well; that's an interesting direction to explore.

cfallin avatar Nov 02 '22 17:11 cfallin