
Using Egg to Deduce "Optimal" Programs

Open · paili0628 opened this issue 1 year ago · 1 comment

This is a very coarse-grained proposal for how to apply Egg to optimize our compiler. The main purpose is to generate discussion.

We have a critical mass of passes that (1) collect information about the control program (is it promotable to static time? does it need a new FSM? etc.) and (2) optimize it. Currently, we apply these passes in a fixed order. For example, by the time we apply collapse control, we've already done the static optimizations. However, this ordering might be suboptimal (this is the "phase ordering problem" in compilers). Instead, we can use egg (https://egraphs-good.github.io/), a library for representing program optimizations as "rewrite rules" like x * 1 => x. Egg's superpower is its ability to compactly represent all possible versions of the optimized program and to apply heuristics to extract the one you want.

Here is a summary of the pipeline for using Egg:

  1. Input an expression, which Egg parses into an AST using the Language you define
  2. Define rewrite rules
  3. Use a Runner to apply the rewrite rules and store every expression they can generate in an e-graph
  4. Use an Extractor to find the optimal expression (program); we have to define what "optimal" means

In order to get Egg working, we need to define two things (a minimal code sketch follows this list):

  1. Language: how to parse an input expression
  2. Rewrite rules: the rules to apply to rewrite an expression (in our case, a program)
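
For concreteness, here is a self-contained sketch of that pipeline using egg's Rust API, closely following egg's own tutorial example. The toy arithmetic language, rule names, and input expression are purely illustrative and have nothing to do with Calyx:

```rust
use egg::{rewrite as rw, *};

// 1. A tiny example Language; egg derives parsing and e-graph support for it.
define_language! {
    enum SimpleLang {
        Num(i32),
        "+" = Add([Id; 2]),
        "*" = Mul([Id; 2]),
        Symbol(Symbol),
    }
}

fn main() {
    // 2. Rewrite rules, written in egg's s-expression pattern syntax.
    let rules: &[Rewrite<SimpleLang, ()>] = &[
        rw!("mul-one";  "(* ?x 1)" => "?x"),
        rw!("add-zero"; "(+ ?x 0)" => "?x"),
        rw!("commute-mul"; "(* ?x ?y)" => "(* ?y ?x)"),
    ];

    // Parse an input expression into egg's AST representation.
    let expr: RecExpr<SimpleLang> = "(* (+ a 0) 1)".parse().unwrap();

    // 3. The Runner applies the rules (until saturation or a budget is hit)
    //    and stores every variant it discovers in one e-graph.
    let runner = Runner::default().with_expr(&expr).run(rules);

    // 4. The Extractor picks the "best" equivalent expression under a cost
    //    function; AstSize simply counts AST nodes.
    let extractor = Extractor::new(&runner.egraph, AstSize);
    let (best_cost, best_expr) = extractor.find_best(runner.roots[0]);
    println!("best (cost {}): {}", best_cost, best_expr); // expected: best (cost 1): a
}
```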

An intuitive but coarse-grained way of applying Egg to the Calyx compiler would be:

  1. Using the IR of a program as the expression to rewrite; the Language in this case could contain just one item: an IR
  2. Applying the compiler passes as rewrite rules

The challenges are:

  1. Currently, the biggest issue is that in order for Egg to output an optimal Calyx program, we have to tell it what "optimal" means. In other words, we have to provide Egg with a cost function capturing the aspects we want to optimize; the two biggest ones in the compiler are resource usage and static timing (a cost-function sketch follows this list).
  2. We cannot directly use all optimization passes as rewrite rules, since some passes only make sense when run in a particular order relative to others (for example, compile-ref must run before component-inline). So we have to work out the dependencies between passes and use valid sequences of passes as rewrite rules instead of individual passes.
  3. Saturation detection: in theory, Egg applies rewrite rules to expressions until the e-graph saturates, i.e., until it cannot apply any of the rewrite rules to any of the programs generated. However, the optimization passes can be run on a Calyx program indefinitely (although at some point they stop changing anything, at which point we should treat the pass, viewed as a rule, as inapplicable to that program). In order for this to work, we have to implement functions that detect when a program is still "optimizable" by a given compiler pass.
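
On the first challenge: egg expresses "optimal" through a cost function handed to the Extractor. Here is a minimal sketch reusing the toy SimpleLang from above; the weights are invented, and a Calyx version would instead have to encode resource-usage and static-timing estimates (which is the hard part):

```rust
use egg::{CostFunction, Id, Language};

// Toy cost function: pretend multiplies are expensive and adds are cheap.
// The weights are made up for illustration only.
struct ToyCost;

impl CostFunction<SimpleLang> for ToyCost {
    type Cost = usize;
    fn cost<C>(&mut self, enode: &SimpleLang, mut costs: C) -> Self::Cost
    where
        C: FnMut(Id) -> Self::Cost,
    {
        let op_cost = match enode {
            SimpleLang::Mul(..) => 10,
            SimpleLang::Add(..) => 1,
            _ => 0,
        };
        // A node's cost is its own cost plus the (best) cost of each child.
        enode.fold(op_cost, |sum, child| sum + costs(child))
    }
}

// Drop-in replacement for AstSize in the earlier sketch:
//     let extractor = Extractor::new(&runner.egraph, ToyCost);
```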

This, of course, would be a naive implementation. To better take advantage of Egg's compact representation, we might eventually want to expose the data structures inside the IR to Egg instead of passing in the IR as a whole (sketched below), and to apply each visitor function inside a compiler pass as a rewrite rule instead of whole compiler passes. Still, it might be good to get the current proposal working as a baseline version for future comparison.
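
As a purely hypothetical illustration of that finer-grained direction, a fragment of the control language could be exposed to Egg directly, with collapse-control-flavored rewrites. The operators and rules below are invented for this sketch and elide almost everything in the real IR (attributes, static timing, invoke, if/while, and so on):

```rust
use egg::{rewrite as rw, *};

define_language! {
    // Heavily simplified, hypothetical control language (illustration only).
    enum CalyxControl {
        "seq" = Seq([Id; 2]),   // seq { a; b }, nested to model longer sequences
        "par" = Par([Id; 2]),   // par { a; b }
        "empty" = Empty,        // the empty control program
        Group(Symbol),          // enable of a named group
    }
}

fn control_rules() -> Vec<Rewrite<CalyxControl, ()>> {
    vec![
        // collapse-control-style rules: sequencing with empty is a no-op.
        rw!("seq-empty-left";  "(seq empty ?x)" => "?x"),
        rw!("seq-empty-right"; "(seq ?x empty)" => "?x"),
        // Children of a par can be recorded in either order.
        rw!("par-comm"; "(par ?x ?y)" => "(par ?y ?x)"),
    ]
}
```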

paili0628 avatar Jun 13 '23 04:06 paili0628

Awesome, thanks for writing this up. A couple of thoughts and references to look at:

  1. Optimality is definitely a hard thing to define, especially for hardware designs, since different choices lead to different outcomes in a multi-objective optimization space. I think the key idea here is starting with something simple like "maximize the number of static nodes" or "minimize the amount of nesting". To encode more complex optimality constraints, we can use integer linear constraints; for example, read the IMPRESS paper from Zhiru's group.
  2. That's right: some passes perform "lowering", which removes language constructs, while others optimize the program. We can probably use only the latter to begin with. It is possible to do lowering and optimization at the same time, as we did in Diospyros, but that is definitely more complex.
  3. Yup, this is again something you have to think about with equality saturation. The nice thing is that you don't actually have to wait for saturation (and most of the time, you can't). You can instead define a time or resource budget and run only N rewrite-application steps before extracting the new program (see the sketch below). A key point is that as you increase the budget you may get better and better programs, but even with a small budget you can do pretty well. Again, Diospyros did this.
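
For reference, egg's Runner already exposes knobs for that kind of budget. A sketch reusing SimpleLang from earlier in the thread; the specific limits are arbitrary examples, not recommendations:

```rust
use std::time::Duration;
use egg::*;

// Run equality saturation under an explicit budget instead of waiting
// for full saturation.
fn run_with_budget(
    expr: &RecExpr<SimpleLang>,
    rules: &[Rewrite<SimpleLang, ()>],
) -> Runner<SimpleLang, ()> {
    Runner::default()
        .with_iter_limit(10)                      // at most 10 rewrite iterations
        .with_node_limit(100_000)                 // cap on total e-graph nodes
        .with_time_limit(Duration::from_secs(5))  // wall-clock budget
        .with_expr(expr)
        .run(rules)
}
```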

Another factor to consider is that the egg authors have implemented a new version of egg called egglog (and are presenting a paper on it at PLDI). egglog is much less documented, but it is faster and, in my experience, easier to use. We can spend some time thinking about using egglog instead (and chat with the egglog authors at PLDI).

Egglog uses a LISP-like language to define the rewrite program and allows much finer-grained control over performing the equality saturation. Check out their demo.

rachitnigam avatar Jun 13 '23 18:06 rachitnigam