calyx icon indicating copy to clipboard operation
calyx copied to clipboard

A Lower Lowered Form

Open sgpthomas opened this issue 3 years ago • 10 comments

This is from a discussion with @vegaluisjose At the moment, our lowering pass completely removes control, but leaves a lot of guards in the wires section. We can go further and remove the guards by instantiating components for the guards and using a mux cell. This has several benefits,

  1. The approximate resource usage of a component becomes more evident from just looking at a Calyx program because the resource usage is basically just the cells in the cells section.
  2. This makes the lowest form of Calyx basically just a net list and makes it easier to support more backends (in particular reticle).

I think this should be implemented in another lowering pass that runs after the control lowering. This pass would just walk over the AST guard tree for each assignment and replace the binary operators with a corresponding cell. For generating muxes, you can do the naive thing and have a "tree" of muxes (I think there's some name for this). Consider this program:

a = x ? b;
a = y ? c;

We currently generate the following verilog:

assign a = x ? b : (y ? c : 0);

If we wanted to instantiate muxing, it would look something like:

mux1.true = b;
mux1.false = mux2.out;
mux1.cond = x;
mux2.true = y;
mux2.false = 0;
mux2.cond = y;

However, if you can prove that x and y are disjoint, you can generate the more efficient code:

mux1.true = b;
mux1.false = c;
mux1.cond = x;

sgpthomas avatar May 28 '21 18:05 sgpthomas

fud exec --from futil ... --to futil-lowest

EclecticGriffin avatar May 28 '21 18:05 EclecticGriffin

fud exec --from futil ... --to futil-lowerest

sgpthomas avatar May 28 '21 19:05 sgpthomas

Hm, this kind of optimization can (and probably should) be done at a lower level "dialect" (borrowing from the MLIR terminology). For example, the hw dialect in CIRCT can probably do this kind of optimization. The meta question here is what kind of optimizations are out of scope for Calyx? From a research perspective, it makes sense to me to aggressively focus on things that other dialects cannot do. Of course, we need some set of base optimizations to make a reasonable case that we have a real compiler.

Some of these problems are potentially solved by hooking into the CIRCT infrastructure.

rachitnigam avatar May 29 '21 18:05 rachitnigam

I think the argument is that operations (guards) in the wire region are not free and they actually consume resources. It will be nice if the wire section is based only on "wiring". The std_mux PR was motivated for doing this lowering -- muxes aren't free and can consume the same resources as any other binary operation.

vegaluisjose avatar May 29 '21 20:05 vegaluisjose

Huh, this is a thought-provoking idea! It would be fun to chat sometime in a little more detail with @vegaluisjose included. I think the questions about the scope for Calyx and the cost model are worth considering.

Very broadly, I think the thing that makes this idea so interesting is that it's basically proposing to perform logic synthesis. That is, the lowering would translate from logical expressions to concrete logic circuits that implement those expressions—which is the job of a synthesizer! The translation goes from behavioral guard expressions to something that looks very much like a netlist. Looked at differently, it's "just" doing an ANF flattening of expressions into atomic operations, but I'd argue that this is just the "easiest" implementation of the synthesis contract.

I always thought it was a somewhat odd-looking but extremely practical choice for Calyx to incorporate some amount of behavioral description, even in its lowest, most structural form. We're all about lowered Calyx essentially amounting to "actual hardware," but really, the logic expressions in guards are a way to hide quite a bit of not-exactly-hardware complexity. That seems strange, but I think it's super important so that Calyx didn't have have "solve the logic synthesis problem" in scope—it could isolate the chunks of logic that still need synthesizing and essentially preserve them directly into the Verilog to rely on back-end toolchains to do a good job.

So basically, I think it's true that making Calyx do excellent circuit synthesis is probably still worth avoiding. But on the other hand, it's an intriguing idea that Calyx's abstractions, like groups and control, could coexist with pre-synthesized actually-structural chunks of hardware. And as a practical matter, an ANF flattening is probably necessary at some stage to translate to other, lower-level representations, including both Reticle and a hypothetical CIRCT dialect for RTL (which doesn't seem to be fully formed yet). So maybe it's worth thinking a little bit about how to build a simple prototype that does not try very hard at all to do good synthesis?

sampsyo avatar Jun 01 '21 12:06 sampsyo

I agree that Calyx isn't the place to do full on synthesis. I have some questions about synthesis though. How much of logic synthesis is mapping "behavioral" code to high level logic gates vs going from high level logic to netlists with LUTs and DSPs? And how much information is needed to do this effectively? Is the guard information important for this process? @vegaluisjose

sgpthomas avatar Jun 01 '21 12:06 sgpthomas

FWIW, my understanding is that using LUTs and stuff is a subsequent step called "technology mapping" (which happens before placement and routing), but Luis is the expert. (For tangential context, for example, here is a paper that discusses the interaction between technology mapping and placement.)

sampsyo avatar Jun 01 '21 13:06 sampsyo

Yeah that is correct.. synthesis can be viewed as converting program into boolean expressions, then performing partial evaluation (known as logic optimizations), and finally mapping these expressions into LUTs.

FWIW, I think synthesis might be a strong word here since there is a one-to-one mapping between Calyx guards and Calyx cells and the idea is to convert guards into cells. In other words, I am not proposing decomposing cells or mux guards into boolean expressions or gates, but rather move those guards into cells and let low-level tools do the transformations of just cells to FPGA primitives. Therefore, creating a backend for Calyx wouldn't imply lowering two different dialects -- cells and wires.

Happy to chat anytime with you guys!

vegaluisjose avatar Jun 01 '21 16:06 vegaluisjose

One other choice that this process has to make is whether or not to instantiate priority/unique logic for the muxes we do instantiate. Right now, we hope and pray to the synthesis gods that we don't generate priority logic but with this pass, we can explicitly control for that.

rachitnigam avatar Jun 01 '21 16:06 rachitnigam

Cool! Maybe sometime this or next week would work for folks? (Perhaps we should take this to Slack.)

sampsyo avatar Jun 01 '21 18:06 sampsyo

Subsumed by #1177

rachitnigam avatar Oct 01 '22 00:10 rachitnigam