Using "bit dependency graphs" to map expressions to hardware: from Lakeroad black-box to Lakeroad white-box
@bkushigian, @yzh119, and I had a discussion this past Wednesday about how we can utilize information contained in the expressions we're trying to synthesize. Currently, Lakeroad treats the target expressions we're trying to synthesize (e.g. `(bvadd a b)`) as black boxes which take bits in (`a` and `b`, in this case) and give bits out. When we're using Rosette, we give symbolic bitvectors in and get a bitvector expression out. We then use that expression in Rosette's synthesis routine to find a hardware implementation of the expression. What's happening under the hood of Rosette can probably be described as using something about the structure of the expression to find the resulting hardware expression...but from the viewpoint of Lakeroad, we just throw the expression into the synthesis machine and hit "go". The question is: can we more explicitly use knowledge about the expression to find a hardware implementation?
Riffing on that, the knowledge I specifically was thinking of is what we began calling a "bit dependency graph" or "bitwise dependency graph". For a given expression over bitvectors, you can draw a graph showing which bits of the output depend on which bits of the input. For example, for a two-bit `and` of `a` and `b` which produces `c` (whose bits are, from MSB to LSB, `a1`/`b1`/`c1` and `a0`/`b0`/`c0`), `c1` depends on `a1` and `b1`, and `c0` depends on `a0` and `b0`:
c0: a0, b0
c1: a1, b1
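To make the idea concrete, here is a minimal Python sketch (hypothetical, not Lakeroad code, and not the delta-debugging approach mentioned at the end of this post) that recovers such a graph for a black-box bitvector function by brute force: flip each input bit under every assignment and record which output bits change.

```python
from itertools import product

def bit_dependency_graph(f, in_widths, out_width):
    """Naive probe: output bit j depends on input bit <name><i> if flipping
    that bit changes output bit j for at least one input assignment."""
    deps = {j: set() for j in range(out_width)}
    names = list(in_widths)
    total_bits = sum(in_widths.values())
    for bits in product([0, 1], repeat=total_bits):
        # Reassemble each input word from the flat bit assignment (LSB first).
        vals, k = {}, 0
        for name in names:
            w = in_widths[name]
            vals[name] = sum(b << i for i, b in enumerate(bits[k:k + w]))
            k += w
        base = f(**vals)
        for name in names:
            for i in range(in_widths[name]):
                flipped = dict(vals)
                flipped[name] ^= 1 << i
                out = f(**flipped)
                for j in range(out_width):
                    if ((out >> j) & 1) != ((base >> j) & 1):
                        deps[j].add(f"{name}{i}")
    return deps

# Two-bit and: c0 depends only on a0/b0, c1 only on a1/b1.
print(bit_dependency_graph(lambda a, b: a & b, {"a": 2, "b": 2}, 2))
# -> {0: {'a0', 'b0'}, 1: {'a1', 'b1'}}  (set order may vary)
```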
A more interesting example would be `a + b`. In this case, `c0` still only depends on `a0` and `b0`. However, `c1` depends not only on `a1` and `b1` (which we need to sum), but also on `a0` and `b0` (which determine the carry-in, which we'll also need to include in the sum):
c0: a0, b0
c1: a1, b1, a0, b0
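Running the same sketch from above on a two-bit add (result truncated to two bits) reproduces this graph:

```python
# Two-bit add, result truncated to two bits: c1 now depends on all four
# input bits, since a0 and b0 determine the carry into bit 1.
print(bit_dependency_graph(lambda a, b: (a + b) & 0b11,
                           {"a": 2, "b": 2}, 2))
# -> {0: {'a0', 'b0'}, 1: {'a0', 'a1', 'b0', 'b1'}}  (set order may vary)
```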
We can continue to refine this even further, however. `c1` doesn't depend directly on `a0` and `b0`, but instead on some function `f(a0, b0)`, where we assume the output of the function is also a single bit:
c0: a0, b0
c1: a1, b1, f(a0, b0)
We could also rewrite this using an intermediate value:
c0: a0, b0
carry: a0, b0
c1: a1, b1, carry
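In code, one natural in-memory representation (the names here are illustrative, not anything Lakeroad actually defines) is a map from each output or intermediate bit to the bits or nodes it depends on:

```python
# Flat dependency graph for the two-bit add.
flat_add_graph = {
    "c0": ["a0", "b0"],
    "c1": ["a1", "b1", "a0", "b0"],
}

# Refined graph: the carry becomes an explicit intermediate node, and c1
# depends on it instead of depending on a0 and b0 directly.
refined_add_graph = {
    "c0": ["a0", "b0"],
    "carry": ["a0", "b0"],
    "c1": ["a1", "b1", "carry"],
}
```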
Any bit dependency graph can be used to build a hardware design on an FPGA. For example, in the case of `a and b`, we know that `c0` depends on `a0` and `b0`; in other words, it is some function of `a0` and `b0`. Lookup tables implement any function over their inputs, so we can implement our circuit as
c0 = LUT(lutmem0, a0, b0)
c1 = LUT(lutmem1, a1, b1)
where `lutmem0` and `lutmem1` are the lookup tables' memories. (Note: we may even be able to figure out that `lutmem0 == lutmem1`, based on the fact that they come from the same parent function, the `and` of `a` and `b`. But that's a later optimization!)
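A sketch of that lowering, again in hypothetical Python: every non-input node becomes one LUT whose inputs are exactly its dependencies, with the LUT memories left as named holes (`lutmem0`, `lutmem1`, ...) for a later synthesis step to fill in.

```python
def lower_to_luts(graph):
    """Lower a bit dependency graph to one LUT per non-input node.
    Each LUT's inputs are the node's dependencies; its memory is a hole."""
    return "\n".join(
        f"{node} = LUT(lutmem{k}, {', '.join(deps)})"
        for k, (node, deps) in enumerate(graph.items())
    )

and_graph = {"c0": ["a0", "b0"], "c1": ["a1", "b1"]}
print(lower_to_luts(and_graph))
# c0 = LUT(lutmem0, a0, b0)
# c1 = LUT(lutmem1, a1, b1)
```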
Some bit dependency graphs will result in different (and possibly less optimal) designs than others. Take the add example from above. The dependency graph
c0: a0, b0
c1: a1, b1, a0, b0
becomes
c0 = LUT(lutmem0, a0, b0)
c1 = LUT(lutmem1, a1, b1, a0, b0)
While the dependency graph
c0: a0, b0
carry: a0, b0
c1: a1, b1, carry
becomes
c0 = LUT(lutmem0, a0, b0)
carry = LUT(lutmem1, a0, b0)
c1 = LUT(lutmem2, a1, b1, carry)
The designs are clearly different, though determining which is "better" depends on your objectives (timing, area) and the FPGA platform you're targeting.
More importantly, though, we have the opportunity to detect places where certain hardware primitives can be used. For example, a generic carry chain primitive computes:
c_0 = xor(cin, s_0)
co_0 = if(s_0, cin, di_0)
c_1 = xor(co_0, s_1)
co_1 = if(s_1, co_0, di_1)
...
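As a sanity check on that description, here is a small simulation of the chain; note that the adder wiring `s_i = a_i xor b_i`, `di_i = a_i` is an assumption about how the primitive would be configured, not something stated above.

```python
def carry_chain(s, di, cin):
    """Generic carry chain from the equations above:
    c_i = xor(carry, s_i); next carry = if(s_i, carry, di_i)."""
    c, carry = [], cin
    for s_i, di_i in zip(s, di):
        c.append(carry ^ s_i)
        carry = carry if s_i else di_i
    return c, carry

# Assumed adder configuration (not stated in the post): s_i = a_i xor b_i,
# di_i = a_i. With that wiring, the chain computes a + b for all 4-bit inputs.
W = 4
for a in range(2 ** W):
    for b in range(2 ** W):
        a_bits = [(a >> i) & 1 for i in range(W)]
        b_bits = [(b >> i) & 1 for i in range(W)]
        s = [x ^ y for x, y in zip(a_bits, b_bits)]
        c, cout = carry_chain(s, a_bits, 0)
        assert sum(bit << i for i, bit in enumerate(c)) + (cout << W) == a + b
```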
We can match this pattern against dependency graphs, though this gets tricky if our dependency graphs are "general", that is, if we only specify that `c1` depends on `a1` and `b1`, rather than explicitly saying that `c1 = some_op(a1, b1)`.
TODO: what do we use it for? Egraph-based synthesis?
@bkushigian has developed a method for discovering the bit dependency graph of a black-box function via delta-debugging, and has a writeup on it.