Using "bit dependency graphs" to map expressions to hardware: from Lakeroad black-box to Lakeroad white-box
@bkushigian, @yzh119, and I had a discussion this past Wednesday about how we can utilize information contained in the expressions we're trying to synthesize. Currently, Lakeroad treats the target expressions we're trying to synthesize (e.g. `(bvadd a b)`) as black boxes which take bits in (`a` and `b`, in this case) and give bits out. When we're using Rosette, we give symbolic bitvectors in and get a bitvector expression out. We then use that expression in Rosette's synthesis routine to find a hardware implementation of the expression. What's happening under the hood of Rosette can probably be described as using something about the structure of the expression to find the resulting hardware expression...but from the viewpoint of Lakeroad, we just throw the expression into the synthesis machine and hit "go". The question is: can we more explicitly use knowledge about the expression to find a hardware implementation?
Riffing on that, the knowledge I specifically was thinking of is what we began calling a "bit dependency graph" or "bitwise dependency graph". For a given expression over bitvectors, you can draw a graph showing which bits of the output depend on which bits of the input. For example, for a two-bit `and` of `a` and `b` which produces `c` (whose bits are, from MSB to LSB, `a1`/`b1`/`c1` and `a0`/`b0`/`c0`), `c1` depends on `a1` and `b1`, and `c0` depends on `a0` and `b0`:
c0: a0, b0
c1: a1, b1
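To make the idea concrete, here is a minimal Python sketch (hypothetical, not Lakeroad code, and not the delta-debugging approach mentioned at the end of this post) that recovers such a graph for a black-box bitvector function by brute force: flip each input bit under every assignment and record which output bits change.

```python
from itertools import product

def bit_dependency_graph(f, in_widths, out_width):
    """Naive probe: output bit j depends on input bit <name><i> if flipping
    that bit changes output bit j for at least one input assignment."""
    deps = {j: set() for j in range(out_width)}
    names = list(in_widths)
    total_bits = sum(in_widths.values())
    for bits in product([0, 1], repeat=total_bits):
        # Reassemble each input word from the flat bit assignment (LSB first).
        vals, k = {}, 0
        for name in names:
            w = in_widths[name]
            vals[name] = sum(b << i for i, b in enumerate(bits[k:k + w]))
            k += w
        base = f(**vals)
        for name in names:
            for i in range(in_widths[name]):
                flipped = dict(vals)
                flipped[name] ^= 1 << i
                out = f(**flipped)
                for j in range(out_width):
                    if ((out >> j) & 1) != ((base >> j) & 1):
                        deps[j].add(f"{name}{i}")
    return deps

# Two-bit and: c0 depends only on a0/b0, c1 only on a1/b1.
print(bit_dependency_graph(lambda a, b: a & b, {"a": 2, "b": 2}, 2))
# -> {0: {'a0', 'b0'}, 1: {'a1', 'b1'}}  (set order may vary)
```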
A more interesting example would be `a + b`. In this case, `c0` still only depends on `a0` and `b0`. However, `c1` depends not only on `a1` and `b1` (which we need to sum), but also on `a0` and `b0` (which determine the carry-in, which we'll also need to include in the sum):
c0: a0, b0
c1: a1, b1, a0, b0
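Running the same sketch from above on a two-bit add (result truncated to two bits) reproduces this graph:

```python
# Two-bit add, result truncated to two bits: c1 now depends on all four
# input bits, since a0 and b0 determine the carry into bit 1.
print(bit_dependency_graph(lambda a, b: (a + b) & 0b11,
                           {"a": 2, "b": 2}, 2))
# -> {0: {'a0', 'b0'}, 1: {'a0', 'a1', 'b0', 'b1'}}  (set order may vary)
```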
We can continue to refine this even further, however. `c1` doesn't depend directly on `a0` and `b0`, but instead on some function `f(a0, b0)`, where we assume the output of the function is also a single bit:
c0: a0, b0
c1: a1, b1, f(a0, b0)
We could also rewrite this using an intermediate value:
c0: a0, b0
carry: a0, b0
c1: a1, b1, carry
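In code, one natural in-memory representation (the names here are illustrative, not anything Lakeroad actually defines) is a map from each output or intermediate bit to the bits or nodes it depends on:

```python
# Flat dependency graph for the two-bit add.
flat_add_graph = {
    "c0": ["a0", "b0"],
    "c1": ["a1", "b1", "a0", "b0"],
}

# Refined graph: the carry becomes an explicit intermediate node, and c1
# depends on it instead of depending on a0 and b0 directly.
refined_add_graph = {
    "c0": ["a0", "b0"],
    "carry": ["a0", "b0"],
    "c1": ["a1", "b1", "carry"],
}
```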
Any bit dependency graph can be used to build a hardware design on an FPGA. For example, in the case of `a and b`, we know that `c0` depends on `a0` and `b0`; in other words, it is some function of `a0` and `b0`. Lookup tables implement any function over their inputs, so we can implement our circuit as
c0 = LUT(lutmem0, a0, b0)
c1 = LUT(lutmem1, a1, b1)
where `lutmem0` and `lutmem1` are the lookup tables' memories. (Note: we may even be able to figure out that `lutmem0 == lutmem1`, based on the fact that they come from the same parent function, the `and` of `a` and `b`. But that's a later optimization!)
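A sketch of that lowering, again in hypothetical Python: every non-input node becomes one LUT whose inputs are exactly its dependencies, with the LUT memories left as named holes (`lutmem0`, `lutmem1`, ...) for a later synthesis step to fill in.

```python
def lower_to_luts(graph):
    """Lower a bit dependency graph to one LUT per non-input node.
    Each LUT's inputs are the node's dependencies; its memory is a hole."""
    return "\n".join(
        f"{node} = LUT(lutmem{k}, {', '.join(deps)})"
        for k, (node, deps) in enumerate(graph.items())
    )

and_graph = {"c0": ["a0", "b0"], "c1": ["a1", "b1"]}
print(lower_to_luts(and_graph))
# c0 = LUT(lutmem0, a0, b0)
# c1 = LUT(lutmem1, a1, b1)
```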
Some bit dependency graphs will result in different (and possibly less optimal) designs than others. Take the add example from above. The dependency graph
c0: a0, b0
c1: a1, b1, a0, b0
becomes
c0 = LUT(lutmem0, a0, b0)
c1 = LUT(lutmem1, a1, b1, a0, b0)
While the dependency graph
c0: a0, b0
carry: a0, b0
c1: a1, b1, carry
becomes
c0 = LUT(lutmem0, a0, b0)
carry = LUT(lutmem1, a0, b0)
c1 = LUT(lutmem2, a1, b1, carry)
The designs are clearly different, though determining which is "better" depends on your objectives (timing, area) and the FPGA platform you're targeting.
More importantly, though, we have the opportunity to detect places where certain hardware primitives can be used. For example, a generic carry chain primitive computes:
c_0 = xor(cin, s_0)
co_0 = if(s_0, cin, di_0)
c_1 = xor(co_0, s_1)
co_1 = if(s_1, co_0, di_1)
...
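As a sanity check on that description, here is a small simulation of the chain; note that the adder wiring `s_i = a_i xor b_i`, `di_i = a_i` is an assumption about how the primitive would be configured, not something stated above.

```python
def carry_chain(s, di, cin):
    """Generic carry chain from the equations above:
    c_i = xor(carry, s_i); next carry = if(s_i, carry, di_i)."""
    c, carry = [], cin
    for s_i, di_i in zip(s, di):
        c.append(carry ^ s_i)
        carry = carry if s_i else di_i
    return c, carry

# Assumed adder configuration (not stated in the post): s_i = a_i xor b_i,
# di_i = a_i. With that wiring, the chain computes a + b for all 4-bit inputs.
W = 4
for a in range(2 ** W):
    for b in range(2 ** W):
        a_bits = [(a >> i) & 1 for i in range(W)]
        b_bits = [(b >> i) & 1 for i in range(W)]
        s = [x ^ y for x, y in zip(a_bits, b_bits)]
        c, cout = carry_chain(s, a_bits, 0)
        assert sum(bit << i for i, bit in enumerate(c)) + (cout << W) == a + b
```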
We can match this pattern against dependency graphs, though this gets tricky if our dependency graphs are "general", that is, if we only specify that `c1` depends on `a1` and `b1`, rather than explicitly saying that `c1 = some_op(a1, b1)`.
TODO: what do we use it for? Egraph-based synthesis?
@bkushigian has developed a method for discovering the bit dependency graph of a black-box function via delta-debugging, and has a writeup on it.