synth_opt_adders
synth_opt_adders copied to clipboard
Is there a way to generate higher radix adders?
I've tried using radix=4 on forest. However I get the following error.
File ~/gits/synth_opt_adders/src/pptrees/ExpressionTree.py:108, in ExpressionTree.__init__(self, width, in_ports, out_ports, name, start_point, alias, no_shape, radix, idem, leaf_labels, node_defs)
106 cocycle_out_shape = [self.radix * x for x in cocycle_out_shape]
107 if cocycle_out_shape != self.cocycle_shape:
--> 108 raise ValueError(
109 (
110 "The main recurrence node of the tree"
111 " must have the same input and output shape"
112 )
113 )
114 del cocycle_out_shape
116 # Check that node shapes are compatible with port shapes
117 ### NOTE: INPUT SHAPE IS CURRENTLY ASSUMED TO BE [1,1,1,..]
118 ### TO-DO: Allow for arbitrary input shape
119 ### Need to check whole file for this implicit assumption
ValueError: The main recurrence node of the tree must have the same input and output shape
Excellent question!
In short, I had to stop supporting higher-radix structures because it was too much effort and it went beyond the papers I was relying on.
Here's what's needed to support higher-radix adders:
- Create new templates for higher-radix HDL building blocks (like for example, turning this binary black cell into a radix-4 black cell).
- Go into
ExpressionTree.py
and fix all the places where radix-2 is hard-coded. Example. This might be most of the file; it definitely would require the useful algorithms like rotate / rank to be adjusted.
Item 1 is what's causing the error you see: there are no HDL templates for radix-4 adders.
In the end it would be a lot of work. It's something I actively want to do but do not have the time for.
If, however, the HDL generated by this tool is plugged into a synthesis tool with a logic optimizer, radix-4 adders can be emulated. A radix-2 adder could just be flattened/partitioned in a way that makes the logic optimizer treat it as a radix-4 adder. That kind of functionality is implemented and under active use. If you're interested, we can chat about that.
My experience, however, shows that logic optimizers don't think radix-4 adders are useful and when given the chance will just optimize them down to a radix-2 adder with occasional radix-3 sections. Now, whether that's the reality or just a fault with the tools/cells/PDKs/technology, I don't know.
Hi, thank you for the detailed response and for the awesome project. Actually, I'm not targeting any hardware but trying to replicate the results of the this paper. See, in such approaches you go against the typical hardware idea of minimizing xors and using 2-input gates. There, you need maximum xors and more 4-input gates since they reduce logic depth and cost the same for evaluation.
I'd be really interested in seeing how one can achieve that with your code. Note also, they are replacing the or gate with an xor, because it seems to be equivalent.
I've tried generating a Brent-Kung network using this repo but passing that through ABC generates the following critical path which is twice as long as the one in the paper (since we only AND gates in the critical path, XORs come for free).
WireLoad = "none" Gates = 106 ( 12.3 %) Cap = 33.8 ff ( 0.0 %) Area = 3900.00 ( 99.1 %) Delay = 1926.27 ps ( 13.2 %)
Path 0 -- 19 : 0 2 pi A = 0.00 Df = 0.0 -0.0 ps S = 0.0 ps Cin = 0.0 ff Cout = 46.5 ff Cmax = 0.0 ff G = 0
Path 1 -- 70 : 2 2 XOR2X1 A = 0.00 Df = 155.4 -28.0 ps S = 143.5 ps Cin = 31.9 ff Cout = 42.2 ff Cmax = 484.4 ff G = 131
Path 2 -- 71 : 3 1 AND3X1 A = 100.00 Df = 268.2 -11.6 ps S = 85.6 ps Cin = 12.6 ff Cout = 29.7 ff Cmax = 505.5 ff G = 236
Path 3 -- 74 : 2 3 XNOR2X1 A = 0.00 Df = 438.6 -24.5 ps S = 171.4 ps Cin = 31.9 ff Cout = 55.1 ff Cmax = 484.6 ff G = 172
Path 4 -- 75 : 3 1 AND3X1 A = 100.00 Df = 552.8 -5.1 ps S = 84.1 ps Cin = 12.6 ff Cout = 29.7 ff Cmax = 505.5 ff G = 236
Path 5 -- 79 : 2 4 XNOR2X1 A = 0.00 Df = 743.1 -30.0 ps S = 200.8 ps Cin = 31.9 ff Cout = 67.6 ff Cmax = 484.6 ff G = 211
Path 6 -- 80 : 4 1 AND4X1 A = 100.00 Df = 858.4 -7.1 ps S = 84.5 ps Cin = 12.5 ff Cout = 29.7 ff Cmax = 505.5 ff G = 236
Path 7 -- 91 : 2 4 XNOR2X1 A = 0.00 Df =1048.8 -29.9 ps S = 200.8 ps Cin = 31.9 ff Cout = 67.6 ff Cmax = 484.6 ff G = 211
Path 8 -- 92 : 4 1 AND4X1 A = 100.00 Df =1164.2 -7.0 ps S = 84.5 ps Cin = 12.5 ff Cout = 29.7 ff Cmax = 505.5 ff G = 236
Path 9 -- 103 : 2 3 XNOR2X1 A = 0.00 Df =1334.5 -24.6 ps S = 171.3 ps Cin = 31.9 ff Cout = 55.1 ff Cmax = 484.6 ff G = 172
Path 10 -- 104 : 3 1 AND3X1 A = 100.00 Df =1448.6 -5.2 ps S = 84.1 ps Cin = 12.6 ff Cout = 29.7 ff Cmax = 505.5 ff G = 236
Path 11 -- 108 : 2 2 XNOR2X1 A = 0.00 Df =1598.9 -19.3 ps S = 142.4 ps Cin = 31.9 ff Cout = 42.6 ff Cmax = 484.6 ff G = 133
Path 12 -- 111 : 2 1 AND2X1 A = 100.00 Df =1711.7 -3.4 ps S = 84.5 ps Cin = 12.6 ff Cout = 29.7 ff Cmax = 505.5 ff G = 234
Path 13 -- 114 : 2 1 XNOR2X1 A = 0.00 Df =1841.4 -14.1 ps S = 112.5 ps Cin = 31.9 ff Cout = 29.7 ff Cmax = 484.6 ff G = 92
Path 14 -- 116 : 2 1 XNOR2X1 A = 0.00 Df =1926.3 -2.4 ps S = 48.0 ps Cin = 31.9 ff Cout = 0.0 ff Cmax = 484.6 ff G = 0
Using your generator the reported critical is bellow for tree_type="brent-kung"
WireLoad = "none" Gates = 131 ( 34.4 %) Cap = 25.1 ff ( 0.0 %) Area = 5500.00 (100.0 %) Delay = 2136.77 ps ( 19.1 %)
Path 0 -- 18 : 0 2 pi A = 0.00 Df = 0.0 -0.0 ps S = 0.0 ps Cin = 0.0 ff Cout = 46.5 ff Cmax = 0.0 ff G = 0
Path 1 -- 50 : 2 2 XOR2X1 A = 0.00 Df = 155.4 -28.0 ps S = 143.5 ps Cin = 31.9 ff Cout = 42.2 ff Cmax = 484.4 ff G = 131
Path 2 -- 53 : 3 1 AND3X1 A = 100.00 Df = 232.5 -10.0 ps S = 40.6 ps Cin = 12.6 ff Cout = 9.3 ff Cmax = 505.5 ff G = 74
Path 3 -- 54 : 1 1 INVX1 A = 0.00 Df = 275.6 -6.5 ps S = 41.2 ps Cin = 9.3 ff Cout = 12.9 ff Cmax = 503.8 ff G = 138
Path 4 -- 57 : 2 2 AND2X1 A = 100.00 Df = 399.7 -6.5 ps S = 106.1 ps Cin = 12.6 ff Cout = 39.0 ff Cmax = 505.5 ff G = 308
Path 5 -- 60 : 1 2 INVX1 A = 0.00 Df = 483.8 -15.7 ps S = 78.7 ps Cin = 9.3 ff Cout = 25.4 ff Cmax = 503.8 ff G = 271
Path 6 -- 72 : 3 1 AND3X1 A = 100.00 Df = 557.0 -2.1 ps S = 42.1 ps Cin = 12.6 ff Cout = 9.3 ff Cmax = 505.5 ff G = 74
Path 7 -- 73 : 1 1 INVX1 A = 0.00 Df = 601.2 -1.9 ps S = 40.7 ps Cin = 9.3 ff Cout = 12.5 ff Cmax = 503.8 ff G = 132
Path 8 -- 74 : 3 2 AND3X1 A = 100.00 Df = 725.3 -2.0 ps S = 106.1 ps Cin = 12.6 ff Cout = 39.0 ff Cmax = 505.5 ff G = 310
Path 9 -- 77 : 1 3 INVX1 A = 0.00 Df = 828.6 -9.3 ps S = 106.3 ps Cin = 9.3 ff Cout = 38.0 ff Cmax = 503.8 ff G = 404
Path 10 -- 111 : 4 1 AND4X1 A = 100.00 Df = 911.3 -7.7 ps S = 43.7 ps Cin = 12.5 ff Cout = 9.3 ff Cmax = 505.5 ff G = 74
Path 11 -- 112 : 1 1 INVX1 A = 0.00 Df = 957.8 -11.8 ps S = 40.9 ps Cin = 9.3 ff Cout = 12.5 ff Cmax = 503.8 ff G = 132
Path 12 -- 113 : 4 2 AND4X1 A = 100.00 Df =1081.8 -11.7 ps S = 106.1 ps Cin = 12.5 ff Cout = 39.0 ff Cmax = 505.5 ff G = 311
Path 13 -- 116 : 1 3 INVX1 A = 0.00 Df =1175.8 -0.4 ps S = 106.3 ps Cin = 9.3 ff Cout = 38.0 ff Cmax = 503.8 ff G = 404
Path 14 -- 150 : 4 1 AND4X1 A = 100.00 Df =1267.8 -17.5 ps S = 43.7 ps Cin = 12.5 ff Cout = 9.3 ff Cmax = 505.5 ff G = 74
Path 15 -- 151 : 1 1 INVX1 A = 0.00 Df =1314.3 -21.5 ps S = 40.9 ps Cin = 9.3 ff Cout = 12.5 ff Cmax = 503.8 ff G = 132
Path 16 -- 152 : 4 2 AND4X1 A = 100.00 Df =1438.4 -21.4 ps S = 106.1 ps Cin = 12.5 ff Cout = 39.0 ff Cmax = 505.5 ff G = 311
Path 17 -- 155 : 1 2 INVX1 A = 0.00 Df =1513.2 -12.3 ps S = 78.7 ps Cin = 9.3 ff Cout = 25.4 ff Cmax = 503.8 ff G = 271
Path 18 -- 167 : 3 1 AND3X1 A = 100.00 Df =1600.1 -25.9 ps S = 42.1 ps Cin = 12.6 ff Cout = 9.3 ff Cmax = 505.5 ff G = 74
Path 19 -- 168 : 1 1 INVX1 A = 0.00 Df =1646.4 -29.9 ps S = 40.7 ps Cin = 9.3 ff Cout = 12.5 ff Cmax = 503.8 ff G = 132
Path 20 -- 169 : 3 2 AND3X1 A = 100.00 Df =1770.4 -29.9 ps S = 106.1 ps Cin = 12.6 ff Cout = 39.0 ff Cmax = 505.5 ff G = 310
Path 21 -- 172 : 1 1 INVX1 A = 0.00 Df =1820.8 -19.6 ps S = 55.5 ps Cin = 9.3 ff Cout = 12.9 ff Cmax = 503.8 ff G = 138
Path 22 -- 173 : 2 1 AND2X1 A = 100.00 Df =1902.5 -29.5 ps S = 40.1 ps Cin = 12.6 ff Cout = 9.3 ff Cmax = 505.5 ff G = 73
Path 23 -- 174 : 1 1 INVX1 A = 0.00 Df =1949.3 -33.2 ps S = 41.3 ps Cin = 9.3 ff Cout = 12.9 ff Cmax = 503.8 ff G = 138
Path 24 -- 177 : 2 1 AND2X1 A = 100.00 Df =2057.5 -31.1 ps S = 84.8 ps Cin = 12.6 ff Cout = 29.7 ff Cmax = 505.5 ff G = 234
Path 25 -- 179 : 2 1 XNOR2X1 A = 0.00 Df =2136.8 -0.4 ps S = 47.9 ps Cin = 31.9 ff Cout = 0.0 ff Cmax = 484.6 ff G = 0
Lmk if it's possible to improve on this.
Really quickly, what width/size are the adders you're generating?
Sorry, width=8.
Just want to update: give me a few days, this work-week is busy for me. I'll look at it when I have a break.
Perhaps this may be relevant though: someone found a bug that I wasn't running into personally. I pushed their fix upstream just now. pip install pptrees --upgrade
and see if it changes anything. I doubt it will, but it's worth a shot.
Yeah, no luck. But the critical path changes in the new version. I've written the adder by hand for width=16 and even so I can't reach the 3 levels of AND gates reported on the paper. Is there a way to get boolean equations out instead of Verilog?