
How to resolve large differences between placement parasitics and global route parasitics

Open antonblanchard opened this issue 2 years ago • 15 comments

I'm trying to improve the frequency of my gate level multiplier when taping out on ASAP7. The slowest path after global routing was not the slowest path before global routing, and this path stands out because of how much worse it got.

Looking into this path, net38 stands out since it got significantly worse (both in delay and slew):

STA before global routing (i.e. placement parasitics):

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk (rise edge)
                          0.00    0.00   clock network delay (propagated)
                        100.00  100.00 v input external delay
                  0.00    0.00  100.00 v a[43] (in)
     1    2.82                           a[43] (net)
                  0.24    0.08  100.08 v input38/A (BUFx24_ASAP7_75t_R)
                 46.15   25.90  125.97 v input38/Y (BUFx24_ASAP7_75t_R)
    72  157.91                           net38 (net)                                     <-----
                 67.34   12.93  138.90 v U$$2880/A (AND2x2_ASAP7_75t_R)
                  8.33   33.18  172.08 v U$$2880/Y (AND2x2_ASAP7_75t_R)
     1    0.69                           t$10037 (net)
                  8.33    0.02  172.11 v U$$2881/B1 (AO32x2_ASAP7_75t_R)
                 27.00   36.64  208.74 v U$$2881/Y (AO32x2_ASAP7_75t_R)
     3    7.80                           sel_0$10038 (net)
                 27.00    0.01  208.75 v rebuffer562/A (BUFx12f_ASAP7_75t_R)
                 10.56   19.23  227.98 v rebuffer562/Y (BUFx12f_ASAP7_75t_R)
...

STA after global routing (i.e. global route parasitics):

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk (rise edge)
                          0.00    0.00   clock network delay (propagated)
                        100.00  100.00 v input external delay
                  0.00    0.00  100.00 v a[43] (in)
     1    2.85                           a[43] (net)
                  0.36    0.11  100.11 v input38/A (BUFx24_ASAP7_75t_R)
                 31.38   23.16  123.28 v input38/Y (BUFx24_ASAP7_75t_R)
    72  158.75                           net38 (net)                                    <---- much worse
                286.00   90.07  213.35 v U$$2880/A (AND2x2_ASAP7_75t_R) 
                 15.29   63.44  276.79 v U$$2880/Y (AND2x2_ASAP7_75t_R)
     1    1.19                           t$10037 (net)
                 15.30    0.14  276.93 v U$$2881/B1 (AO32x2_ASAP7_75t_R)
                 31.32   39.53  316.46 v U$$2881/Y (AO32x2_ASAP7_75t_R)
     3    9.04                           sel_0$10038 (net)
                 31.55    1.53  317.99 v rebuffer562/A (BUFx12f_ASAP7_75t_R)
                 11.23   19.42  337.42 v rebuffer562/Y (BUFx12f_ASAP7_75t_R)
...
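To find the stage that degraded the most between two such reports, the per-pin delay columns can be compared mechanically. The following is a minimal sketch, assuming the column layout shown above (slew, delay, time, edge, pin, cell); the helper names `parse_pins` and `worst_delay_increase` are made up for illustration and are not part of OpenROAD or OpenSTA.

```python
import re

# Pin rows in the reports above look like:
#   "  67.34   12.93  138.90 v U$$2880/A (AND2x2_ASAP7_75t_R)"
# i.e. slew, delay, arrival time, edge, pin, (cell). Net rows carry only
# fanout and capacitance, so they do not match this pattern.
PIN_ROW = re.compile(
    r"^\s*(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+[v^]\s+(\S+)\s+\("
)

def parse_pins(report):
    """Map pin name -> (slew, delay) for every pin row in a report excerpt."""
    pins = {}
    for line in report.splitlines():
        m = PIN_ROW.match(line)
        if m:
            slew, delay, _time, pin = m.groups()
            pins[pin] = (float(slew), float(delay))
    return pins

def worst_delay_increase(before, after):
    """Return (pin, delta) for the pin whose delay column grew the most."""
    b, a = parse_pins(before), parse_pins(after)
    deltas = {pin: a[pin][1] - b[pin][1] for pin in b if pin in a}
    pin = max(deltas, key=deltas.get)
    return pin, deltas[pin]

# Rows copied from the two reports above.
BEFORE = """
                 46.15   25.90  125.97 v input38/Y (BUFx24_ASAP7_75t_R)
    72  157.91                           net38 (net)
                 67.34   12.93  138.90 v U$$2880/A (AND2x2_ASAP7_75t_R)
                  8.33   33.18  172.08 v U$$2880/Y (AND2x2_ASAP7_75t_R)
"""
AFTER = """
                 31.38   23.16  123.28 v input38/Y (BUFx24_ASAP7_75t_R)
    72  158.75                           net38 (net)
                286.00   90.07  213.35 v U$$2880/A (AND2x2_ASAP7_75t_R)
                 15.29   63.44  276.79 v U$$2880/Y (AND2x2_ASAP7_75t_R)
"""

pin, delta = worst_delay_increase(BEFORE, AFTER)
print(f"{pin}: +{delta:.2f}")   # prints U$$2880/A: +77.14
```

The delay reported at a load pin is the wire delay of the net feeding it, so the largest increase landing on U$$2880/A points at net38 itself rather than at a cell.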

The Steiner tree for net38 looks like this. The driver (input38) is the yellow circle on the right and the load (U$$2880) is the yellow circle on the left.

rst-net38

And the 2D tree produced by fastroute:

tree2d-50route_reduction

The path from input38 to U$$2880 got much worse, which seems to explain much of the difference. Interestingly, if I reduce the global routing resource reduction specified, then the 2D tree remains similar to the Steiner tree:

-  set_global_routing_layer_adjustment $env(MIN_ROUTING_LAYER)-$env(MAX_ROUTING_LAYER) 0.5
+  set_global_routing_layer_adjustment $env(MIN_ROUTING_LAYER)-$env(MAX_ROUTING_LAYER) 0.4

tree2d-40route_reduction

And setup slack also improves. So we tried to maintain the Steiner tree layout, but ran out of routing resources and modified it significantly. Is this the multi pin maze routing in the global router?
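As a rough illustration of why a 0.5 versus 0.4 adjustment matters, here is a sketch of the capacity arithmetic. It assumes the adjustment simply removes that percentage of tracks on each routing-grid edge and rounds down; the router's real bookkeeping may differ, and the figure of 10 tracks per edge is invented for the example.

```python
def usable_tracks(tracks, adjustment_pct):
    """Tracks left on one routing-grid edge after reserving adjustment_pct percent.

    Integer arithmetic is used so the rounding-down step is exact.
    """
    if not 0 <= adjustment_pct <= 100:
        raise ValueError("adjustment_pct must be between 0 and 100")
    return tracks * (100 - adjustment_pct) // 100

# Hypothetical edge with 10 tracks: a 50% reduction leaves 5 tracks,
# a 40% reduction leaves 6, i.e. 20% more room before nets get rerouted.
print(usable_tracks(10, 50), usable_tracks(10, 40))   # prints 5 6
```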

I was trying to think of ways we can improve this (either through better parameters for this design, or changes to OR and ORFS). Some thoughts:

OR changes:

  • Global routing should resist modifying the Steiner tree layout for any net on the n% worst timing paths (both setup and hold, I would think). Either route them first, or tag them so maze routing doesn't alter the Steiner tree layout.
  • Add an ECO stage to fix setup (and hold) violations after detailed routing (we've discussed the value of an ECO flow for hold elsewhere)

ORFS changes:

  • Reduce set_global_routing_layer_adjustment. The default value used by ASAP7 of 50% seems high, and I confirmed that reducing it results in the tree more or less matching the original Steiner tree, and the setup slack for this path improves.
  • Specify a reasonable max fanout (set_max_fanout) for ASAP7 (and perhaps other platforms). The problem net has a very large fanout (72), and this issue would seem to affect larger fanout nets more. We can't rely on other parameters driving the resizer, because in its original layout we didn't need to do fanout management. As a test, I specified a max fanout of 20 and it improved the setup slack significantly.
  • Run another pass of resizer timing fixup after global routing. This is an option in OpenLane. Since we have to go through another pass of detailed placement and global routing, this could backfire (there's nothing to prevent a significant amount of the design getting rerouted). Would an incremental global route step in OR help here? We need that for an ECO flow anyway.
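To get a feel for what a max fanout of 20 means for the 72-load net, here is a small counting sketch. It only counts buffers per level of an idealised buffer tree; it assumes loads are split evenly and ignores placement, drive strength and everything else the resizer actually weighs. The function name is made up for illustration.

```python
import math

def buffers_per_level(loads, max_fanout):
    """Buffers needed at each tree level, from the loads back toward the driver.

    An empty list means the driver can feed all loads directly.
    """
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2 for the tree to converge")
    levels = []
    n = loads
    while n > max_fanout:
        n = math.ceil(n / max_fanout)   # buffers needed to fan out to n sinks
        levels.append(n)
    return levels

print(buffers_per_level(72, 20))   # prints [4]
```

So under these assumptions the 72-load net becomes one driver feeding four buffers, each with at most 20 loads, which shortens every individual net the router has to deal with.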

Design changes (i.e. me):

  • Increase the slack margin to overfix the design. This is a bit of a blunt hammer.
  • Reduce the density of my design so there are more routing resources available.

@maliberty, @jjcherry56, @tspyrou any thoughts?

antonblanchard avatar Jul 06 '22 06:07 antonblanchard

"taping out on ASAP7" makes no sense as it is an academic process. Do you mean testing?

maliberty avatar Jul 06 '22 06:07 maliberty

@maliberty yeah, just producing a GDS. With sky130hd being so slow, I wanted a second PDK to test my multiplier design on.

antonblanchard avatar Jul 06 '22 06:07 antonblanchard

@antonblanchard you can try tweaking the global routing script to run another repair design after global routing. We haven't turned this on by default yet but OpenLane has.

tspyrou avatar Jul 06 '22 14:07 tspyrou

BTW @antonblanchard - Have you looked at the work Teo did to explore adders? The eventual goal was to expand that to other things like multipliers too.

mithro avatar Jul 06 '22 14:07 mithro

The key idea is to make the global router more timing-aware in its decision about which nets to reroute to fix congestion. I think that is a reasonable goal, as the current method (FastRouteCore::StNetOrder) is based only on congestion.
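One way to picture a timing-aware ordering is sketched below. This is not how FastRoute orders nets; it is a toy with hypothetical fields (`overflow`, `slack_ps`) and illustrative numbers, showing only the idea of pushing timing-critical nets to the back of the rip-up queue.

```python
from collections import namedtuple

# overflow: congested routing edges the net crosses (hypothetical metric)
# slack_ps: worst timing slack of any path through the net
Net = namedtuple("Net", "name overflow slack_ps")

def ripup_order(nets, critical_slack_ps):
    """Order nets for rip-up: non-critical nets first, most congested first.

    Nets at or below critical_slack_ps sort last, so they are rerouted only
    if moving the other nets did not clear the congestion.
    """
    return sorted(nets, key=lambda n: (n.slack_ps <= critical_slack_ps, -n.overflow))

nets = [
    Net("net62", overflow=5, slack_ps=10.0),    # illustrative values only
    Net("net_a", overflow=3, slack_ps=120.0),
    Net("net_b", overflow=7, slack_ps=40.0),
]
print([n.name for n in ripup_order(nets, critical_slack_ps=20.0)])
# prints ['net_b', 'net_a', 'net62']
```

A congestion-only ordering would pick net62 second; with the slack term it is touched last.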

Repair after detailed routing is a good goal but to make it really work we need an incremental detailed router which we don't have currently. It is a non-trivial project that we will do eventually.

Library max fanout is often larger than the optimal fanout. We have some manual controls but could better automate selection.

maliberty avatar Jul 06 '22 15:07 maliberty

@luis201420 please look at making rip-up-reroute timing aware (@eder-matheus can help).

maliberty avatar Jul 06 '22 16:07 maliberty

repair_design after global routing is pretty hopeless at this point because the global router just pretends to be incremental. It is just too slow.

The first place I would look is the RC values. I do not know if they have been correlated for this technology and that is one way to get the sort of discrepancy you are seeing. It looks like the resistance used in the placement based parasitics is too low. Notice that the capacitances are very close. But the slews are very different. That is a resistance issue.
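The reasoning that "same capacitance, very different slew" points at resistance can be illustrated with a first-order Elmore delay estimate. This is a sketch only: the R and C values are invented rather than taken from ASAP7, and a real delay calculator uses more elaborate models than this.

```python
def elmore_delay(r_wire, c_wire, c_load, segments=100):
    """Elmore delay (seconds) at the far end of a uniform RC wire driving c_load.

    The wire is modelled as `segments` identical sections, each a series
    resistor followed by a capacitor to ground.
    """
    r_seg = r_wire / segments
    c_seg = c_wire / segments
    delay = 0.0
    for i in range(segments):
        # Capacitance downstream of resistor i: this section's capacitor,
        # the remaining sections, and the load at the end.
        c_downstream = (segments - i) * c_seg + c_load
        delay += r_seg * c_downstream
    return delay

# Hypothetical wire: 50 fF of wire capacitance and a 1 fF load in both cases,
# with only the assumed wire resistance changing.
low_r = elmore_delay(500.0, 50e-15, 1e-15)
high_r = elmore_delay(1500.0, 50e-15, 1e-15)
print(f"{low_r * 1e12:.2f} ps vs {high_r * 1e12:.2f} ps")
```

The estimate is linear in resistance, so tripling R triples the wire delay while the capacitance column in the report would not move at all. In this first-order picture the transition time at the load scales with the same RC product, which matches the pattern in the two reports.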

repair_design does not make any attempt to address timing paths. It only addresses electrical rules (max slew, max cap and fanout) and brings the design into reasonable slew values across the board. So looking at timing paths is sort of pointless. Of course they are going to differ. Timing paths are addressed with repair_timing, and it should break up the net so the path is not so slow (assuming it is limiting timing). If that is not happening it is an issue, but it has nothing to do with placement based timing. So the real issue is whether there are max slew violations that still exist after global and/or detailed routing. Over correcting when repairing slews is not such a horrible thing because you really don't want to be anywhere near max slew because of how much power those nets will burn up.

jjcherry56 avatar Jul 06 '22 16:07 jjcherry56

> The first place I would look is the RC values. I do not know if they have been correlated for this technology and that is one way to get the sort of discrepancy you are seeing. It looks like the resistance used in the placement based parasitics is too low. Notice that the capacitances are very close. But the slews are very different. That is a resistance issue.

I updated the RC values at https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/524

antonblanchard avatar Jul 19 '22 22:07 antonblanchard

I've revisited this issue after fixing the RC layer estimates. Working with OpenROAD git SHA1 d4c41f0b5d1627e0c6bad8362006a51134011fc9.

This path just makes timing (with ~10ps slack) when looking at estimate_parasitics -placement:

Startpoint: a[7] (input port clocked by clk)
Endpoint: _574_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk (rise edge)
                          0.00    0.00   clock network delay (propagated)
                         50.00   50.00 v input external delay
                  0.00    0.00   50.00 v a[7] (in)
     1    2.90                           a[7] (net)
                  0.29    0.09   50.09 v input62/A (BUFx24_ASAP7_75t_R)
                  9.95   16.45   66.54 v input62/Y (BUFx24_ASAP7_75t_R)
    20   47.62                           net62 (net)                        <--- look here
                100.03   30.50   97.04 v U$$414/A (AND2x4_ASAP7_75t_R)
                  9.17   38.02  135.06 v U$$414/Y (AND2x4_ASAP7_75t_R)
     1    1.27                           t$8795 (net)
                  9.17    0.11  135.18 v U$$415/B1 (AO32x2_ASAP7_75t_R)
                 15.55   29.42  164.60 v U$$415/Y (AO32x2_ASAP7_75t_R)
     1    2.87                           sel_0$8796 (net)
                 15.55    0.12  164.72 v rebuffer415/A (BUFx12f_ASAP7_75t_R)
                 13.28   17.25  181.97 v rebuffer415/Y (BUFx12f_ASAP7_75t_R)
     8   17.95                           net967 (net)
                 24.82    6.79  188.77 v U$$517/A2 (AO22x2_ASAP7_75t_R)
                 10.98   23.35  212.12 v U$$517/Y (AO22x2_ASAP7_75t_R)
     1    1.68                           t$8848 (net)
                 10.98    0.01  212.13 v U$$518/A (XOR2x1_ASAP7_75t_R)
                 38.37   23.31  235.45 v U$$518/Y (XOR2x1_ASAP7_75t_R)
     1    2.07                           booth_b6_m50 (net)
                 38.37    0.02  235.47 v dadda_fa_0_56_1/A (FAx1_ASAP7_75t_R)
                 28.62   43.28  278.75 v dadda_fa_0_56_1/SN (FAx1_ASAP7_75t_R)
     1    0.95                           sn$14 (net)
                 28.62    0.04  278.79 v U$$4536/A (INVx1_ASAP7_75t_R)
                 17.86   14.39  293.18 ^ U$$4536/Y (INVx1_ASAP7_75t_R)
     1    1.78                           s$487 (net)
                 17.86    0.05  293.23 ^ dadda_fa_1_56_8/CI (FAx1_ASAP7_75t_R)
                 22.55   17.21  310.44 v dadda_fa_1_56_8/CON (FAx1_ASAP7_75t_R)
     8    0.92                           con$483 (net)
                 22.55    0.03  310.47 v U$$4993/A (INVx1_ASAP7_75t_R)
                 18.30   14.29  324.76 ^ U$$4993/Y (INVx1_ASAP7_75t_R)
     1    2.13                           c$1859 (net)
                 18.30    0.05  324.80 ^ dadda_fa_2_57_3/A (FAx1_ASAP7_75t_R)
                 22.56   35.48  360.28 v dadda_fa_2_57_3/SN (FAx1_ASAP7_75t_R)
     1    0.74                           sn$1858 (net)
                 22.56    0.01  360.29 v U$$5938/A (INVx1_ASAP7_75t_R)
                 18.76   14.61  374.90 ^ U$$5938/Y (INVx1_ASAP7_75t_R)
     1    2.23                           s$3659 (net)
                 18.76    0.02  374.92 ^ dadda_fa_3_57_3/B (FAx1_ASAP7_75t_R)
                 30.56   40.32  415.23 v dadda_fa_3_57_3/SN (FAx1_ASAP7_75t_R)
     1    1.11                           sn$3657 (net)
                 30.56    0.06  415.30 v U$$6822/A (INVx1_ASAP7_75t_R)
                 18.71   15.00  430.30 ^ U$$6822/Y (INVx1_ASAP7_75t_R)
     1    1.83                           s$5306 (net)
                 18.71    0.06  430.37 ^ dadda_fa_4_57_2/CI (FAx1_ASAP7_75t_R)
                 30.84   33.85  464.22 ^ dadda_fa_4_57_2/SN (FAx1_ASAP7_75t_R)
     1    0.80                           sn$5303 (net)
                 30.84    0.01  464.23 ^ U$$7558/A (INVx1_ASAP7_75t_R)
                 16.41   13.00  477.23 v U$$7558/Y (INVx1_ASAP7_75t_R)
     1    1.82                           s$6688 (net)
                 16.41    0.06  477.29 v dadda_fa_5_57_1/CI (FAx1_ASAP7_75t_R)
                 29.59   19.27  496.56 ^ dadda_fa_5_57_1/CON (FAx1_ASAP7_75t_R)
     8    0.76                           con$6684 (net)
                 29.59    0.01  496.57 ^ U$$8147/A (INVx1_ASAP7_75t_R)
                 18.00   14.19  510.75 v U$$8147/Y (INVx1_ASAP7_75t_R)
     1    2.22                           c$7615 (net)
                 18.00    0.02  510.77 v dadda_fa_6_58_0/B (FAx1_ASAP7_75t_R)
                 30.24   39.89  550.66 v dadda_fa_6_58_0/SN (FAx1_ASAP7_75t_R)
     1    1.10                           sn$7613 (net)
                 30.24    0.06  550.72 v U$$8534/A (INVx1_ASAP7_75t_R)
                 17.68   14.41  565.13 ^ U$$8534/Y (INVx1_ASAP7_75t_R)
     1    1.68                           s$8242 (net)
                 17.68    0.02  565.15 ^ dadda_fa_7_58_0/CI (FAx1_ASAP7_75t_R)
                 31.55   33.76  598.91 ^ dadda_fa_7_58_0/SN (FAx1_ASAP7_75t_R)
     1    0.81                           sn$8239 (net)
                 31.55    0.02  598.93 ^ U$$8790/A (INVx1_ASAP7_75t_R)
                 10.41    8.61  607.53 v U$$8790/Y (INVx1_ASAP7_75t_R)
     1    0.74                           s$10982 (net)
                 10.41    0.01  607.55 v _574_/D (DFFLQNx2_ASAP7_75t_R)
                                607.55   data arrival time

                        500.00  500.00   clock clk' (fall edge)
                          0.00  500.00   clock source latency
                  0.00    0.00  500.00 ^ clk (in)
     1    5.73                           clk (net)
                  5.96    1.88  501.88 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
                 10.74   18.78  520.66 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
     2    3.52                           clknet_0_clk (net)
                 10.75    0.18  520.84 ^ clkbuf_1_1__f_clk/A (BUFx4_ASAP7_75t_R)
                 22.42   25.50  546.34 ^ clkbuf_1_1__f_clk/Y (BUFx4_ASAP7_75t_R)
     5   10.46                           clknet_1_1__leaf_clk (net)
                 22.52    0.89  547.23 ^ clkbuf_leaf_4_clk/A (BUFx4_ASAP7_75t_R)
                111.67   65.71  612.93 ^ clkbuf_leaf_4_clk/Y (BUFx4_ASAP7_75t_R)
    30   66.27                           clknet_leaf_4_clk (net)
                112.11    4.03  616.97 ^ net726_198/A (INVx3_ASAP7_75t_R)
                 14.83    7.58  624.54 v net726_198/Y (INVx3_ASAP7_75t_R)
     1    0.70                           net750 (net)
                 14.83    0.02  624.56 v _574_/CLK (DFFLQNx2_ASAP7_75t_R)
                          0.00  624.56   clock reconvergence pessimism
                         -6.93  617.64   library setup time
                                617.64   data required time
-----------------------------------------------------------------------------
                                617.64   data required time
                               -607.55   data arrival time
-----------------------------------------------------------------------------
                                 10.09   slack (MET)

After global routing (i.e. estimate_parasitics -global_routing), the same path now has -57 ps slack:

Startpoint: a[7] (input port clocked by clk)
Endpoint: _574_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk (rise edge)
                          0.00    0.00   clock network delay (propagated)
                         50.00   50.00 v input external delay
                  0.00    0.00   50.00 v a[7] (in)
     1    2.91                           a[7] (net)
                  0.42    0.13   50.13 v input62/A (BUFx24_ASAP7_75t_R)
                  8.62   15.99   66.13 v input62/Y (BUFx24_ASAP7_75t_R)
    20   47.24                           net62 (net)                       <---- look here
                209.69   65.71  131.84 v U$$414/A (AND2x4_ASAP7_75t_R)
                 12.35   52.64  184.48 v U$$414/Y (AND2x4_ASAP7_75t_R)
     1    1.42                           t$8795 (net)
                 12.35    0.17  184.65 v U$$415/B1 (AO32x2_ASAP7_75t_R)
                 15.26   30.34  214.99 v U$$415/Y (AO32x2_ASAP7_75t_R)
     1    2.91                           sel_0$8796 (net)
                 15.27    0.25  215.24 v rebuffer415/A (BUFx12f_ASAP7_75t_R)
                 13.79   17.07  232.31 v rebuffer415/Y (BUFx12f_ASAP7_75t_R)
     8   18.13                           net967 (net)
                 24.96    6.82  239.13 v U$$517/A2 (AO22x2_ASAP7_75t_R)
                 11.94   23.45  262.58 v U$$517/Y (AO22x2_ASAP7_75t_R)
     1    1.74                           t$8848 (net)
                 11.94    0.10  262.68 v U$$518/A (XOR2x1_ASAP7_75t_R)
                 50.29   23.66  286.34 v U$$518/Y (XOR2x1_ASAP7_75t_R)
     1    2.13                           booth_b6_m50 (net)
                 50.29    0.09  286.43 v dadda_fa_0_56_1/A (FAx1_ASAP7_75t_R)
                 30.74   46.03  332.46 v dadda_fa_0_56_1/SN (FAx1_ASAP7_75t_R)
     1    0.97                           sn$14 (net)
                 30.74    0.11  332.57 v U$$4536/A (INVx1_ASAP7_75t_R)
                 19.27   15.29  347.86 ^ U$$4536/Y (INVx1_ASAP7_75t_R)
     1    1.91                           s$487 (net)
                 19.27    0.14  348.00 ^ dadda_fa_1_56_8/CI (FAx1_ASAP7_75t_R)
                 23.52   18.07  366.07 v dadda_fa_1_56_8/CON (FAx1_ASAP7_75t_R)
     8    1.01                           con$483 (net)
                 23.52    0.07  366.14 v U$$4993/A (INVx1_ASAP7_75t_R)
                 19.12   14.76  380.89 ^ U$$4993/Y (INVx1_ASAP7_75t_R)
     1    2.22                           c$1859 (net)
                 19.13    0.16  381.05 ^ dadda_fa_2_57_3/A (FAx1_ASAP7_75t_R)
                 22.84   35.87  416.92 v dadda_fa_2_57_3/SN (FAx1_ASAP7_75t_R)
     1    0.76                           sn$1858 (net)
                 22.84    0.03  416.95 v U$$5938/A (INVx1_ASAP7_75t_R)
                 18.79   14.62  431.57 ^ U$$5938/Y (INVx1_ASAP7_75t_R)
     1    2.21                           s$3659 (net)
                 18.79    0.05  431.62 ^ dadda_fa_3_57_3/B (FAx1_ASAP7_75t_R)
                 33.25   42.00  473.62 v dadda_fa_3_57_3/SN (FAx1_ASAP7_75t_R)
     1    1.23                           sn$3657 (net)
                 33.25    0.13  473.74 v U$$6822/A (INVx1_ASAP7_75t_R)
                 19.66   15.67  489.41 ^ U$$6822/Y (INVx1_ASAP7_75t_R)
     1    1.87                           s$5306 (net)
                 19.67    0.14  489.56 ^ dadda_fa_4_57_2/CI (FAx1_ASAP7_75t_R)
                 32.01   34.96  524.52 ^ dadda_fa_4_57_2/SN (FAx1_ASAP7_75t_R)
     1    0.86                           sn$5303 (net)
                 32.01    0.05  524.57 ^ U$$7558/A (INVx1_ASAP7_75t_R)
                 17.35   13.53  538.10 v U$$7558/Y (INVx1_ASAP7_75t_R)
     1    1.93                           s$6688 (net)
                 17.35    0.15  538.25 v dadda_fa_5_57_1/CI (FAx1_ASAP7_75t_R)
                 29.74   19.54  557.80 ^ dadda_fa_5_57_1/CON (FAx1_ASAP7_75t_R)
     8    0.76                           con$6684 (net)
                 29.74    0.03  557.82 ^ U$$8147/A (INVx1_ASAP7_75t_R)
                 18.35   14.36  572.19 v U$$8147/Y (INVx1_ASAP7_75t_R)
     1    2.28                           c$7615 (net)
                 18.35    0.08  572.27 v dadda_fa_6_58_0/B (FAx1_ASAP7_75t_R)
                 32.85   41.61  613.88 v dadda_fa_6_58_0/SN (FAx1_ASAP7_75t_R)
     1    1.21                           sn$7613 (net)
                 32.86    0.13  614.01 v U$$8534/A (INVx1_ASAP7_75t_R)
                 18.60   15.08  629.09 ^ U$$8534/Y (INVx1_ASAP7_75t_R)
     1    1.72                           s$8242 (net)
                 18.60    0.06  629.15 ^ dadda_fa_7_58_0/CI (FAx1_ASAP7_75t_R)
                 32.34   34.45  663.60 ^ dadda_fa_7_58_0/SN (FAx1_ASAP7_75t_R)
     1    0.84                           sn$8239 (net)
                 32.34    0.06  663.66 ^ U$$8790/A (INVx1_ASAP7_75t_R)
                 10.68    8.76  672.42 v U$$8790/Y (INVx1_ASAP7_75t_R)
     1    0.75                           s$10982 (net)
                 10.68    0.05  672.47 v _574_/D (DFFLQNx2_ASAP7_75t_R)
                                672.47   data arrival time

                        500.00  500.00   clock clk' (fall edge)
                          0.00  500.00   clock source latency
                  0.00    0.00  500.00 ^ clk (in)
     1    5.64                           clk (net)
                  3.64    1.15  501.15 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
                 11.41   18.37  519.52 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
     2    3.91                           clknet_0_clk (net)
                 11.43    0.26  519.78 ^ clkbuf_1_1__f_clk/A (BUFx4_ASAP7_75t_R)
                 23.27   25.38  545.16 ^ clkbuf_1_1__f_clk/Y (BUFx4_ASAP7_75t_R)
     5   10.65                           clknet_1_1__leaf_clk (net)
                 23.59    1.57  546.73 ^ clkbuf_leaf_4_clk/A (BUFx4_ASAP7_75t_R)
                114.13   62.96  609.69 ^ clkbuf_leaf_4_clk/Y (BUFx4_ASAP7_75t_R)
    30   65.92                           clknet_leaf_4_clk (net)
                114.64    4.51  614.20 ^ net726_198/A (INVx3_ASAP7_75t_R)
                 15.18    7.76  621.96 v net726_198/Y (INVx3_ASAP7_75t_R)
     1    0.75                           net750 (net)
                 15.18    0.05  622.01 v _574_/CLK (DFFLQNx2_ASAP7_75t_R)
                          0.00  622.01   clock reconvergence pessimism
                         -6.88  615.13   library setup time
                                615.13   data required time
-----------------------------------------------------------------------------
                                615.13   data required time
                               -672.47   data arrival time
-----------------------------------------------------------------------------
                                -57.34   slack (VIOLATED)

A lot of the problem is net62, which is a high-fanout net (20). Looking at the Steiner graph (input62 is bottom right, U$$414 is bottom left):

net62

And the 2D global route layout:

net62-2d

This is a similar problem to the previous one: global routing has modified the Steiner graph layout in a way that is pretty detrimental to a critical net.
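How much of the slack loss sits at net62 can be checked with plain arithmetic on the two reports above. The numbers below are copied from those reports (in ps); the dictionary keys are just labels chosen for this sketch.

```python
def slack(required, arrival):
    """Setup slack as reported: data required time minus data arrival time."""
    return required - arrival

# Values from the placement-parasitics and global-route-parasitics reports.
place = {"required": 617.64, "arrival": 607.55, "u414_y": 135.06}
groute = {"required": 615.13, "arrival": 672.47, "u414_y": 184.48}

slack_place = slack(place["required"], place["arrival"])
slack_groute = slack(groute["required"], groute["arrival"])

# Growth in arrival time over the whole path, and the part of it already
# accumulated when the path leaves U$$414, the gate driven by net62.
arrival_growth = groute["arrival"] - place["arrival"]
early_growth = groute["u414_y"] - place["u414_y"]
share = early_growth / arrival_growth

print(f"slack {slack_place:.2f} -> {slack_groute:.2f}")   # prints slack 10.09 -> -57.34
print(f"arrival grew by {arrival_growth:.2f}, {share:.0%} of it by U$$414/Y")
# prints arrival grew by 64.92, 76% of it by U$$414/Y
```

About three quarters of the extra arrival time is therefore already present after the first gate, through the longer net62 wire delay and the slower U$$414 it causes via the degraded input slew.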

I hacked GRT to not rip up and reroute net62 via this patch:

diff --git a/src/grt/src/fastroute/src/RipUp.cpp b/src/grt/src/fastroute/src/RipUp.cpp
index 4159a638e..8d8903e57 100644
--- a/src/grt/src/fastroute/src/RipUp.cpp
+++ b/src/grt/src/fastroute/src/RipUp.cpp
@@ -48,6 +48,11 @@ void FastRouteCore::ripupSegL(const Segment* seg)
   const int ymin = std::min(seg->y1, seg->y2);
   const int ymax = std::max(seg->y1, seg->y2);
 
+#if 1
+  if (!strcmp("net62", netName(nets_[seg->netID])))
+    return;
+#endif
+
   // remove L routing
   if (seg->xFirst) {
     for (int i = seg->x1; i < seg->x2; i++)
@@ -451,6 +456,11 @@ void FastRouteCore::newRipupNet(const int netID)
   const TreeNode* treenodes = sttrees_[netID].nodes;
   const int deg = sttrees_[netID].deg;
 
+#if 1
+  if (!strcmp("net62", netName(nets_[netID])))
+    return;
+#endif
+
   for (int edgeID = 0; edgeID < 2 * deg - 3; edgeID++) {
     const TreeEdge* treeedge = &(treeedges[edgeID]);
     if (treeedge->len > 0) {

And it fixed the issue with net62:

Startpoint: a[7] (input port clocked by clk)
Endpoint: _574_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk (rise edge)
                          0.00    0.00   clock network delay (propagated)
                         50.00   50.00 v input external delay
                  0.00    0.00   50.00 v a[7] (in)
     1    2.91                           a[7] (net)
                  0.42    0.13   50.13 v input62/A (BUFx24_ASAP7_75t_R)
                 10.27   16.62   66.75 v input62/Y (BUFx24_ASAP7_75t_R)
    20   49.63                           net62 (net)                 <---- much better
                 90.99   27.36   94.11 v U$$414/A (AND2x4_ASAP7_75t_R)
                  9.06   36.80  130.91 v U$$414/Y (AND2x4_ASAP7_75t_R)
     1    1.42                           t$8795 (net)
                  9.06    0.17  131.08 v U$$415/B1 (AO32x2_ASAP7_75t_R)
                 15.23   29.41  160.49 v U$$415/Y (AO32x2_ASAP7_75t_R)
     1    2.91                           sel_0$8796 (net)
                 15.24    0.25  160.74 v rebuffer415/A (BUFx12f_ASAP7_75t_R)
                 13.89   17.08  177.82 v rebuffer415/Y (BUFx12f_ASAP7_75t_R)
     8   18.16                           net967 (net)
                 24.65    6.70  184.52 v U$$517/A2 (AO22x2_ASAP7_75t_R)
                 12.09   23.39  207.91 v U$$517/Y (AO22x2_ASAP7_75t_R)
     1    1.74                           t$8848 (net)
                 12.09    0.10  208.00 v U$$518/A (XOR2x1_ASAP7_75t_R)
                 39.51   23.70  231.70 v U$$518/Y (XOR2x1_ASAP7_75t_R)
     1    2.13                           booth_b6_m50 (net)
                 39.51    0.09  231.79 v dadda_fa_0_56_1/A (FAx1_ASAP7_75t_R)
                 29.16   43.86  275.66 v dadda_fa_0_56_1/SN (FAx1_ASAP7_75t_R)
     1    0.97                           sn$14 (net)
                 29.16    0.11  275.77 v U$$4536/A (INVx1_ASAP7_75t_R)
                 18.87   14.95  290.71 ^ U$$4536/Y (INVx1_ASAP7_75t_R)
     1    1.91                           s$487 (net)
                 18.87    0.14  290.85 ^ dadda_fa_1_56_8/CI (FAx1_ASAP7_75t_R)
                 23.50   17.95  308.80 v dadda_fa_1_56_8/CON (FAx1_ASAP7_75t_R)
     8    1.01                           con$483 (net)
                 23.51    0.07  308.87 v U$$4993/A (INVx1_ASAP7_75t_R)
                 19.12   14.75  323.62 ^ U$$4993/Y (INVx1_ASAP7_75t_R)
     1    2.22                           c$1859 (net)
                 19.12    0.16  323.78 ^ dadda_fa_2_57_3/A (FAx1_ASAP7_75t_R)
                 22.84   35.87  359.65 v dadda_fa_2_57_3/SN (FAx1_ASAP7_75t_R)
     1    0.76                           sn$1858 (net)
                 22.84    0.03  359.68 v U$$5938/A (INVx1_ASAP7_75t_R)
                 18.79   14.62  374.30 ^ U$$5938/Y (INVx1_ASAP7_75t_R)
     1    2.21                           s$3659 (net)
                 18.79    0.05  374.34 ^ dadda_fa_3_57_3/B (FAx1_ASAP7_75t_R)
                 33.25   42.00  416.35 v dadda_fa_3_57_3/SN (FAx1_ASAP7_75t_R)
     1    1.23                           sn$3657 (net)
                 33.25    0.13  416.47 v U$$6822/A (INVx1_ASAP7_75t_R)
                 19.66   15.67  432.14 ^ U$$6822/Y (INVx1_ASAP7_75t_R)
     1    1.87                           s$5306 (net)
                 19.66    0.14  432.28 ^ dadda_fa_4_57_2/CI (FAx1_ASAP7_75t_R)
                 32.01   34.96  467.24 ^ dadda_fa_4_57_2/SN (FAx1_ASAP7_75t_R)
     1    0.86                           sn$5303 (net)
                 32.02    0.05  467.30 ^ U$$7558/A (INVx1_ASAP7_75t_R)
                 17.35   13.53  480.83 v U$$7558/Y (INVx1_ASAP7_75t_R)
     1    1.93                           s$6688 (net)
                 17.35    0.15  480.98 v dadda_fa_5_57_1/CI (FAx1_ASAP7_75t_R)
                 29.74   19.54  500.52 ^ dadda_fa_5_57_1/CON (FAx1_ASAP7_75t_R)
     8    0.76                           con$6684 (net)
                 29.74    0.03  500.55 ^ U$$8147/A (INVx1_ASAP7_75t_R)
                 18.35   14.36  514.91 v U$$8147/Y (INVx1_ASAP7_75t_R)
     1    2.28                           c$7615 (net)
                 18.35    0.08  515.00 v dadda_fa_6_58_0/B (FAx1_ASAP7_75t_R)
                 32.16   41.12  556.11 v dadda_fa_6_58_0/SN (FAx1_ASAP7_75t_R)
     1    1.18                           sn$7613 (net)
                 32.16    0.15  556.26 v U$$8534/A (INVx1_ASAP7_75t_R)
                 18.43   14.94  571.20 ^ U$$8534/Y (INVx1_ASAP7_75t_R)
     1    1.72                           s$8242 (net)
                 18.43    0.06  571.26 ^ dadda_fa_7_58_0/CI (FAx1_ASAP7_75t_R)
                 32.34   34.40  605.66 ^ dadda_fa_7_58_0/SN (FAx1_ASAP7_75t_R)
     1    0.84                           sn$8239 (net)
                 32.34    0.06  605.72 ^ U$$8790/A (INVx1_ASAP7_75t_R)
                 10.68    8.76  614.48 v U$$8790/Y (INVx1_ASAP7_75t_R)
     1    0.75                           s$10982 (net)
                 10.68    0.05  614.53 v _574_/D (DFFLQNx2_ASAP7_75t_R)
                                614.53   data arrival time

                        500.00  500.00   clock clk' (fall edge)
                          0.00  500.00   clock source latency
                  0.00    0.00  500.00 ^ clk (in)
     1    5.67                           clk (net)
                  3.76    1.19  501.19 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
                 11.00   18.21  519.39 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
     2    3.67                           clknet_0_clk (net)
                 11.03    0.28  519.67 ^ clkbuf_1_1__f_clk/A (BUFx4_ASAP7_75t_R)
                 23.05   25.50  545.17 ^ clkbuf_1_1__f_clk/Y (BUFx4_ASAP7_75t_R)
     5   10.70                           clknet_1_1__leaf_clk (net)
                 23.35    1.47  546.64 ^ clkbuf_leaf_4_clk/A (BUFx4_ASAP7_75t_R)
                114.01   62.55  609.19 ^ clkbuf_leaf_4_clk/Y (BUFx4_ASAP7_75t_R)
    30   65.92                           clknet_leaf_4_clk (net)
                114.58    4.75  613.94 ^ net726_198/A (INVx3_ASAP7_75t_R)
                 15.18    7.76  621.70 v net726_198/Y (INVx3_ASAP7_75t_R)
     1    0.75                           net750 (net)
                 15.18    0.05  621.75 v _574_/CLK (DFFLQNx2_ASAP7_75t_R)
                          0.00  621.75   clock reconvergence pessimism
                         -6.88  614.87   library setup time
                                614.87   data required time
-----------------------------------------------------------------------------
                                614.87   data required time
                               -614.53   data arrival time
-----------------------------------------------------------------------------
                                  0.34   slack (MET)

Test case: grt-ripup-large-net.tar.gz (see doit.tcl)

antonblanchard avatar Jul 21 '22 04:07 antonblanchard

@eder-matheus @luis201420 please use this as a test case for your timing-driven rip-up in grt. Can we achieve a result similar to the net-specific hack?

maliberty avatar Jul 21 '22 04:07 maliberty

I ran the design all the way through detailed routing, before and after the net-specific hack. I noticed one of the modifications to the net was tapping off at the midpoint of a wire, and wondered if estimate_parasitics -global_routing could give pessimistic results in that case.

Anyway, the issue is still there after DRT, if a little less pronounced:

Startpoint: a[7] (input port clocked by clk)
Endpoint: _574_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk (rise edge)
                          0.00    0.00   clock network delay (propagated)
                         50.00   50.00 v input external delay
                  0.00    0.00   50.00 v a[7] (in)
     1    2.87                           a[7] (net)
                  0.44    0.14   50.14 v input62/A (BUFx24_ASAP7_75t_R)
                  8.54   15.92   66.06 v input62/Y (BUFx24_ASAP7_75t_R)
    20   50.09                           net62 (net)                          <------ here
                149.80   46.60  112.66 v U$$414/A (AND2x4_ASAP7_75t_R)
                 11.06   45.89  158.55 v U$$414/Y (AND2x4_ASAP7_75t_R)
     1    1.75                           t$8795 (net)
                 11.07    0.24  158.79 v U$$415/B1 (AO32x2_ASAP7_75t_R)
                 14.33   29.01  187.80 v U$$415/Y (AO32x2_ASAP7_75t_R)
     1    2.99                           sel_0$8796 (net)
                 14.34    0.24  188.04 v rebuffer415/A (BUFx12f_ASAP7_75t_R)
                 12.72   16.22  204.26 v rebuffer415/Y (BUFx12f_ASAP7_75t_R)
     8   20.37                           net967 (net)
                 32.24    9.55  213.80 v U$$517/A2 (AO22x2_ASAP7_75t_R)
                 12.91   24.46  238.26 v U$$517/Y (AO22x2_ASAP7_75t_R)
     1    1.69                           t$8848 (net)
                 12.91    0.07  238.34 v U$$518/A (XOR2x1_ASAP7_75t_R)
                 41.48   23.29  261.63 v U$$518/Y (XOR2x1_ASAP7_75t_R)
     1    2.04                           booth_b6_m50 (net)
                 41.48    0.07  261.69 v dadda_fa_0_56_1/A (FAx1_ASAP7_75t_R)
                 27.68   42.19  303.88 v dadda_fa_0_56_1/SN (FAx1_ASAP7_75t_R)
     1    1.00                           sn$14 (net)
                 27.68    0.07  303.96 v U$$4536/A (INVx1_ASAP7_75t_R)
                 16.50   13.46  317.42 ^ U$$4536/Y (INVx1_ASAP7_75t_R)
     1    1.90                           s$487 (net)
                 16.51    0.12  317.54 ^ dadda_fa_1_56_8/CI (FAx1_ASAP7_75t_R)
                 22.34   16.76  334.30 v dadda_fa_1_56_8/CON (FAx1_ASAP7_75t_R)
     8    1.09                           con$483 (net)
                 22.34    0.07  334.36 v U$$4993/A (INVx1_ASAP7_75t_R)
                 16.91   13.33  347.69 ^ U$$4993/Y (INVx1_ASAP7_75t_R)
     1    2.09                           c$1859 (net)
                 16.92    0.14  347.83 ^ dadda_fa_2_57_3/A (FAx1_ASAP7_75t_R)
                 20.20   32.57  380.40 v dadda_fa_2_57_3/SN (FAx1_ASAP7_75t_R)
     1    0.75                           sn$1858 (net)
                 20.20    0.02  380.42 v U$$5938/A (INVx1_ASAP7_75t_R)
                 16.30   12.89  393.31 ^ U$$5938/Y (INVx1_ASAP7_75t_R)
     1    2.23                           s$3659 (net)
                 16.30    0.07  393.38 ^ dadda_fa_3_57_3/B (FAx1_ASAP7_75t_R)
                 32.26   40.58  433.97 v dadda_fa_3_57_3/SN (FAx1_ASAP7_75t_R)
     1    1.36                           sn$3657 (net)
                 32.26    0.13  434.09 v U$$6822/A (INVx1_ASAP7_75t_R)
                 18.38   14.78  448.87 ^ U$$6822/Y (INVx1_ASAP7_75t_R)
     1    2.02                           s$5306 (net)
                 18.39    0.20  449.07 ^ dadda_fa_4_57_2/CI (FAx1_ASAP7_75t_R)
                 27.97   31.63  480.71 ^ dadda_fa_4_57_2/SN (FAx1_ASAP7_75t_R)
     1    0.82                           sn$5303 (net)
                 27.97    0.04  480.75 ^ U$$7558/A (INVx1_ASAP7_75t_R)
                 14.54   11.70  492.44 v U$$7558/Y (INVx1_ASAP7_75t_R)
     1    1.91                           s$6688 (net)
                 14.54    0.12  492.56 v dadda_fa_5_57_1/CI (FAx1_ASAP7_75t_R)
                 27.27   17.42  509.99 ^ dadda_fa_5_57_1/CON (FAx1_ASAP7_75t_R)
     8    0.73                           con$6684 (net)
                 27.27    0.01  510.00 ^ U$$8147/A (INVx1_ASAP7_75t_R)
                 15.72   12.46  522.46 v U$$8147/Y (INVx1_ASAP7_75t_R)
     1    2.21                           c$7615 (net)
                 15.72    0.07  522.53 v dadda_fa_6_58_0/B (FAx1_ASAP7_75t_R)
                 35.19   42.37  564.90 v dadda_fa_6_58_0/SN (FAx1_ASAP7_75t_R)
     1    1.49                           sn$7613 (net)
                 35.19    0.13  565.03 v U$$8534/A (INVx1_ASAP7_75t_R)
                 17.00   14.12  579.16 ^ U$$8534/Y (INVx1_ASAP7_75t_R)
     1    1.70                           s$8242 (net)
                 17.00    0.06  579.22 ^ dadda_fa_7_58_0/CI (FAx1_ASAP7_75t_R)
                 28.48   31.06  610.28 ^ dadda_fa_7_58_0/SN (FAx1_ASAP7_75t_R)
     1    0.81                           sn$8239 (net)
                 28.48    0.04  610.32 ^ U$$8790/A (INVx1_ASAP7_75t_R)
                  9.60    8.03  618.35 v U$$8790/Y (INVx1_ASAP7_75t_R)
     1    0.75                           s$10982 (net)
                  9.60    0.04  618.40 v _574_/D (DFFLQNx2_ASAP7_75t_R)
                                618.40   data arrival time

                        500.00  500.00   clock clk' (fall edge)
                          0.00  500.00   clock source latency
                  0.00    0.00  500.00 ^ clk (in)
     1    5.81                           clk (net)
                  5.06    1.60  501.60 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
                 12.30   19.14  520.73 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
     2    4.34                           clknet_0_clk (net)
                 12.34    0.41  521.14 ^ clkbuf_1_1__f_clk/A (BUFx4_ASAP7_75t_R)
                 23.90   25.71  546.85 ^ clkbuf_1_1__f_clk/Y (BUFx4_ASAP7_75t_R)
     5   11.04                           clknet_1_1__leaf_clk (net)
                 24.38    1.90  548.75 ^ clkbuf_leaf_4_clk/A (BUFx4_ASAP7_75t_R)
                 94.58   53.33  602.08 ^ clkbuf_leaf_4_clk/Y (BUFx4_ASAP7_75t_R)
    30   53.44                           clknet_leaf_4_clk (net)
                 95.52    5.56  607.64 ^ net726_198/A (INVx3_ASAP7_75t_R)
                 13.45    7.41  615.05 v net726_198/Y (INVx3_ASAP7_75t_R)
     1    0.71                           net750 (net)
                 13.45    0.06  615.11 v _574_/CLK (DFFLQNx2_ASAP7_75t_R)
                          0.00  615.11   clock reconvergence pessimism
                         -7.13  607.98   library setup time
                                607.98   data required time
-----------------------------------------------------------------------------
                                607.98   data required time
                               -618.40   data arrival time
-----------------------------------------------------------------------------
                                -10.41   slack (VIOLATED)
Startpoint: a[7] (input port clocked by clk)
Endpoint: _574_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk (rise edge)
                          0.00    0.00   clock network delay (propagated)
                         50.00   50.00 v input external delay
                  0.00    0.00   50.00 v a[7] (in)
     1    2.87                           a[7] (net)
                  0.44    0.14   50.14 v input62/A (BUFx24_ASAP7_75t_R)
                  9.29   16.23   66.37 v input62/Y (BUFx24_ASAP7_75t_R)
    20   52.26                           net62 (net)                        <---- here
                 92.25   27.67   94.04 v U$$414/A (AND2x4_ASAP7_75t_R)
                  9.25   37.12  131.16 v U$$414/Y (AND2x4_ASAP7_75t_R)
     1    1.75                           t$8795 (net)
                  9.27    0.23  131.39 v U$$415/B1 (AO32x2_ASAP7_75t_R)
                 14.14   28.43  159.81 v U$$415/Y (AO32x2_ASAP7_75t_R)
     1    2.95                           sel_0$8796 (net)
                 14.16    0.24  160.05 v rebuffer415/A (BUFx12f_ASAP7_75t_R)
                 12.77   16.28  176.33 v rebuffer415/Y (BUFx12f_ASAP7_75t_R)
     8   20.44                           net967 (net)
                 31.44    9.26  185.59 v U$$517/A2 (AO22x2_ASAP7_75t_R)
                 12.87   24.35  209.94 v U$$517/Y (AO22x2_ASAP7_75t_R)
     1    1.71                           t$8848 (net)
                 12.87    0.07  210.01 v U$$518/A (XOR2x1_ASAP7_75t_R)
                 39.09   23.25  233.26 v U$$518/Y (XOR2x1_ASAP7_75t_R)
     1    2.02                           booth_b6_m50 (net)
                 39.09    0.07  233.33 v dadda_fa_0_56_1/A (FAx1_ASAP7_75t_R)
                 28.65   43.29  276.61 v dadda_fa_0_56_1/SN (FAx1_ASAP7_75t_R)
     1    1.11                           sn$14 (net)
                 28.65    0.08  276.70 v U$$4536/A (INVx1_ASAP7_75t_R)
                 16.82   13.71  290.40 ^ U$$4536/Y (INVx1_ASAP7_75t_R)
     1    1.91                           s$487 (net)
                 16.83    0.12  290.52 ^ dadda_fa_1_56_8/CI (FAx1_ASAP7_75t_R)
                 21.76   16.62  307.14 v dadda_fa_1_56_8/CON (FAx1_ASAP7_75t_R)
     8    1.04                           con$483 (net)
                 21.76    0.06  307.20 v U$$4993/A (INVx1_ASAP7_75t_R)
                 16.97   13.33  320.53 ^ U$$4993/Y (INVx1_ASAP7_75t_R)
     1    2.13                           c$1859 (net)
                 16.98    0.13  320.66 ^ dadda_fa_2_57_3/A (FAx1_ASAP7_75t_R)
                 20.37   32.78  353.44 v dadda_fa_2_57_3/SN (FAx1_ASAP7_75t_R)
     1    0.76                           sn$1858 (net)
                 20.37    0.02  353.46 v U$$5938/A (INVx1_ASAP7_75t_R)
                 16.38   12.94  366.40 ^ U$$5938/Y (INVx1_ASAP7_75t_R)
     1    2.24                           s$3659 (net)
                 16.38    0.07  366.47 ^ dadda_fa_3_57_3/B (FAx1_ASAP7_75t_R)
                 31.06   39.82  406.29 v dadda_fa_3_57_3/SN (FAx1_ASAP7_75t_R)
     1    1.30                           sn$3657 (net)
                 31.06    0.12  406.41 v U$$6822/A (INVx1_ASAP7_75t_R)
                 17.62   14.30  420.71 ^ U$$6822/Y (INVx1_ASAP7_75t_R)
     1    1.95                           s$5306 (net)
                 17.62    0.15  420.86 ^ dadda_fa_4_57_2/CI (FAx1_ASAP7_75t_R)
                 27.32   30.90  451.76 ^ dadda_fa_4_57_2/SN (FAx1_ASAP7_75t_R)
     1    0.79                           sn$5303 (net)
                 27.32    0.03  451.79 ^ U$$7558/A (INVx1_ASAP7_75t_R)
                 14.46   11.63  463.42 v U$$7558/Y (INVx1_ASAP7_75t_R)
     1    1.92                           s$6688 (net)
                 14.46    0.13  463.54 v dadda_fa_5_57_1/CI (FAx1_ASAP7_75t_R)
                 27.70   17.68  481.22 ^ dadda_fa_5_57_1/CON (FAx1_ASAP7_75t_R)
     8    0.77                           con$6684 (net)
                 27.70    0.03  481.25 ^ U$$8147/A (INVx1_ASAP7_75t_R)
                 15.83   12.54  493.79 v U$$8147/Y (INVx1_ASAP7_75t_R)
     1    2.21                           c$7615 (net)
                 15.83    0.07  493.86 v dadda_fa_6_58_0/B (FAx1_ASAP7_75t_R)
                 30.60   39.41  533.27 v dadda_fa_6_58_0/SN (FAx1_ASAP7_75t_R)
     1    1.28                           sn$7613 (net)
                 30.60    0.11  533.38 v U$$8534/A (INVx1_ASAP7_75t_R)
                 15.83   13.17  546.55 ^ U$$8534/Y (INVx1_ASAP7_75t_R)
     1    1.68                           s$8242 (net)
                 15.83    0.06  546.61 ^ dadda_fa_7_58_0/CI (FAx1_ASAP7_75t_R)
                 29.09   31.20  577.81 ^ dadda_fa_7_58_0/SN (FAx1_ASAP7_75t_R)
     1    0.84                           sn$8239 (net)
                 29.09    0.04  577.85 ^ U$$8790/A (INVx1_ASAP7_75t_R)
                  9.73    8.12  585.97 v U$$8790/Y (INVx1_ASAP7_75t_R)
     1    0.75                           s$10982 (net)
                  9.73    0.04  586.01 v _574_/D (DFFLQNx2_ASAP7_75t_R)
                                586.01   data arrival time

                        500.00  500.00   clock clk' (fall edge)
                          0.00  500.00   clock source latency
                  0.00    0.00  500.00 ^ clk (in)
     1    5.51                           clk (net)
                  4.62    1.46  501.46 ^ clkbuf_0_clk/A (BUFx4_ASAP7_75t_R)
                 11.28   18.58  520.03 ^ clkbuf_0_clk/Y (BUFx4_ASAP7_75t_R)
     2    3.81                           clknet_0_clk (net)
                 11.32    0.35  520.38 ^ clkbuf_1_1__f_clk/A (BUFx4_ASAP7_75t_R)
                 21.77   24.67  545.05 ^ clkbuf_1_1__f_clk/Y (BUFx4_ASAP7_75t_R)
     5    9.70                           clknet_1_1__leaf_clk (net)
                 22.13    1.60  546.65 ^ clkbuf_leaf_4_clk/A (BUFx4_ASAP7_75t_R)
                 94.18   52.41  599.06 ^ clkbuf_leaf_4_clk/Y (BUFx4_ASAP7_75t_R)
    30   53.18                           clknet_leaf_4_clk (net)
                 95.23    5.84  604.90 ^ net726_198/A (INVx3_ASAP7_75t_R)
                 13.29    7.25  612.15 v net726_198/Y (INVx3_ASAP7_75t_R)
     1    0.66                           net750 (net)
                 13.29    0.05  612.20 v _574_/CLK (DFFLQNx2_ASAP7_75t_R)
                          0.00  612.20   clock reconvergence pessimism
                         -7.17  605.03   library setup time
                                605.03   data required time
-----------------------------------------------------------------------------
                                605.03   data required time
                               -586.01   data arrival time
-----------------------------------------------------------------------------
                                 19.01   slack (MET)
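
To compare two reports like the ones above without eyeballing every row, a small script can line up the per-pin delays and rank the biggest changes. This is only a sketch: the regular expression assumes the column layout shown in this thread, and `parse_pins` and `biggest_changes` are made-up helper names, not part of OpenROAD or OpenSTA. The excerpts are copied from the two detailed routing reports above (the one ending in -10.41 slack and the one ending in 19.01 slack).

```python
import re

# Matches pin rows such as:
#   149.80   46.60  112.66 v U$$414/A (AND2x4_ASAP7_75t_R)
# Net rows (fanout + cap) and clock/header rows do not match, because they
# do not carry three decimal columns followed by an edge marker.
PIN_LINE = re.compile(
    r"^\s*(?P<slew>-?\d+\.\d+)\s+(?P<delay>-?\d+\.\d+)\s+(?P<time>-?\d+\.\d+)"
    r"\s+(?P<edge>[v^])\s+(?P<pin>\S+)\s+\((?P<cell>[^)]+)\)"
)


def parse_pins(report):
    """Return {pin name: (slew, delay)} for every pin row in a report."""
    pins = {}
    for line in report.splitlines():
        m = PIN_LINE.match(line)
        if m:
            pins[m["pin"]] = (float(m["slew"]), float(m["delay"]))
    return pins


def biggest_changes(report_a, report_b, top=3):
    """Rank pins by |delay in report_b - delay in report_a|."""
    a, b = parse_pins(report_a), parse_pins(report_b)
    deltas = [(pin, round(b[pin][1] - a[pin][1], 2)) for pin in a if pin in b]
    return sorted(deltas, key=lambda d: abs(d[1]), reverse=True)[:top]


# Excerpt from the report with -10.41 slack.
REPORT_A = r"""
                  0.44    0.14   50.14 v input62/A (BUFx24_ASAP7_75t_R)
                  8.54   15.92   66.06 v input62/Y (BUFx24_ASAP7_75t_R)
    20   50.09                           net62 (net)
                149.80   46.60  112.66 v U$$414/A (AND2x4_ASAP7_75t_R)
                 11.06   45.89  158.55 v U$$414/Y (AND2x4_ASAP7_75t_R)
"""

# Excerpt from the report with 19.01 slack.
REPORT_B = r"""
                  0.44    0.14   50.14 v input62/A (BUFx24_ASAP7_75t_R)
                  9.29   16.23   66.37 v input62/Y (BUFx24_ASAP7_75t_R)
    20   52.26                           net62 (net)
                 92.25   27.67   94.04 v U$$414/A (AND2x4_ASAP7_75t_R)
                  9.25   37.12  131.16 v U$$414/Y (AND2x4_ASAP7_75t_R)
"""

changes = biggest_changes(REPORT_A, REPORT_B)
for pin, delta in changes:
    print(f"{pin:12s} {delta:+.2f}")
```

On these excerpts the wire delay into U$$414/A (the load on net62) shows the largest change, which matches the row marked "here" in both reports.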

antonblanchard avatar Jul 21 '22 05:07 antonblanchard

BTW @antonblanchard - Have you looked at the work Teo did to explore adders? The eventual goal was to expand that to other things like multipliers too.

@mithro I have checked it out, good stuff. I've been more focused on improving the frequency of various algorithms via improvements to OpenROAD, which should hopefully benefit Teo too.

antonblanchard avatar Jul 21 '22 05:07 antonblanchard

There is another knob you can use in this situation. The set_routing_alpha command controls the tradeoff between minimizing wire length and distance from the driver. See grt/README.md for the doc (it also applies to placement based parasitics and detailed routing). The default value is .3. I changed it to .6 and the placement based slack went to 25ps, global to 10ps. Optimizing the distance from the driver to the loads on critical nets in small geometry technologies where wire resistance becomes important is probably more beneficial than global routing optimizations. It would make sense to bump alpha on critical nets and save them through the flow.

It is always going to be important to have some -slack_margin to prevent paths from surfacing between timing repair and detailed routing. There are always going to be imperfections in the approximations used along the way until every part of the flow is incremental (in a galaxy far, far away). I can't tell how you got the db in this testcase, but another place this could be addressed is by repair_timing to split the net. But it can't do that unless the net is on its radar, which means within the slack margin.

Note that the database in the testcase was not readable in an older version of OR that I am working on to address your fanout issue. The db revs pretty frequently and there is no backward compatibility, so it is better to use DEF for the testcase to survive the transitions.
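
To make the wire length versus distance-from-driver tradeoff concrete, here is a toy sketch of one well-known way to express it, the Prim-Dijkstra construction, where alpha = 0 behaves like Prim (minimum wire length) and alpha = 1 like Dijkstra (shortest driver-to-load paths). Treat it as an illustration only: whether it matches what set_routing_alpha does internally is an assumption, the pin coordinates are invented, and it builds a plain spanning tree with no Steiner points.

```python
def manhattan(a, b):
    """Rectilinear distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def prim_dijkstra(points, alpha):
    """Build a spanning tree rooted at points[0] (the driver).

    Each step adds the pin v minimising
        alpha * path_len(u) + dist(u, v)
    over pins u already in the tree. Returns (parent, path_len).
    """
    n = len(points)
    in_tree = [False] * n
    key = [float("inf")] * n
    parent = [None] * n
    path_len = [0] * n
    key[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: key[i])
        in_tree[u] = True
        if parent[u] is not None:
            path_len[u] = path_len[parent[u]] + manhattan(points[parent[u]], points[u])
        for v in range(n):
            if not in_tree[v]:
                cand = alpha * path_len[u] + manhattan(points[u], points[v])
                if cand < key[v]:
                    key[v] = cand
                    parent[v] = u
    return parent, path_len


def summarize(points, alpha):
    """Return (total wire length, longest driver-to-load path) for one alpha."""
    parent, path_len = prim_dijkstra(points, alpha)
    wire = sum(manhattan(points[parent[v]], points[v]) for v in range(1, len(points)))
    return wire, max(path_len)


# Hypothetical pin locations: driver first, then three loads.
PINS = [(0, 0), (4, 0), (4, 3), (2, 3)]

for alpha in (0.3, 0.6):
    wire, worst_path = summarize(PINS, alpha)
    print(f"alpha={alpha}: wire length {wire}, longest driver-to-load path {worst_path}")
```

For these four pins, going from 0.3 to 0.6 spends a little more wire (9 to 11 units) to shorten the longest driver-to-load path (9 to 7 units), which is the kind of trade that helps when wire resistance dominates.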

jjcherry56 avatar Jul 21 '22 16:07 jjcherry56

The set_routing_alpha command controls the tradeoff between minimizing wire length and distance from the driver.

Thanks @jjcherry56 this does look to have an impact on some of these large nets. I'll experiment with it.

Note that the database in the testcase was not readable in an older version of OR that I am working on to address your fanout issue

Good to know, I'll make sure to attach LEF/DEFs.

antonblanchard avatar Jul 24 '22 23:07 antonblanchard

Here is another path where timing gets significantly worse from placement to global routing. A big issue in this case is a 2 pin net, sn$1367:

placement parasitics:

Startpoint: a[25] (input port clocked by clk)
[WARNING GUI-0066] Heat map "Routing Congestion" has not been populated with data.
Endpoint: _554_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk (rise edge)
                          0.00    0.00   clock network delay (propagated)
                         50.00   50.00 ^ input external delay
                  0.00    0.00   50.00 ^ a[25] (in)
     1    0.80                           a[25] (net)
                  0.03    0.01   50.01 ^ input18/A (BUFx3_ASAP7_75t_R)
                  9.85   12.01   62.02 ^ input18/Y (BUFx3_ASAP7_75t_R)
     1    3.23                           net18 (net)
                  9.87    0.29   62.31 ^ fanout986/A (BUFx16f_ASAP7_75t_R)
                 11.66   15.38   77.69 ^ fanout986/Y (BUFx16f_ASAP7_75t_R)
     7   15.17                           net986 (net)
                 24.66    6.89   84.58 ^ fanout981/A (BUFx16f_ASAP7_75t_R)
                  9.79   19.71  104.29 ^ fanout981/Y (BUFx16f_ASAP7_75t_R)
     6   10.38                           net981 (net)
                 10.19    1.01  105.30 ^ U$$1768/A (INVx1_ASAP7_75t_R)
                 15.81   10.72  116.02 v U$$1768/Y (INVx1_ASAP7_75t_R)
     1    2.66                           notblock$9471[0] (net)
                 15.86    0.53  116.55 v U$$1771/B3 (AO33x2_ASAP7_75t_R)
                 14.39   35.51  152.06 v U$$1771/Y (AO33x2_ASAP7_75t_R)
     1    2.71                           sel_0$9472 (net)
                 14.39    0.05  152.11 v fanout501/A (BUFx16f_ASAP7_75t_R)
                 10.68   17.83  169.94 v fanout501/Y (BUFx16f_ASAP7_75t_R)
     6   20.75                           net501 (net)
                 11.46    1.11  171.06 v fanout498/A (BUFx12f_ASAP7_75t_R)
                  8.99   14.96  186.02 v fanout498/Y (BUFx12f_ASAP7_75t_R)
     7    9.50                           net498 (net)
                  8.99    0.08  186.10 v split524/A (BUFx2_ASAP7_75t_R)
                 15.97   19.43  205.53 v split524/Y (BUFx2_ASAP7_75t_R)
     5    4.89                           net1843 (net)
                 15.97    0.10  205.63 v U$$1795/A2 (AO22x1_ASAP7_75t_R)
                 13.22   23.25  228.88 v U$$1795/Y (AO22x1_ASAP7_75t_R)
     1    1.74                           t$9485 (net)
                 13.22    0.03  228.91 v U$$1796/A (XOR2x1_ASAP7_75t_R)
                 16.28   23.42  252.33 v U$$1796/Y (XOR2x1_ASAP7_75t_R)
     1    1.82                           booth_b26_m11 (net)
                 16.29    0.06  252.39 v dadda_fa_2_37_2/CI (FAx1_ASAP7_75t_R)
                 60.32   58.78  311.16 v dadda_fa_2_37_2/SN (FAx1_ASAP7_75t_R)
     1    2.42                           sn$1367 (net)                          <---- here
                 60.33    0.43  311.59 v U$$5663/A (INVx1_ASAP7_75t_R)
                 36.21   27.66  339.26 ^ U$$5663/Y (INVx1_ASAP7_75t_R)
     1    3.61                           s$3258 (net)
                 36.25    0.71  339.96 ^ dadda_fa_3_37_3/A (FAx1_ASAP7_75t_R)
                 36.49   26.36  366.32 v dadda_fa_3_37_3/CON (FAx1_ASAP7_75t_R)
     8    2.12                           con$3256 (net)
                 60.25   26.95  393.27 ^ dadda_fa_3_37_3/SN (FAx1_ASAP7_75t_R)
     1    2.21                           sn$3257 (net)
                 60.26    0.35  393.61 ^ U$$6629/A (INVx1_ASAP7_75t_R)
                 27.27   20.88  414.49 v U$$6629/Y (INVx1_ASAP7_75t_R)
     1    2.81                           s$5006 (net)
                 27.29    0.42  414.91 v dadda_fa_4_37_2/CI (FAx1_ASAP7_75t_R)
                 41.78   28.74  443.64 ^ dadda_fa_4_37_2/CON (FAx1_ASAP7_75t_R)
     8    1.71                           con$5002 (net)
                 41.78    0.19  443.84 ^ U$$7404/A (INVx1_ASAP7_75t_R)
                 22.72   17.65  461.49 v U$$7404/Y (INVx1_ASAP7_75t_R)
     1    2.60                           c$6493 (net)
                 22.74    0.33  461.82 v dadda_fa_5_38_0/CI (FAx1_ASAP7_75t_R)
                 39.82   47.44  509.26 v dadda_fa_5_38_0/SN (FAx1_ASAP7_75t_R)
     1    1.51                           sn$6490 (net)
                 39.82    0.14  509.40 v U$$8037/A (INVx1_ASAP7_75t_R)
                 22.24   17.79  527.20 ^ U$$8037/Y (INVx1_ASAP7_75t_R)
     1    2.03                           s$7516 (net)
                 22.25    0.12  527.32 ^ dadda_fa_6_38_0/CI (FAx1_ASAP7_75t_R)
                 24.45   36.82  564.14 v dadda_fa_6_38_0/SN (FAx1_ASAP7_75t_R)
     1    0.81                           sn$7513 (net)
                 24.45    0.01  564.16 v U$$8461/A (INVx1_ASAP7_75t_R)
                 17.58   13.92  578.08 ^ U$$8461/Y (INVx1_ASAP7_75t_R)
     1    1.90                           s$8142 (net)
                 17.58    0.08  578.16 ^ dadda_fa_7_38_0/CI (FAx1_ASAP7_75t_R)
                 92.31   73.98  652.14 ^ dadda_fa_7_38_0/SN (FAx1_ASAP7_75t_R)
     1    3.69                           sn$8139 (net)
                 92.34    1.05  653.19 ^ U$$8717/A (INVx1_ASAP7_75t_R)
                 19.06   13.14  666.33 v U$$8717/Y (INVx1_ASAP7_75t_R)
     1    0.88                           s$10909 (net)
                 19.06    0.03  666.36 v _554_/D (DFFLQNx2_ASAP7_75t_R)
                                666.36   data arrival time

                        500.00  500.00   clock clk' (fall edge)
                          0.00  500.00   clock source latency
                  0.00    0.00  500.00 ^ clk (in)
     1    7.65                           clk (net)
                  9.97    3.14  503.14 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  9.08   18.30  521.44 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    4.70                           clknet_0_clk (net)
                  9.16    0.44  521.89 ^ clkbuf_1_0__f_clk/A (BUFx8_ASAP7_75t_R)
                 17.21   21.11  543.00 ^ clkbuf_1_0__f_clk/Y (BUFx8_ASAP7_75t_R)
     4   13.45                           clknet_1_0__leaf_clk (net)
                 17.31    0.81  543.81 ^ clkbuf_leaf_7_clk/A (BUFx8_ASAP7_75t_R)
                 65.08   41.86  585.67 ^ clkbuf_leaf_7_clk/Y (BUFx8_ASAP7_75t_R)
    30   68.65                           clknet_leaf_7_clk (net)
                 65.75    3.79  589.46 ^ _191_218/A (INVx3_ASAP7_75t_R)
                 10.76    6.88  596.34 v _191_218/Y (INVx3_ASAP7_75t_R)
     1    0.74                           net1537 (net)
                 10.76    0.02  596.36 v _554_/CLK (DFFLQNx2_ASAP7_75t_R)
                          0.00  596.36   clock reconvergence pessimism
                         -8.80  587.56   library setup time
                                587.56   data required time
-----------------------------------------------------------------------------
                                587.56   data required time
                               -666.36   data arrival time
-----------------------------------------------------------------------------
                                -78.80   slack (VIOLATED)

global route parasitics:

Startpoint: a[25] (input port clocked by clk)
Endpoint: _554_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk (rise edge)
                          0.00    0.00   clock network delay (propagated)
                         50.00   50.00 ^ input external delay
                  0.00    0.00   50.00 ^ a[25] (in)
     1    1.15                           a[25] (net)
                  0.26    0.08   50.08 ^ input18/A (BUFx3_ASAP7_75t_R)
                 10.10   11.98   62.06 ^ input18/Y (BUFx3_ASAP7_75t_R)
     1    3.26                           net18 (net)
                 10.23    0.63   62.70 ^ fanout986/A (BUFx16f_ASAP7_75t_R)
                 10.68   15.14   77.84 ^ fanout986/Y (BUFx16f_ASAP7_75t_R)
     7   15.59                           net986 (net)
                 31.87    9.44   87.28 ^ fanout981/A (BUFx16f_ASAP7_75t_R)
                 10.65   21.06  108.34 ^ fanout981/Y (BUFx16f_ASAP7_75t_R)
     6   10.98                           net981 (net)
                 11.94    1.90  110.25 ^ U$$1768/A (INVx1_ASAP7_75t_R)
                 28.19   15.64  125.89 v U$$1768/Y (INVx1_ASAP7_75t_R)
     1    4.87                           notblock$9471[0] (net)
                 28.70    2.09  127.98 v U$$1771/B3 (AO33x2_ASAP7_75t_R)
                 14.91   39.64  167.61 v U$$1771/Y (AO33x2_ASAP7_75t_R)
     1    2.83                           sel_0$9472 (net)
                 14.94    0.38  167.99 v fanout501/A (BUFx16f_ASAP7_75t_R)
                 13.78   18.36  186.35 v fanout501/Y (BUFx16f_ASAP7_75t_R)
     6   20.29                           net501 (net)
                 14.21    1.33  187.68 v fanout498/A (BUFx12f_ASAP7_75t_R)
                  9.38   15.44  203.12 v fanout498/Y (BUFx12f_ASAP7_75t_R)
     7    9.59                           net498 (net)
                  9.54    0.69  203.81 v split524/A (BUFx2_ASAP7_75t_R)
                 16.37   19.57  223.38 v split524/Y (BUFx2_ASAP7_75t_R)
     5    4.97                           net1843 (net)
                 16.39    0.35  223.73 v U$$1795/A2 (AO22x1_ASAP7_75t_R)
                 13.60   23.40  247.13 v U$$1795/Y (AO22x1_ASAP7_75t_R)
     1    1.77                           t$9485 (net)
                 13.61    0.09  247.22 v U$$1796/A (XOR2x1_ASAP7_75t_R)
                 16.62   23.42  270.63 v U$$1796/Y (XOR2x1_ASAP7_75t_R)
     1    1.79                           booth_b26_m11 (net)
                 16.63    0.19  270.82 v dadda_fa_2_37_2/CI (FAx1_ASAP7_75t_R)
                265.90  169.94  440.76 v dadda_fa_2_37_2/SN (FAx1_ASAP7_75t_R)
     1   11.21                           sn$1367 (net)                            <---- here
                266.60    7.09  447.85 v U$$5663/A (INVx1_ASAP7_75t_R)
                 89.87   69.19  517.04 ^ U$$5663/Y (INVx1_ASAP7_75t_R)
     1    5.88                           s$3258 (net)
                 90.03    2.13  519.17 ^ dadda_fa_3_37_3/A (FAx1_ASAP7_75t_R)
                 99.60   63.33  582.50 v dadda_fa_3_37_3/CON (FAx1_ASAP7_75t_R)
     8    7.77                           con$3256 (net)
                 61.71   37.82  620.32 ^ dadda_fa_3_37_3/SN (FAx1_ASAP7_75t_R)
     1    1.87                           sn$3257 (net)
                 61.72    0.34  620.66 ^ U$$6629/A (INVx1_ASAP7_75t_R)
                 30.03   22.44  643.10 v U$$6629/Y (INVx1_ASAP7_75t_R)
     1    3.21                           s$5006 (net)
                 30.09    0.73  643.83 v dadda_fa_4_37_2/CI (FAx1_ASAP7_75t_R)
                 57.50   35.35  679.18 ^ dadda_fa_4_37_2/CON (FAx1_ASAP7_75t_R)
     8    2.63                           con$5002 (net)
                 57.53    0.62  679.80 ^ U$$7404/A (INVx1_ASAP7_75t_R)
                 30.04   22.18  701.98 v U$$7404/Y (INVx1_ASAP7_75t_R)
     1    3.33                           c$6493 (net)
                 30.13    0.94  702.92 v dadda_fa_5_38_0/CI (FAx1_ASAP7_75t_R)
                 45.28   52.89  755.81 v dadda_fa_5_38_0/SN (FAx1_ASAP7_75t_R)
     1    1.73                           sn$6490 (net)
                 45.29    0.31  756.12 v U$$8037/A (INVx1_ASAP7_75t_R)
                 24.27   19.19  775.31 ^ U$$8037/Y (INVx1_ASAP7_75t_R)
     1    2.18                           s$7516 (net)
                 24.28    0.30  775.60 ^ dadda_fa_6_38_0/CI (FAx1_ASAP7_75t_R)
                 25.07   36.95  812.55 v dadda_fa_6_38_0/SN (FAx1_ASAP7_75t_R)
     1    0.78                           sn$7513 (net)
                 25.07    0.04  812.59 v U$$8461/A (INVx1_ASAP7_75t_R)
                 18.01   14.12  826.71 ^ U$$8461/Y (INVx1_ASAP7_75t_R)
     1    1.93                           s$8142 (net)
                 18.01    0.19  826.90 ^ dadda_fa_7_38_0/CI (FAx1_ASAP7_75t_R)
                 79.05   66.31  893.21 ^ dadda_fa_7_38_0/SN (FAx1_ASAP7_75t_R)
     1    3.13                           sn$8139 (net)
                 79.07    0.73  893.94 ^ U$$8717/A (INVx1_ASAP7_75t_R)
                 17.81   12.68  906.62 v U$$8717/Y (INVx1_ASAP7_75t_R)
     1    0.89                           s$10909 (net)
                 17.81    0.11  906.73 v _554_/D (DFFLQNx2_ASAP7_75t_R)
                                906.73   data arrival time

                        500.00  500.00   clock clk' (fall edge)
                          0.00  500.00   clock source latency
                  0.00    0.00  500.00 ^ clk (in)
     1    8.21                           clk (net)
                  8.29    2.62  502.62 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  9.78   17.95  520.57 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    5.33                           clknet_0_clk (net)
                 10.15    0.97  521.54 ^ clkbuf_1_0__f_clk/A (BUFx8_ASAP7_75t_R)
                 17.52   21.31  542.85 ^ clkbuf_1_0__f_clk/Y (BUFx8_ASAP7_75t_R)
     4   13.55                           clknet_1_0__leaf_clk (net)
                 17.72    1.13  543.98 ^ clkbuf_leaf_7_clk/A (BUFx8_ASAP7_75t_R)
                 67.45   38.69  582.67 ^ clkbuf_leaf_7_clk/Y (BUFx8_ASAP7_75t_R)
    30   69.05                           clknet_leaf_7_clk (net)
                 71.38    8.77  591.44 ^ _191_218/A (INVx3_ASAP7_75t_R)
                 11.38    7.08  598.52 v _191_218/Y (INVx3_ASAP7_75t_R)
     1    0.76                           net1537 (net)
                 11.38    0.09  598.61 v _554_/CLK (DFFLQNx2_ASAP7_75t_R)
                          0.00  598.61   clock reconvergence pessimism
                         -8.52  590.09   library setup time
                                590.09   data required time
-----------------------------------------------------------------------------
                                590.09   data required time
                               -906.73   data arrival time
-----------------------------------------------------------------------------
                               -316.64   slack (VIOLATED)

Steiner tree:

sn$1367_rst

Global route 2D tree:

sn$1367_tree2D

We've struggled to route this net, and unfortunately it was on a critical path. A few thoughts:

  • While we can print wire length after global and detailed routing (with report_wire_length), I see no option to print it after placement. To be accurate it should match what the resizer is using. We can use that for both placement and global route metrics (eg average increase in wire length, worst case increase in wire length etc). I had a go at augmenting report_wire_length, but the Steiner tree generation code for the resizer is in the resizer, and there are (at least) two different implementations of Steiner tree generation in global routing. I got a bit lost.
  • An estimation of how well global routing did, or perhaps a warning when very scenic routes are created would be useful. Before I looked at these specific nets, I wasn't sure how to further optimise my design. Now I'm thinking: perhaps the design is too dense, perhaps global placement could be tweaked etc.
  • This might be another test case for global routing (although it's possible the issue goes back to global placement). Global routing should have routed this net earlier, considering it was on a critical path.
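
As a rough idea of what such a metric could look like, here is a sketch that uses the net capacitance column from the two reports above as a stand-in for wire length (a crude proxy, since it also includes pin capacitance). The `growth_report` helper and the 2.0 warning threshold are invented for illustration; a real version would use the wire lengths the resizer and global router actually compute.

```python
# Net capacitance values copied from the two reports above:
# net name -> (placement parasitics, global route parasitics)
NET_CAP = {
    "net981": (10.38, 10.98),
    "notblock$9471[0]": (2.66, 4.87),
    "sn$1367": (2.42, 11.21),
    "s$3258": (3.61, 5.88),
    "con$3256": (2.12, 7.77),
}


def growth_report(caps, warn_ratio=2.0):
    """Summarise how much each net grew from placement to global route.

    Returns (average ratio, worst net, nets above warn_ratio sorted worst first).
    warn_ratio is a hypothetical threshold for flagging "scenic" routes.
    """
    ratios = {net: after / before for net, (before, after) in caps.items()}
    average = sum(ratios.values()) / len(ratios)
    worst = max(ratios, key=ratios.get)
    scenic = sorted(
        (net for net, ratio in ratios.items() if ratio > warn_ratio),
        key=ratios.get,
        reverse=True,
    )
    return average, worst, scenic


average, worst, scenic = growth_report(NET_CAP)
print(f"average growth: {average:.2f}x")
print(f"worst net: {worst}")
for net in scenic:
    before, after = NET_CAP[net]
    print(f"warning: {net} grew {after / before:.2f}x ({before} -> {after})")
```

On this sample the 2 pin net sn$1367 comes out as the worst offender, with con$3256 also crossing the threshold.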

Test case: bad-global-route.tar.gz

antonblanchard avatar Aug 01 '22 12:08 antonblanchard