grt: use slack values to sort nets during congestion iterations

Open eder-matheus opened this issue 2 years ago • 11 comments

eder-matheus avatar Jul 30 '22 00:07 eder-matheus

I'll keep this as a draft to discuss the methodology of the new sorting. I'll also run the designs in CI to check for improvements.

eder-matheus avatar Jul 30 '22 00:07 eder-matheus

This methodology is OK; the proof is in the results.

maliberty avatar Jul 30 '22 00:07 maliberty

I tried this on one of my designs that has global routing issues. I added some debug where we call getNetSlack(), and it only returns 0 or sta::MinMax::max(). Is there some STA initialisation we are missing?

antonblanchard avatar Aug 01 '22 22:08 antonblanchard

> I tried this on one of my designs that has global routing issues. I added some debug where we call getNetSlack(), and it only returns 0 or sta::MinMax::max(). Is there some STA initialisation we are missing?

To make it work properly, I had to read the SDC file, set the wire RC for both signal and clock nets, and estimate the parasitics based on placement. You can check this script for a sky130hs example: https://github.com/eder-matheus/OpenROAD/blob/grt_slacks/src/grt/test/critical_nets_percentage.tcl

eder-matheus avatar Aug 01 '22 22:08 eder-matheus

> To make it work properly, I had to read the SDC file, set the wire RC for both signal and clock nets, and estimate the parasitics based on placement. You can check this script for a sky130hs example: https://github.com/eder-matheus/OpenROAD/blob/grt_slacks/src/grt/test/critical_nets_percentage.tcl

Thanks @eder-matheus, I'm seeing reasonable values now.

antonblanchard avatar Aug 01 '22 23:08 antonblanchard

> > To make it work properly, I had to read the SDC file, set the wire RC for both signal and clock nets, and estimate the parasitics based on placement. You can check this script for a sky130hs example: https://github.com/eder-matheus/OpenROAD/blob/grt_slacks/src/grt/test/critical_nets_percentage.tcl

> Thanks @eder-matheus, I'm seeing reasonable values now.

Great to hear that! I'm working on some experiments before merging it, but I believe it will be on the master branch tomorrow.

eder-matheus avatar Aug 01 '22 23:08 eder-matheus

While this improves some of my designs, it does nothing on others. A few things I noticed:

  • If I read it right, this only applies to multi-source/multi-destination nets (mazeRouteMSMD). A lot of my problems are just two-pin nets.
  • Some designs never call into StNetOrder. Does that mean there is no congestion? Do routes closely match the Steiner tree in this case?
  • Shouldn't we sort the nets by slack at the start of global routing? We'd want to prioritise these critical nets in all stages of global routing (e.g. layer assignment), not just when handling congestion.
  • We should also prioritise clock nets. I went looking for how grt handles clock nets, and commit dde95bea4da5519e9c11ba5f4c4aaf8d6c2f5fab seems to undo the previous code, which puts non-leaf clock nets at the start of the net list:
  for (odb::dbNet* db_net : block_->getNets()) {
    Net* net = addNet(db_net);
    // add clock nets not connected to a leaf first
    if (net) {
      bool is_non_leaf_clock = isNonLeafClock(net->getDbNet());
      if (is_non_leaf_clock)
        nets.push_back(net);
    }
  }

  for (auto net_itr : db_net_map_) {
    Net* net = net_itr.second;
    bool is_non_leaf_clock = isNonLeafClock(net->getDbNet());
    if (!is_non_leaf_clock) {
      nets.push_back(net);
    }
  }
  std::sort(nets.begin(), nets.end(), nameLess);                  <--- here
  return nets;

antonblanchard avatar Aug 02 '22 05:08 antonblanchard

We could sort by clock status, then by name. The sort by name was added to make the results more stable across any db reordering.
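
A minimal sketch of what a "clock status, then name" ordering could look like. `NetInfo`, `clockThenNameLess`, and `sortNets` are hypothetical stand-ins introduced only for illustration; they are not the actual grt types or the code that was merged.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a grt net: only the fields the sort needs.
struct NetInfo {
  std::string name;
  bool is_non_leaf_clock;
};

// Non-leaf clock nets come first. Ties are broken by name, so the final
// order does not depend on the order in which the database returns nets.
bool clockThenNameLess(const NetInfo& a, const NetInfo& b)
{
  if (a.is_non_leaf_clock != b.is_non_leaf_clock) {
    return a.is_non_leaf_clock;
  }
  return a.name < b.name;
}

void sortNets(std::vector<NetInfo>& nets)
{
  std::sort(nets.begin(), nets.end(), clockThenNameLess);
  // Debug-only sanity check that the comparator produced a consistent order.
  assert(std::is_sorted(nets.begin(), nets.end(), clockThenNameLess));
}
```

Because the clock flag is compared before the name, a single sort keeps clock nets at the front, which a plain name-only sort after the two loops would not.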

maliberty avatar Aug 02 '22 05:08 maliberty

I think mazeRouteMSMD is used for two-pin nets as well - the name is just more general.

maliberty avatar Aug 02 '22 05:08 maliberty

Layer assignment is done at the very end, after the 2D routing is over. Making that timing-aware is a different task.

I'm less sure that handling timing early is important:

  1. The criticality may be off (see our general gpl vs grt parasitic discrepancy)
  2. There isn't much room in pattern routing to do anything to improve timing.

maliberty avatar Aug 02 '22 06:08 maliberty

I've updated the net sorting to prefer clock status over the net names. @antonblanchard could you share some of your testcases that don't show improvements?

eder-matheus avatar Aug 03 '22 19:08 eder-matheus

I ran a few tests with this series (which I forward-ported to current master). As expected, it does help when placement density is high. On designs with low placement density it won't make a difference, because there is no pressure on global routing.

The critical path in this test case improves by about 30 ps (worst slack goes from -133.16 to -103.29):

32bit_4cycle_asap7_multiplier-2.tar.gz

baseline:

==========================================================================
finish report_checks -unconstrained
--------------------------------------------------------------------------
Startpoint: _2194_ (falling edge-triggered flip-flop clocked by clk')
Endpoint: _2274_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk' (fall edge)
                          0.00    0.00   clock source latency
                  0.00    0.00    0.00 ^ clk (in)
     1    5.63                           clk (net)
                  4.72    1.49    1.49 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.55   16.27   17.75 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    4.37                           clknet_0_clk (net)
                  8.66    0.51   18.26 ^ clkbuf_1_0__f_clk/A (BUFx8_ASAP7_75t_R)
                 25.74   24.76   43.02 ^ clkbuf_1_0__f_clk/Y (BUFx8_ASAP7_75t_R)
    15   26.78                           clknet_1_0__leaf_clk (net)
                 26.74    2.75   45.77 ^ clkbuf_opt_1_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.46   22.36   68.13 ^ clkbuf_opt_1_0_clk/Y (BUFx8_ASAP7_75t_R)
     1    3.54                           clknet_opt_1_0_clk (net)
                  8.99    1.03   69.16 ^ clkbuf_leaf_7_clk/A (BUFx8_ASAP7_75t_R)
                 25.76   22.55   91.71 ^ clkbuf_leaf_7_clk/Y (BUFx8_ASAP7_75t_R)
    30   26.60                           clknet_leaf_7_clk (net)
                 27.93    3.96   95.67 ^ net352_162/A (INVx1_ASAP7_75t_R)
                 42.18   25.00  120.67 v net352_162/Y (INVx1_ASAP7_75t_R)
     1    7.15                           net366 (net)
                 43.14    3.51  124.18 v _2194_/CLK (DFFLQNx3_ASAP7_75t_R)
                 28.64   65.22  189.40 v _2194_/QN (DFFLQNx3_ASAP7_75t_R)
     1    5.61                           _0161_ (net)
                 29.80    3.05  192.45 v _0949_/A (INVx1_ASAP7_75t_R)
                 38.62   24.58  217.03 ^ _0949_/Y (INVx1_ASAP7_75t_R)
     1    5.46                           pp_row45_0 (net)
                 39.01    2.17  219.20 ^ dadda_fa_1_45_0/A (FAx1_ASAP7_75t_R)
                105.32   91.20  310.40 v dadda_fa_1_45_0/SN (FAx1_ASAP7_75t_R)
     1    4.53                           sn$394 (net)
                105.40    1.67  312.06 v U$$1537/A (INVx1_ASAP7_75t_R)
                 41.45   32.51  344.57 ^ U$$1537/Y (INVx1_ASAP7_75t_R)
     1    3.25                           s$849 (net)
                 41.49    0.70  345.27 ^ dadda_fa_2_45_2/A (FAx1_ASAP7_75t_R)
                 36.37   39.71  384.98 ^ dadda_fa_2_45_2/SN (FAx1_ASAP7_75t_R)
     1    1.11                           sn$848 (net)
                 36.38    0.17  385.15 ^ U$$1763/A (INVx1_ASAP7_75t_R)
                 23.76   17.83  402.98 v U$$1763/Y (INVx1_ASAP7_75t_R)
     1    3.33                           s$1283 (net)
                 23.88    0.96  403.94 v dadda_fa_3_45_1/CI (FAx1_ASAP7_75t_R)
                 31.77   21.88  425.82 ^ dadda_fa_3_45_1/CON (FAx1_ASAP7_75t_R)
     1    0.98                           con$1279 (net)
                 31.77    0.09  425.91 ^ U$$1972/A (INVx1_ASAP7_75t_R)
                 17.20   13.44  439.34 v U$$1972/Y (INVx1_ASAP7_75t_R)
     1    2.29                           c$1598 (net)
                 17.20    0.13  439.48 v dadda_fa_4_46_0/B (FAx1_ASAP7_75t_R)
                 29.87   21.70  461.18 ^ dadda_fa_4_46_0/CON (FAx1_ASAP7_75t_R)
     1    1.02                           con$1595 (net)
                 21.99   14.21  475.39 v dadda_fa_4_46_0/SN (FAx1_ASAP7_75t_R)
     1    0.89                           sn$1596 (net)
                 21.99    0.08  475.47 v U$$2115/A (INVx1_ASAP7_75t_R)
                 14.28   11.79  487.26 ^ U$$2115/Y (INVx1_ASAP7_75t_R)
     1    1.74                           s$1886 (net)
                 14.28    0.14  487.40 ^ dadda_fa_5_46_0/CI (FAx1_ASAP7_75t_R)
                 35.91   36.54  523.93 ^ dadda_fa_5_46_0/SN (FAx1_ASAP7_75t_R)
     1    1.24                           sn$1883 (net)
                 35.91    0.15  524.08 ^ U$$2237/A (INVx1_ASAP7_75t_R)
                 13.74   10.97  535.05 v U$$2237/Y (INVx1_ASAP7_75t_R)
     1    1.23                           s$2684 (net)
                 13.75    0.18  535.22 v _2274_/D (DFFLQNx1_ASAP7_75t_R)
                                535.22   data arrival time

                        330.00  330.00   clock clk' (fall edge)
                          0.00  330.00   clock source latency
                  0.00    0.00  330.00 ^ clk (in)
     1    5.42                           clk (net)
                  4.42    1.39  331.39 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.55   16.17  347.56 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    3.95                           clknet_0_clk (net)
                  8.64    0.44  348.01 ^ clkbuf_1_1__f_clk/A (BUFx8_ASAP7_75t_R)
                 22.29   22.76  370.77 ^ clkbuf_1_1__f_clk/Y (BUFx8_ASAP7_75t_R)
    12   18.73                           clknet_1_1__leaf_clk (net)
                 23.27    2.55  373.31 ^ clkbuf_leaf_11_clk/A (BUFx8_ASAP7_75t_R)
                 22.83   27.86  401.17 ^ clkbuf_leaf_11_clk/Y (BUFx8_ASAP7_75t_R)
    30   19.75                           clknet_leaf_11_clk (net)
                 23.72    2.43  403.60 ^ net252_82/A (INVx1_ASAP7_75t_R)
                  7.87    6.84  410.44 v net252_82/Y (INVx1_ASAP7_75t_R)
     1    0.48                           net286 (net)
                  7.87    0.04  410.47 v _2274_/CLK (DFFLQNx1_ASAP7_75t_R)
                          0.19  410.67   clock reconvergence pessimism
                         -8.60  402.06   library setup time
                                402.06   data required time
-----------------------------------------------------------------------------
                                402.06   data required time
                               -535.22   data arrival time
-----------------------------------------------------------------------------
                               -133.16   slack (VIOLATED)

patched:

finish report_checks -unconstrained
--------------------------------------------------------------------------
Startpoint: _2185_ (falling edge-triggered flip-flop clocked by clk')
Endpoint: _2273_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk' (fall edge)
                          0.00    0.00   clock source latency
                  0.00    0.00    0.00 ^ clk (in)
     1    5.55                           clk (net)
                  4.87    1.54    1.54 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.54   16.31   17.84 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    4.36                           clknet_0_clk (net)
                  8.67    0.54   18.39 ^ clkbuf_1_0__f_clk/A (BUFx8_ASAP7_75t_R)
                 25.16   24.65   43.04 ^ clkbuf_1_0__f_clk/Y (BUFx8_ASAP7_75t_R)
    15   26.22                           clknet_1_0__leaf_clk (net)
                 25.75    2.13   45.17 ^ clkbuf_opt_1_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.01   21.94   67.11 ^ clkbuf_opt_1_0_clk/Y (BUFx8_ASAP7_75t_R)
     1    3.02                           clknet_opt_1_0_clk (net)
                  8.19    0.66   67.77 ^ clkbuf_leaf_7_clk/A (BUFx8_ASAP7_75t_R)
                 26.17   21.67   89.44 ^ clkbuf_leaf_7_clk/Y (BUFx8_ASAP7_75t_R)
    30   26.81                           clknet_leaf_7_clk (net)
                 34.07    7.43   96.86 ^ net352_171/A (INVx1_ASAP7_75t_R)
                 25.94   18.86  115.72 v net352_171/Y (INVx1_ASAP7_75t_R)
     1    3.67                           net375 (net)
                 26.05    0.99  116.71 v _2185_/CLK (DFFLQNx3_ASAP7_75t_R)
                 18.68   54.25  170.96 v _2185_/QN (DFFLQNx3_ASAP7_75t_R)
     1    1.81                           _0170_ (net)
                 18.70    0.32  171.28 v _0958_/A (INVx1_ASAP7_75t_R)
                 22.49   15.67  186.95 ^ _0958_/Y (INVx1_ASAP7_75t_R)
     1    3.40                           pp_row44_4 (net)
                 22.59    0.84  187.79 ^ dadda_fa_1_44_1/B (FAx1_ASAP7_75t_R)
                145.57  106.67  294.46 v dadda_fa_1_44_1/SN (FAx1_ASAP7_75t_R)
     1    6.24                           sn$388 (net)
                145.83    3.39  297.85 v U$$1531/A (INVx1_ASAP7_75t_R)
                 44.29   34.63  332.47 ^ U$$1531/Y (INVx1_ASAP7_75t_R)
     1    2.77                           s$836 (net)
                 44.31    0.45  332.92 ^ dadda_fa_2_44_2/A (FAx1_ASAP7_75t_R)
                 27.94   22.89  355.81 v dadda_fa_2_44_2/CON (FAx1_ASAP7_75t_R)
     1    1.34                           con$834 (net)
                 32.54   16.17  371.98 ^ dadda_fa_2_44_2/SN (FAx1_ASAP7_75t_R)
     1    0.86                           sn$835 (net)
                 32.54    0.08  372.05 ^ U$$1757/A (INVx1_ASAP7_75t_R)
                 16.21   12.74  384.80 v U$$1757/Y (INVx1_ASAP7_75t_R)
     1    2.02                           s$1273 (net)
                 16.21    0.22  385.02 v dadda_fa_3_44_1/CI (FAx1_ASAP7_75t_R)
                 28.01   18.20  403.22 ^ dadda_fa_3_44_1/CON (FAx1_ASAP7_75t_R)
     1    0.77                           con$1269 (net)
                 28.01    0.04  403.25 ^ U$$1968/A (INVx1_ASAP7_75t_R)
                 16.62   12.88  416.13 v U$$1968/Y (INVx1_ASAP7_75t_R)
     1    2.33                           c$1593 (net)
                 16.63    0.22  416.36 v dadda_fa_4_45_0/B (FAx1_ASAP7_75t_R)
                 31.82   22.68  439.03 ^ dadda_fa_4_45_0/CON (FAx1_ASAP7_75t_R)
     1    1.20                           con$1590 (net)
                 20.85   14.07  453.10 v dadda_fa_4_45_0/SN (FAx1_ASAP7_75t_R)
     1    0.80                           sn$1591 (net)
                 20.85    0.07  453.17 v U$$2113/A (INVx1_ASAP7_75t_R)
                 13.63   11.34  464.51 ^ U$$2113/Y (INVx1_ASAP7_75t_R)
     1    1.68                           s$1881 (net)
                 13.64    0.08  464.59 ^ dadda_fa_5_45_0/CI (FAx1_ASAP7_75t_R)
                 36.86   36.82  501.42 ^ dadda_fa_5_45_0/SN (FAx1_ASAP7_75t_R)
     1    1.28                           sn$1878 (net)
                 36.86    0.18  501.60 ^ U$$2235/A (INVx1_ASAP7_75t_R)
                 10.61    8.56  510.15 v U$$2235/Y (INVx1_ASAP7_75t_R)
     1    0.70                           s$2682 (net)
                 10.61    0.05  510.20 v _2273_/D (DFFLQNx2_ASAP7_75t_R)
                                510.20   data arrival time

                        330.00  330.00   clock clk' (fall edge)
                          0.00  330.00   clock source latency
                  0.00    0.00  330.00 ^ clk (in)
     1    5.34                           clk (net)
                  4.54    1.43  331.43 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.54   16.21  347.64 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    3.94                           clknet_0_clk (net)
                  8.67    0.54  348.18 ^ clkbuf_1_0__f_clk/A (BUFx8_ASAP7_75t_R)
                 25.16   24.65  372.83 ^ clkbuf_1_0__f_clk/Y (BUFx8_ASAP7_75t_R)
    15   23.11                           clknet_1_0__leaf_clk (net)
                 26.21    2.76  375.59 ^ clkbuf_leaf_9_clk/A (BUFx8_ASAP7_75t_R)
                 24.31   28.48  404.07 ^ clkbuf_leaf_9_clk/Y (BUFx8_ASAP7_75t_R)
    30   20.73                           clknet_leaf_9_clk (net)
                 26.61    3.95  408.02 ^ net252_83/A (INVx1_ASAP7_75t_R)
                  8.23    7.02  415.04 v net252_83/Y (INVx1_ASAP7_75t_R)
     1    0.47                           net287 (net)
                  8.23    0.04  415.07 v _2273_/CLK (DFFLQNx2_ASAP7_75t_R)
                          0.21  415.28   clock reconvergence pessimism
                         -8.37  406.91   library setup time
                                406.91   data required time
-----------------------------------------------------------------------------
                                406.91   data required time
                               -510.20   data arrival time
-----------------------------------------------------------------------------
                               -103.29   slack (VIOLATED)

antonblanchard avatar Sep 27 '22 02:09 antonblanchard

The only question that remains for me is whether -critical_nets_percentage should be changed to take a percentage (as its name suggests). Right now the option expects a value between 0 and 1.

There is an existing option to compare to: -capacities_perturbation_percentage 50

antonblanchard avatar Sep 27 '22 02:09 antonblanchard

> The only question that remains for me is whether -critical_nets_percentage should be changed to take a percentage (as its name suggests). Right now the option expects a value between 0 and 1.

> There is an existing option to compare to: -capacities_perturbation_percentage 50

Thanks a lot for your feedback, @antonblanchard! Your results seem really good. I've updated the branch with the latest master branch and also updated the option to take a percentage, as you suggested (which makes more sense).

I'll also start a secure-ci run to make sure it doesn't break anything.
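
For illustration, here is one way a 0-100 percentage could be turned into a set of critical nets. This is a sketch under stated assumptions, not the grt implementation: `NetSlack` and `criticalNets` are hypothetical names, and truncating the count toward zero is an assumption made here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record pairing a net name with its worst slack.
struct NetSlack {
  std::string name;
  float slack;
};

// Returns the names of the `percentage` percent of nets with the worst
// (most negative) slack, worst first. `percentage` is in the range 0-100.
std::vector<std::string> criticalNets(std::vector<NetSlack> nets,
                                      float percentage)
{
  assert(percentage >= 0.0f && percentage <= 100.0f);

  // Stable sort keeps the input order for nets with equal slack.
  std::stable_sort(nets.begin(), nets.end(),
                   [](const NetSlack& a, const NetSlack& b) {
                     return a.slack < b.slack;
                   });

  // Assumption: the count is truncated toward zero, then capped at the
  // number of nets to guard against floating-point rounding.
  const double scaled = static_cast<double>(nets.size()) * percentage / 100.0;
  const std::size_t count
      = std::min(static_cast<std::size_t>(scaled), nets.size());

  std::vector<std::string> names;
  names.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    names.push_back(nets[i].name);
  }
  return names;
}
```

With four nets and a value of 50, this picks the two nets with the lowest slack; a value of 0 selects nothing, which matches the idea of the feature being off unless requested.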

eder-matheus avatar Sep 27 '22 15:09 eder-matheus

> I think mazeRouteMSMD is used for two pin nets as well - the name is just more general.

[image]

maliberty avatar Sep 27 '22 15:09 maliberty

@maliberty public- and secure-ci are green after my last commits. I had to revert a commit that changed the net sorting and was breaking the CI. I'll handle this in another PR.

eder-matheus avatar Oct 07 '22 19:10 eder-matheus

Is src/grt/test/critical_nets_percentage.v still needed?

maliberty avatar Oct 08 '22 04:10 maliberty