OpenROAD
grt: use slack values to sort nets during congestion iterations
I'll keep this as a draft to discuss the methodology of the new sorting. I'll also run the designs on the CI to check for improvements.
This methodology is ok, the proof is in the results.
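Here is a minimal sketch of the ordering idea. It assumes each net already carries its worst slack; in grt that value would come from getNetSlack(), but here it is a plain field. NetInfo and orderNetsBySlack are hypothetical names and are not the PR's code. Nets are sorted by ascending slack so the most critical ones are handled first, and ties fall back to the name so the order is reproducible.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-net record; the real router keeps richer net objects.
struct NetInfo {
  std::string name;
  float slack;  // worst slack on the net; smaller (more negative) = more critical
};

// Sort nets so the most timing-critical ones come first.
// Ties are broken by name so the order is stable across runs.
void orderNetsBySlack(std::vector<NetInfo>& nets) {
  std::sort(nets.begin(), nets.end(), [](const NetInfo& a, const NetInfo& b) {
    if (a.slack != b.slack) {
      return a.slack < b.slack;
    }
    return a.name < b.name;
  });
}
```

The intent is that in each congestion iteration the router would rip up and reroute nets in this order, so critical nets are routed before the remaining capacity is consumed by less critical ones.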
I tried this on one of my designs that has global routing issues. I added some debug output where we call getNetSlack(), and it only returns 0 or sta::MinMax::max(). Is there some STA initialisation we are missing?
To make it work properly, I had to read the SDC file, set the wire RC for both signal and clock nets, and estimate the parasitics based on placement. You can check this script for a sky130hs example: https://github.com/eder-matheus/OpenROAD/blob/grt_slacks/src/grt/test/critical_nets_percentage.tcl
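Without the SDC, wire RC, and estimated parasitics, every slack comes back as either 0 or the "max" sentinel, so a slack-based ordering silently degenerates to the fallback order. A small guard could flag that case. This is a sketch under stated assumptions: kSlackSentinel is a hypothetical stand-in for whatever value getNetSlack() returns when there is no timing information (std::numeric_limits<float>::max() is used purely for illustration), and slacksLookUninitialized is a hypothetical helper name.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for the "no timing information" slack value.
constexpr float kSlackSentinel = std::numeric_limits<float>::max();

// Returns true when no net has a meaningful slack, i.e. every value is
// either exactly 0 or the sentinel. This matches the symptom reported
// when the SDC, wire RC, and parasitics have not been set up.
bool slacksLookUninitialized(const std::vector<float>& slacks) {
  for (float s : slacks) {
    if (s != 0.0f && s != kSlackSentinel) {
      return false;  // found at least one real slack value
    }
  }
  return true;
}
```

Calling a check like this once before the congestion iterations and emitting a warning would make a missing STA setup visible instead of letting the slack-based ordering quietly do nothing.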
Thanks @eder-matheus, I'm seeing reasonable values now.
Great to hear that! I'm working on some experiments before merging it, but I believe it will be on the master branch tomorrow.
While this improves some of my designs, it does nothing on others. A few things I noticed:
- If I read it right, this only applies to multi-source/multi-destination nets (mazeRouteMSMD). A lot of my problems are just 2-pin nets.
- Some designs never call into StNetOrder. Does that mean there is no congestion? Do routes closely match the Steiner tree in this case?
- Shouldn't we sort the nets by slack at the start of global routing? We'd want to prioritise these critical nets in all stages of global routing (e.g. layer assignment), not just when handling congestion.
- We should also prioritise clock nets. I went looking for how grt handles clock nets, and commit dde95bea4da5519e9c11ba5f4c4aaf8d6c2f5fab seems to undo the previous code that puts non-leaf clock nets at the start of the net list:
```cpp
for (odb::dbNet* db_net : block_->getNets()) {
  Net* net = addNet(db_net);
  // add clock nets not connected to a leaf first
  if (net) {
    bool is_non_leaf_clock = isNonLeafClock(net->getDbNet());
    if (is_non_leaf_clock)
      nets.push_back(net);
  }
}
for (auto net_itr : db_net_map_) {
  Net* net = net_itr.second;
  bool is_non_leaf_clock = isNonLeafClock(net->getDbNet());
  if (!is_non_leaf_clock) {
    nets.push_back(net);
  }
}
std::sort(nets.begin(), nets.end(), nameLess);  // <--- here
return nets;
```
We could sort by clock status and then by name. Sorting by name was meant to make the results more stable across any db reordering.
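A sketch of the two-key ordering suggested here: clock nets first, then by name. Putting both keys in one comparator keeps the clock-first property even though the last step is a full sort, which is exactly what the quoted code loses when std::sort(..., nameLess) runs after the clock nets were pushed first. RouteNet and its is_clock flag are hypothetical stand-ins for the router's Net objects and isNonLeafClock().

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical net record; is_clock stands in for isNonLeafClock().
struct RouteNet {
  std::string name;
  bool is_clock;
};

// Clock nets first; within each group, order by name for reproducibility.
bool clockThenNameLess(const RouteNet& a, const RouteNet& b) {
  if (a.is_clock != b.is_clock) {
    return a.is_clock;  // a precedes b only if a is the clock net
  }
  return a.name < b.name;
}

void sortNetsForRouting(std::vector<RouteNet>& nets) {
  std::sort(nets.begin(), nets.end(), clockThenNameLess);
}
```

Because the comparator is still a strict weak ordering, std::sort remains valid, and the result no longer depends on the order in which the nets were pushed.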
I think mazeRouteMSMD is used for two-pin nets as well; the name is just more general.
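A two-pin net fits naturally into a multi-source/multi-destination (MSMD) maze router. The search starts from a set of source cells and stops at any cell in a destination set, and a two-pin net is simply the case where each set has one element. The sketch below uses a plain unit-cost BFS on a small grid with blocked cells. It is not grt's implementation, whose maze routing weighs edges by congestion rather than using unit costs, so it only illustrates the MSMD structure. All names are hypothetical, and source and destination cells are assumed to lie on the grid.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

using Cell = std::pair<int, int>;  // (x, y) position on the routing grid

// Unit-cost multi-source/multi-destination BFS on a 2D grid.
// blocked[y][x] == true marks a cell that cannot be used.
// Returns the length (in grid edges) of the shortest path from any source
// to any destination, or -1 if no destination is reachable.
int msmdShortestPath(const std::vector<std::vector<bool>>& blocked,
                     const std::vector<Cell>& sources,
                     const std::vector<Cell>& dests) {
  const int h = static_cast<int>(blocked.size());
  const int w = h == 0 ? 0 : static_cast<int>(blocked[0].size());
  std::vector<std::vector<int>> dist(h, std::vector<int>(w, -1));
  std::vector<std::vector<bool>> is_dest(h, std::vector<bool>(w, false));
  for (const Cell& d : dests) {
    is_dest[d.second][d.first] = true;
  }

  // Seed the frontier with every source at distance 0: this is what makes
  // the search "multi-source". A two-pin net simply has one source.
  std::queue<Cell> frontier;
  for (const Cell& s : sources) {
    if (!blocked[s.second][s.first] && dist[s.second][s.first] < 0) {
      dist[s.second][s.first] = 0;
      frontier.push(s);
    }
  }

  const int dx[] = {1, -1, 0, 0};
  const int dy[] = {0, 0, 1, -1};
  while (!frontier.empty()) {
    const Cell cur = frontier.front();
    frontier.pop();
    const int x = cur.first;
    const int y = cur.second;
    // With unit costs, the first destination dequeued is the closest one.
    if (is_dest[y][x]) {
      return dist[y][x];
    }
    for (int k = 0; k < 4; ++k) {
      const int nx = x + dx[k];
      const int ny = y + dy[k];
      if (nx < 0 || ny < 0 || nx >= w || ny >= h) {
        continue;  // off the grid
      }
      if (blocked[ny][nx] || dist[ny][nx] >= 0) {
        continue;  // blocked or already reached
      }
      dist[ny][nx] = dist[y][x] + 1;
      frontier.push({nx, ny});
    }
  }
  return -1;
}
```

For example, on a 3x3 grid with the center blocked, a single source at (0, 0) and a single destination at (2, 2) (the two-pin case) give a path of length 4, while adding (2, 1) as a second source shortens it to 1.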
Layer assignment is done at the very end, after the 2D routing is over. Making that timing-aware is a separate task.
I'm less sure that handling timing early is important:
- The criticality may be off (see our general gpl vs grt parasitic discrepancy)
- There isn't much room in pattern routing to do anything to improve timing.
I've updated the net sorting to prefer clock status over the net names. @antonblanchard could you share some of your testcases that don't show improvements?
I ran a few tests with this series (which I forward-ported to current master). As expected, it helps when placement density is high. On designs with low placement density it won't make a difference, because there is no pressure on global routing.
The critical path in this test case improves by 30ps:
32bit_4cycle_asap7_multiplier-2.tar.gz
baseline:
```
==========================================================================
finish report_checks -unconstrained
--------------------------------------------------------------------------
Startpoint: _2194_ (falling edge-triggered flip-flop clocked by clk')
Endpoint: _2274_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk' (fall edge)
                          0.00    0.00   clock source latency
                  0.00    0.00    0.00 ^ clk (in)
     1    5.63                           clk (net)
                  4.72    1.49    1.49 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.55   16.27   17.75 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    4.37                           clknet_0_clk (net)
                  8.66    0.51   18.26 ^ clkbuf_1_0__f_clk/A (BUFx8_ASAP7_75t_R)
                 25.74   24.76   43.02 ^ clkbuf_1_0__f_clk/Y (BUFx8_ASAP7_75t_R)
    15   26.78                           clknet_1_0__leaf_clk (net)
                 26.74    2.75   45.77 ^ clkbuf_opt_1_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.46   22.36   68.13 ^ clkbuf_opt_1_0_clk/Y (BUFx8_ASAP7_75t_R)
     1    3.54                           clknet_opt_1_0_clk (net)
                  8.99    1.03   69.16 ^ clkbuf_leaf_7_clk/A (BUFx8_ASAP7_75t_R)
                 25.76   22.55   91.71 ^ clkbuf_leaf_7_clk/Y (BUFx8_ASAP7_75t_R)
    30   26.60                           clknet_leaf_7_clk (net)
                 27.93    3.96   95.67 ^ net352_162/A (INVx1_ASAP7_75t_R)
                 42.18   25.00  120.67 v net352_162/Y (INVx1_ASAP7_75t_R)
     1    7.15                           net366 (net)
                 43.14    3.51  124.18 v _2194_/CLK (DFFLQNx3_ASAP7_75t_R)
                 28.64   65.22  189.40 v _2194_/QN (DFFLQNx3_ASAP7_75t_R)
     1    5.61                           _0161_ (net)
                 29.80    3.05  192.45 v _0949_/A (INVx1_ASAP7_75t_R)
                 38.62   24.58  217.03 ^ _0949_/Y (INVx1_ASAP7_75t_R)
     1    5.46                           pp_row45_0 (net)
                 39.01    2.17  219.20 ^ dadda_fa_1_45_0/A (FAx1_ASAP7_75t_R)
                105.32   91.20  310.40 v dadda_fa_1_45_0/SN (FAx1_ASAP7_75t_R)
     1    4.53                           sn$394 (net)
                105.40    1.67  312.06 v U$$1537/A (INVx1_ASAP7_75t_R)
                 41.45   32.51  344.57 ^ U$$1537/Y (INVx1_ASAP7_75t_R)
     1    3.25                           s$849 (net)
                 41.49    0.70  345.27 ^ dadda_fa_2_45_2/A (FAx1_ASAP7_75t_R)
                 36.37   39.71  384.98 ^ dadda_fa_2_45_2/SN (FAx1_ASAP7_75t_R)
     1    1.11                           sn$848 (net)
                 36.38    0.17  385.15 ^ U$$1763/A (INVx1_ASAP7_75t_R)
                 23.76   17.83  402.98 v U$$1763/Y (INVx1_ASAP7_75t_R)
     1    3.33                           s$1283 (net)
                 23.88    0.96  403.94 v dadda_fa_3_45_1/CI (FAx1_ASAP7_75t_R)
                 31.77   21.88  425.82 ^ dadda_fa_3_45_1/CON (FAx1_ASAP7_75t_R)
     1    0.98                           con$1279 (net)
                 31.77    0.09  425.91 ^ U$$1972/A (INVx1_ASAP7_75t_R)
                 17.20   13.44  439.34 v U$$1972/Y (INVx1_ASAP7_75t_R)
     1    2.29                           c$1598 (net)
                 17.20    0.13  439.48 v dadda_fa_4_46_0/B (FAx1_ASAP7_75t_R)
                 29.87   21.70  461.18 ^ dadda_fa_4_46_0/CON (FAx1_ASAP7_75t_R)
     1    1.02                           con$1595 (net)
                 21.99   14.21  475.39 v dadda_fa_4_46_0/SN (FAx1_ASAP7_75t_R)
     1    0.89                           sn$1596 (net)
                 21.99    0.08  475.47 v U$$2115/A (INVx1_ASAP7_75t_R)
                 14.28   11.79  487.26 ^ U$$2115/Y (INVx1_ASAP7_75t_R)
     1    1.74                           s$1886 (net)
                 14.28    0.14  487.40 ^ dadda_fa_5_46_0/CI (FAx1_ASAP7_75t_R)
                 35.91   36.54  523.93 ^ dadda_fa_5_46_0/SN (FAx1_ASAP7_75t_R)
     1    1.24                           sn$1883 (net)
                 35.91    0.15  524.08 ^ U$$2237/A (INVx1_ASAP7_75t_R)
                 13.74   10.97  535.05 v U$$2237/Y (INVx1_ASAP7_75t_R)
     1    1.23                           s$2684 (net)
                 13.75    0.18  535.22 v _2274_/D (DFFLQNx1_ASAP7_75t_R)
                                535.22   data arrival time

                        330.00  330.00   clock clk' (fall edge)
                          0.00  330.00   clock source latency
                  0.00    0.00  330.00 ^ clk (in)
     1    5.42                           clk (net)
                  4.42    1.39  331.39 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.55   16.17  347.56 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    3.95                           clknet_0_clk (net)
                  8.64    0.44  348.01 ^ clkbuf_1_1__f_clk/A (BUFx8_ASAP7_75t_R)
                 22.29   22.76  370.77 ^ clkbuf_1_1__f_clk/Y (BUFx8_ASAP7_75t_R)
    12   18.73                           clknet_1_1__leaf_clk (net)
                 23.27    2.55  373.31 ^ clkbuf_leaf_11_clk/A (BUFx8_ASAP7_75t_R)
                 22.83   27.86  401.17 ^ clkbuf_leaf_11_clk/Y (BUFx8_ASAP7_75t_R)
    30   19.75                           clknet_leaf_11_clk (net)
                 23.72    2.43  403.60 ^ net252_82/A (INVx1_ASAP7_75t_R)
                  7.87    6.84  410.44 v net252_82/Y (INVx1_ASAP7_75t_R)
     1    0.48                           net286 (net)
                  7.87    0.04  410.47 v _2274_/CLK (DFFLQNx1_ASAP7_75t_R)
                          0.19  410.67   clock reconvergence pessimism
                         -8.60  402.06   library setup time
                                402.06   data required time
-----------------------------------------------------------------------------
                                402.06   data required time
                               -535.22   data arrival time
-----------------------------------------------------------------------------
                               -133.16   slack (VIOLATED)
```
patched:
```
finish report_checks -unconstrained
--------------------------------------------------------------------------
Startpoint: _2185_ (falling edge-triggered flip-flop clocked by clk')
Endpoint: _2273_ (falling edge-triggered flip-flop clocked by clk')
Path Group: clk
Path Type: max

Fanout     Cap    Slew   Delay    Time   Description
-----------------------------------------------------------------------------
                          0.00    0.00   clock clk' (fall edge)
                          0.00    0.00   clock source latency
                  0.00    0.00    0.00 ^ clk (in)
     1    5.55                           clk (net)
                  4.87    1.54    1.54 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.54   16.31   17.84 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    4.36                           clknet_0_clk (net)
                  8.67    0.54   18.39 ^ clkbuf_1_0__f_clk/A (BUFx8_ASAP7_75t_R)
                 25.16   24.65   43.04 ^ clkbuf_1_0__f_clk/Y (BUFx8_ASAP7_75t_R)
    15   26.22                           clknet_1_0__leaf_clk (net)
                 25.75    2.13   45.17 ^ clkbuf_opt_1_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.01   21.94   67.11 ^ clkbuf_opt_1_0_clk/Y (BUFx8_ASAP7_75t_R)
     1    3.02                           clknet_opt_1_0_clk (net)
                  8.19    0.66   67.77 ^ clkbuf_leaf_7_clk/A (BUFx8_ASAP7_75t_R)
                 26.17   21.67   89.44 ^ clkbuf_leaf_7_clk/Y (BUFx8_ASAP7_75t_R)
    30   26.81                           clknet_leaf_7_clk (net)
                 34.07    7.43   96.86 ^ net352_171/A (INVx1_ASAP7_75t_R)
                 25.94   18.86  115.72 v net352_171/Y (INVx1_ASAP7_75t_R)
     1    3.67                           net375 (net)
                 26.05    0.99  116.71 v _2185_/CLK (DFFLQNx3_ASAP7_75t_R)
                 18.68   54.25  170.96 v _2185_/QN (DFFLQNx3_ASAP7_75t_R)
     1    1.81                           _0170_ (net)
                 18.70    0.32  171.28 v _0958_/A (INVx1_ASAP7_75t_R)
                 22.49   15.67  186.95 ^ _0958_/Y (INVx1_ASAP7_75t_R)
     1    3.40                           pp_row44_4 (net)
                 22.59    0.84  187.79 ^ dadda_fa_1_44_1/B (FAx1_ASAP7_75t_R)
                145.57  106.67  294.46 v dadda_fa_1_44_1/SN (FAx1_ASAP7_75t_R)
     1    6.24                           sn$388 (net)
                145.83    3.39  297.85 v U$$1531/A (INVx1_ASAP7_75t_R)
                 44.29   34.63  332.47 ^ U$$1531/Y (INVx1_ASAP7_75t_R)
     1    2.77                           s$836 (net)
                 44.31    0.45  332.92 ^ dadda_fa_2_44_2/A (FAx1_ASAP7_75t_R)
                 27.94   22.89  355.81 v dadda_fa_2_44_2/CON (FAx1_ASAP7_75t_R)
     1    1.34                           con$834 (net)
                 32.54   16.17  371.98 ^ dadda_fa_2_44_2/SN (FAx1_ASAP7_75t_R)
     1    0.86                           sn$835 (net)
                 32.54    0.08  372.05 ^ U$$1757/A (INVx1_ASAP7_75t_R)
                 16.21   12.74  384.80 v U$$1757/Y (INVx1_ASAP7_75t_R)
     1    2.02                           s$1273 (net)
                 16.21    0.22  385.02 v dadda_fa_3_44_1/CI (FAx1_ASAP7_75t_R)
                 28.01   18.20  403.22 ^ dadda_fa_3_44_1/CON (FAx1_ASAP7_75t_R)
     1    0.77                           con$1269 (net)
                 28.01    0.04  403.25 ^ U$$1968/A (INVx1_ASAP7_75t_R)
                 16.62   12.88  416.13 v U$$1968/Y (INVx1_ASAP7_75t_R)
     1    2.33                           c$1593 (net)
                 16.63    0.22  416.36 v dadda_fa_4_45_0/B (FAx1_ASAP7_75t_R)
                 31.82   22.68  439.03 ^ dadda_fa_4_45_0/CON (FAx1_ASAP7_75t_R)
     1    1.20                           con$1590 (net)
                 20.85   14.07  453.10 v dadda_fa_4_45_0/SN (FAx1_ASAP7_75t_R)
     1    0.80                           sn$1591 (net)
                 20.85    0.07  453.17 v U$$2113/A (INVx1_ASAP7_75t_R)
                 13.63   11.34  464.51 ^ U$$2113/Y (INVx1_ASAP7_75t_R)
     1    1.68                           s$1881 (net)
                 13.64    0.08  464.59 ^ dadda_fa_5_45_0/CI (FAx1_ASAP7_75t_R)
                 36.86   36.82  501.42 ^ dadda_fa_5_45_0/SN (FAx1_ASAP7_75t_R)
     1    1.28                           sn$1878 (net)
                 36.86    0.18  501.60 ^ U$$2235/A (INVx1_ASAP7_75t_R)
                 10.61    8.56  510.15 v U$$2235/Y (INVx1_ASAP7_75t_R)
     1    0.70                           s$2682 (net)
                 10.61    0.05  510.20 v _2273_/D (DFFLQNx2_ASAP7_75t_R)
                                510.20   data arrival time

                        330.00  330.00   clock clk' (fall edge)
                          0.00  330.00   clock source latency
                  0.00    0.00  330.00 ^ clk (in)
     1    5.34                           clk (net)
                  4.54    1.43  331.43 ^ clkbuf_0_clk/A (BUFx8_ASAP7_75t_R)
                  8.54   16.21  347.64 ^ clkbuf_0_clk/Y (BUFx8_ASAP7_75t_R)
     2    3.94                           clknet_0_clk (net)
                  8.67    0.54  348.18 ^ clkbuf_1_0__f_clk/A (BUFx8_ASAP7_75t_R)
                 25.16   24.65  372.83 ^ clkbuf_1_0__f_clk/Y (BUFx8_ASAP7_75t_R)
    15   23.11                           clknet_1_0__leaf_clk (net)
                 26.21    2.76  375.59 ^ clkbuf_leaf_9_clk/A (BUFx8_ASAP7_75t_R)
                 24.31   28.48  404.07 ^ clkbuf_leaf_9_clk/Y (BUFx8_ASAP7_75t_R)
    30   20.73                           clknet_leaf_9_clk (net)
                 26.61    3.95  408.02 ^ net252_83/A (INVx1_ASAP7_75t_R)
                  8.23    7.02  415.04 v net252_83/Y (INVx1_ASAP7_75t_R)
     1    0.47                           net287 (net)
                  8.23    0.04  415.07 v _2273_/CLK (DFFLQNx2_ASAP7_75t_R)
                          0.21  415.28   clock reconvergence pessimism
                         -8.37  406.91   library setup time
                                406.91   data required time
-----------------------------------------------------------------------------
                                406.91   data required time
                               -510.20   data arrival time
-----------------------------------------------------------------------------
                               -103.29   slack (VIOLATED)
```
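As a quick check of the ~30 ps figure: slack is data required time minus data arrival time, and both values can be read off the two reports (times in ps).

```latex
\begin{aligned}
\text{slack}_{\text{baseline}} &= 402.06 - 535.22 = -133.16\ \text{ps} \\
\text{slack}_{\text{patched}}  &= 406.91 - 510.20 = -103.29\ \text{ps} \\
\Delta\text{slack} &= -103.29 - (-133.16) = 29.87\ \text{ps}
\end{aligned}
```

Most of the gain comes from the earlier arrival (535.22 to 510.20 ps, i.e. 25.02 ps); the later required time accounts for the remaining 4.85 ps. The worst path in the patched run also ends at a different register (_2273_ instead of _2274_), so this compares worst slack rather than the same path.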
The only question that remains for me is whether -critical_nets_percentage should be changed to take a percentage, as its name suggests. Right now the option expects a value between 0 and 1.
There is an existing option to compare to: -capacities_perturbation_percentage 50
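A sketch of how a percentage-valued option could be validated and turned into a count of critical nets, in the same style as -capacities_perturbation_percentage. criticalNetCount is a hypothetical helper, not the PR's code. Rounding up, so that any nonzero percentage selects at least one net when there are nets at all, is a design choice made for this example and not necessarily what the PR does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: turn a user-facing percentage (0-100) into the
// number of nets to treat as critical.
std::size_t criticalNetCount(std::size_t num_nets, float percentage) {
  // The negated range check also rejects NaN.
  if (!(percentage >= 0.0f && percentage <= 100.0f)) {
    throw std::invalid_argument("percentage must be in [0, 100]");
  }
  // Multiply before dividing so integer percentages stay exact in double.
  const double count = static_cast<double>(num_nets) * percentage / 100.0;
  // Round up: any nonzero percentage selects at least one net.
  return static_cast<std::size_t>(std::ceil(count));
}
```

Rejecting out-of-range values, rather than clamping them, makes a leftover 0-1 style value like 0.3 stand out only if it is checked separately, so a warning for suspiciously small values might also be worth considering.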
Thanks a lot for your feedback, @antonblanchard! Your results seem really good. I've updated the branch with the latest master branch and also updated the option to take a percentage, as you suggested (which makes more sense).
I'll also start a secure-ci run to make sure it doesn't break anything.
@maliberty public- and secure-ci are green after my last commits. I had to revert a commit that changed the net sorting and was breaking the CI. I'll handle this in another PR.
Is src/grt/test/critical_nets_percentage.v still needed?