Reporting ultimately takes longer than detailed routing for mock-array

Open oharboe opened this issue 2 years ago • 9 comments

Description

Using https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/1212

Create:

export DESIGN_CONFIG?=designs/asap7/mock-array/config.mk
export MOCK_ARRAY_DATAWIDTH?=8
export MOCK_ARRAY_TABLE?=8 8 4 4 5 5
export MOCK_ARRAY_SCALE?=640

Run make verilog and then make.

These are the running times reported for a 12 thread machine:

Still waiting for report after ca. 1 hour...

Log                       Elapsed seconds
1_1_yosys                          1
2_1_floorplan                      1
2_2_floorplan_io                   1
2_3_tdms_place                     1
2_5_tapcell                        1
2_6_pdn                          222
3_1_place_gp_skip_io              11
3_2_place_iop                      3
3_3_place_gp                      42
3_4_resizer                       16
3_5_opendp                        23
4_1_cts                           27
4_2_cts_fillcell                  25
5_1_fastroute                     42
5_2_TritonRoute                 1065
6_1_merge                         47
6_report                        5665
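To put the table above in perspective, here is a rough sketch (not part of the flow scripts; `parse_elapsed` is a hypothetical helper name) that parses the two-column summary and reports which stage dominates the total:

```python
def parse_elapsed(table: str) -> dict[str, int]:
    """Parse 'stage  seconds' rows; lines that do not fit (the header) are skipped."""
    stages = {}
    for line in table.strip().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stages[parts[0]] = int(parts[1])
    return stages


# Numbers copied from the 12-thread run reported in this issue.
TABLE = """
Log                       Elapsed seconds
1_1_yosys                          1
2_1_floorplan                      1
2_2_floorplan_io                   1
2_3_tdms_place                     1
2_5_tapcell                        1
2_6_pdn                          222
3_1_place_gp_skip_io              11
3_2_place_iop                      3
3_3_place_gp                      42
3_4_resizer                       16
3_5_opendp                        23
4_1_cts                           27
4_2_cts_fillcell                  25
5_1_fastroute                     42
5_2_TritonRoute                 1065
6_1_merge                         47
6_report                        5665
"""

stages = parse_elapsed(TABLE)
total = sum(stages.values())
slowest = max(stages, key=stages.get)
share = stages[slowest] / total
print(f"{slowest}: {stages[slowest]} s of {total} s ({share:.0%})")
```

On these numbers, 6_report accounts for roughly four fifths of the total and takes more than five times as long as 5_2_TritonRoute.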

Tail of log:

[WARNING PSM-0030] VSRC location at (131.200um, 971.200um) and size 10.000um, is not located on an existing power stripe node. Moving to closest node at (129.882um, 965.404um).
[WARNING PSM-0030] VSRC location at (971.200um, 971.200um) and size 10.000um, is not located on an existing power stripe node. Moving to closest node at (967.818um, 965.404um).
[INFO PSM-0031] Number of PDN nodes on net VSS = 13251525.
[no further output after 1 hour... then some output before reporting completes relatively quickly.]

Suggested Solution

Find and fix some low-hanging fruit in the scaling of the reporting stage.

Additional Context

No response

oharboe avatar Jul 09 '23 08:07 oharboe

There are a number of different steps. Can you narrow it down?

maliberty avatar Jul 09 '23 14:07 maliberty

There are a number of different steps. Can you narrow it down?

The log pause is not indicative enough? Anyone working on this will have to run this locally anyway at which point they can drill down...

oharboe avatar Jul 09 '23 14:07 oharboe

Some quick debugger suspend/resume profiling.

This is the step that takes a long time.

(/usr/bin/time -f 'Elapsed time: %E[h:]min:sec. CPU time: user %U sys %S (%P). Peak memory: %MKB.' /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad -exit -no_init  ./scripts/final_report.tcl -metrics ./logs/asap7/mock-array/base/6_report.json) 2>&1 | tee ./logs/asap7/mock-array/base/6_report.log
OpenROAD v2.0-9055-ge2044988a 

This seems to be where the time is going. There are a LOT of these iterations and perhaps an N^2 algorithm?

[image]

Each iteration above is nested inside another iteration:

[image]

oharboe avatar Jul 09 '23 15:07 oharboe

@maliberty Is the above good enough to start working on the problem?

oharboe avatar Jul 09 '23 15:07 oharboe

PSM is doing a matrix solve, and its runtime will not be linear in design area, even though the matrix size is. Smarter gridding would help some. You can skip that step if you don't care about it (which I guess you don't for prototyping).
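A toy illustration of that scaling argument, under loud assumptions: this is a square grid of unit resistors with a single supply connection, solved with plain conjugate gradients in pure Python. It is not PSM's network model or its solver, and `apply_grid` and `solve_cg` are made-up names. The number of unknowns grows linearly with grid area, while the iteration count grows as well, so total solve work grows faster than linearly:

```python
def apply_grid(n, v):
    """Multiply the conductance matrix of an n x n unit-resistor grid by v.

    Node 0 has one extra unit conductance to an ideal supply (a stand-in for
    a voltage source), which makes the matrix positive definite.
    """
    out = [0.0] * (n * n)
    for r in range(n):
        for c in range(n):
            i = r * n + c
            acc = v[i] if i == 0 else 0.0  # tie to the supply at node 0
            if r > 0:
                acc += v[i] - v[i - n]
            if r < n - 1:
                acc += v[i] - v[i + n]
            if c > 0:
                acc += v[i] - v[i - 1]
            if c < n - 1:
                acc += v[i] - v[i + 1]
            out[i] = acc
    return out


def solve_cg(n, tol=1e-8):
    """Solve for the voltage drop with unit current drawn at every node.

    Returns (iterations, drop) using unpreconditioned conjugate gradients.
    """
    size = n * n
    x = [0.0] * size
    r = [1.0] * size  # residual starts as the right-hand side (x = 0)
    p = r[:]
    rs = sum(e * e for e in r)
    stop = (tol ** 2) * rs  # relative residual threshold, squared
    for k in range(1, 20 * size + 1):
        ap = apply_grid(n, p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = sum(e * e for e in r)
        if rs_new <= stop:
            return k, x
        beta = rs_new / rs
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    raise RuntimeError("CG did not converge")


results = {n: solve_cg(n) for n in (4, 8, 16)}
for n, (iters, drop) in results.items():
    print(f"{n * n:4d} nodes: {iters:4d} iterations, drop at supply node {drop[0]:.3f}")
```

Each iteration costs time proportional to the node count, so quadrupling the area both quadruples the per-iteration cost and raises the iteration count. A production solver would typically use preconditioning or a sparse direct factorization, which changes the constants and exponents but not the basic point that runtime is superlinear in area.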

maliberty avatar Jul 09 '23 20:07 maliberty

@maliberty I see the following on my 96 thread workstation, using https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/1212

Log                       Elapsed seconds
1_1_yosys                          2
2_1_floorplan                      1
2_2_floorplan_io                   1
2_3_tdms_place                     1
2_4_mplace                         1
2_5_tapcell                        1
2_6_pdn                          223
3_1_place_gp_skip_io              13
3_2_place_iop                      5
3_3_place_gp                      51
3_4_resizer                       20
3_5_opendp                        27
4_1_cts                           32
4_2_cts_fillcell                  28
5_1_fastroute                     52
5_2_TritonRoute                  535
6_1_merge                         64
6_report                        6224

oharboe avatar Jul 10 '23 16:07 oharboe

@maliberty I think this issue documents a performance problem reasonably well, but I don't need it urgently fixed. Mark as "help wanted"?

oharboe avatar Jul 11 '23 09:07 oharboe

Yes, I see that PSM has quite a bit of inefficiency in building the matrix, not just in solving it. That can be improved.

maliberty avatar Jul 11 '23 14:07 maliberty

@gadfort has offered to work on psm (he has similar issues and more concerns)

maliberty avatar Jan 04 '24 22:01 maliberty

@oharboe would you be able to try out the changes in #4850 to see how they impact your design? You should only have to rerun the reporting stage (I would hope)

gadfort avatar Mar 26 '24 00:03 gadfort

Before:

./logs/asap7/mock-array/base
Log                       Elapsed seconds
1_1_yosys                          3
2_1_floorplan                      1
2_2_floorplan_io                   1
2_4_floorplan_macro                1
2_5_floorplan_tapcell              1
2_6_floorplan_pdn                 90
3_1_place_gp_skip_io               3
3_2_place_iop                      1
3_3_place_gp                       5
3_4_place_resized                  4
3_5_place_dp                       4
4_1_cts                            8
5_1_grt                           11
5_2_fillcell                       2
5_3_route                        267
6_1_merge                          2
6_report                         232
Total                            636

After:

$ make DESIGN_CONFIG=designs/asap7/mock-array/config.mk do-final
[INFO-FLOW] ASU ASAP7 - version 2
Default PVT selection: BC
[INFO][FLOW] Invoked hierarchical flow.
Block Element needs to be hardened.
mkdir -p ./logs/asap7/mock-array/base ./reports/asap7/mock-array/base
cp ./results/asap7/mock-array/base/5_route.odb ./results/asap7/mock-array/base/6_1_fill.odb
cp ./results/asap7/mock-array/base/5_route.sdc ./results/asap7/mock-array/base/6_1_fill.sdc
cp ./results/asap7/mock-array/base/5_route.sdc ./results/asap7/mock-array/base/6_final.sdc
Running final_report.tcl
[WARNING STA-0450] virtual clock clock_vir can not be propagated.
[INFO] Deleted 0 routing obstructions
[INFO RCX-0431] Defined process_corner X with ext_model_index 0
[INFO RCX-0029] Defined extraction corner X
[INFO RCX-0008] extracting parasitics of MockArray ...
[INFO RCX-0435] Reading extraction model file /home/oyvind/OpenROAD-flow-scripts/flow/platforms/asap7/rcx_patterns.rules ...
[INFO RCX-0436] RC segment generation MockArray (max_merge_res 50.0) ...
[INFO RCX-0040] Final 44262 rc segments
[INFO RCX-0439] Coupling Cap extraction MockArray ...
[INFO RCX-0440] Coupling threshhold is 0.1000 fF, coupling capacitance less than 0.1000 fF will be grounded.
[INFO RCX-0043] 105163 wires to be extracted
[INFO RCX-0442] 10% completion -- 10576 wires have been extracted
[INFO RCX-0442] 16% completion -- 17117 wires have been extracted
[INFO RCX-0442] 23% completion -- 24387 wires have been extracted
[INFO RCX-0442] 32% completion -- 34649 wires have been extracted
[INFO RCX-0442] 39% completion -- 41440 wires have been extracted
[INFO RCX-0442] 46% completion -- 48508 wires have been extracted
[INFO RCX-0442] 56% completion -- 58917 wires have been extracted
[INFO RCX-0442] 71% completion -- 75498 wires have been extracted
[INFO RCX-0442] 76% completion -- 80875 wires have been extracted
[INFO RCX-0442] 82% completion -- 87030 wires have been extracted
[INFO RCX-0442] 88% completion -- 92837 wires have been extracted
[INFO RCX-0442] 92% completion -- 97540 wires have been extracted
[INFO RCX-0442] 100% completion -- 105163 wires have been extracted
[INFO RCX-0045] Extract 23324 nets, 59969 rsegs, 59969 caps, 49108 ccs
[INFO RCX-0015] Finished extracting MockArray.
[INFO RCX-0016] Writing SPEF ...
[INFO RCX-0443] 23324 nets finished
[INFO RCX-0017] Finished writing SPEF ...
Signal 11 received
Stack trace:
 0# 0x00005A77F41FE483 in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 1# 0x00007EB871A42990 in /lib/x86_64-linux-gnu/libc.so.6
 2# odb::dbTechLayer::getRoutingLevel() in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 3# psm::IRNetwork::generatePolygonsFromITerms(std::vector<psm::TerminalNode*, std::allocator<psm::TerminalNode*> >&) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 4# psm::IRNetwork::generateRoutingLayerShapesAndNodes() in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 5# psm::IRNetwork::construct() in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 6# psm::IRNetwork::IRNetwork(odb::dbNet*, utl::Logger*, bool) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 7# psm::IRSolver::IRSolver(odb::dbNet*, bool, sta::dbSta*, rsz::Resizer*, utl::Logger*, std::map<odb::dbNet*, std::map<sta::Corner*, float, std::less<sta::Corner*>, std::allocator<std::pair<sta::Corner* const, float> > >, std::less<odb::dbNet*>, std::allocator<std::pair<odb::dbNet* const, std::map<sta::Corner*, float, std::less<sta::Corner*>, std::allocator<std::pair<sta::Corner* const, float> > > > > > const&, psm::PDNSim::GeneratedSourceSettings const&) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 8# psm::PDNSim::getIRSolver(odb::dbNet*, bool) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 9# psm::PDNSim::checkConnectivity(odb::dbNet*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
10# psm::PDNSim::analyzePowerGrid(odb::dbNet*, sta::Corner*, psm::GeneratedSourceType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
11# analyze_power_grid_cmd(odb::dbNet*, sta::Corner*, psm::GeneratedSourceType, char const*, bool, char const*, char const*, char const*) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
12# 0x00005A77F4EB1067 in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
13# TclNRRunCallbacks in /lib/x86_64-linux-gnu/libtcl8.6.so
14# 0x00007EB876385C43 in /lib/x86_64-linux-gnu/libtcl8.6.so
15# Tcl_EvalEx in /lib/x86_64-linux-gnu/libtcl8.6.so
16# Tcl_Eval in /lib/x86_64-linux-gnu/libtcl8.6.so
17# sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
18# ord::tclAppInit(Tcl_Interp*) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
19# Tcl_MainEx in /lib/x86_64-linux-gnu/libtcl8.6.so
20# main in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
21# 0x00007EB871A28150 in /lib/x86_64-linux-gnu/libc.so.6
22# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6
23# _start in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
Command terminated by signal 11
Elapsed time: 0:03.72[h:]min:sec. CPU time: user 3.52 sys 0.10 (97%). Peak memory: 325088KB.
make[1]: *** [Makefile:845: do-6_report] Error 139
make: *** [Makefile:859: do-finish] Error 2

oharboe avatar Mar 26 '24 06:03 oharboe

@oharboe can you upload the testcase for this?

gadfort avatar Mar 26 '24 12:03 gadfort

@oharboe can you upload the testcase for this?

Hmm... I swear I did this morning. No matter, here is a make final_report test-case:

final-report-crash.tar.gz

oharboe avatar Mar 26 '24 12:03 oharboe

@oharboe thanks. It was just the via in pin thing. I should have a fix for that in a bit. If you are wondering, it looks like the pdn analysis takes about 15 seconds per net. So this step takes about 38 seconds on my machine with 32 cores.

real	0m37.978s
user	0m34.706s
sys	0m0.803s

gadfort avatar Mar 26 '24 13:03 gadfort

@gadfort 6_report is about 5x faster with #4850.

Before:

Log                       Elapsed seconds
1_1_yosys                          3
2_1_floorplan                      1
2_2_floorplan_io                   1
2_4_floorplan_macro                1
2_5_floorplan_tapcell              1
2_6_floorplan_pdn                 90
3_1_place_gp_skip_io               3
3_2_place_iop                      1
3_3_place_gp                       5
3_4_place_resized                  4
3_5_place_dp                       4
4_1_cts                            8
5_1_grt                           11
5_2_fillcell                       2
5_3_route                        267
6_1_merge                          2
6_report                         232
Total                            636

After:

Log                       Elapsed seconds
1_1_yosys                          3
2_1_floorplan                      1
2_2_floorplan_io                   1
2_4_floorplan_macro                1
2_5_floorplan_tapcell              1
2_6_floorplan_pdn                 90
3_1_place_gp_skip_io               3
3_2_place_iop                      1
3_3_place_gp                       5
3_4_place_resized                  4
3_5_place_dp                       4
4_1_cts                            8
5_1_grt                           11
5_2_fillcell                       2
5_3_route                        267
6_1_merge                          2
6_report                          43
Total                            447
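As a quick check of the "5x" figure, a small sketch using only the 6_report and Total rows from the two tables above (the variable names are illustrative):

```python
# Values copied from the before/after tables in this comment (seconds).
before = {"6_report": 232, "Total": 636}
after = {"6_report": 43, "Total": 447}

report_speedup = before["6_report"] / after["6_report"]
total_speedup = before["Total"] / after["Total"]
share_before = before["6_report"] / before["Total"]
share_after = after["6_report"] / after["Total"]

print(f"6_report speedup: {report_speedup:.1f}x")
print(f"whole-flow speedup: {total_speedup:.2f}x")
print(f"6_report share of flow: {share_before:.0%} -> {share_after:.0%}")
```

The reporting stage itself speeds up by a bit more than 5x, while the whole flow speeds up by about 1.4x, because routing and PDN generation now dominate the total.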

oharboe avatar Mar 26 '24 13:03 oharboe

Creating a standalone test-case for the original problem in this issue.

untar https://drive.google.com/file/d/1PRU_nsiR0RuXkRl2A8EdG8ZQuhO0SoxX/view?usp=sharing

$ ./run-me-mock-array-asap7-base.sh
[deleted]
[INFO RCX-0015] Finished extracting MockArray.
[INFO RCX-0016] Writing SPEF ...
[INFO RCX-0443] 21324 nets finished
[INFO RCX-0017] Finished writing SPEF ...
[INFO PSM-0040] All shapes on net VDD are connected.
[INFO PSM-0073] Using bump pattern with x-pitch 140.0000um, y-pitch 140.0000um, and size 70.0000um with an reduction factor of 3x.
########## IR report #################
Corner           : default
Supply voltage   : 7.70e-01 V
Worstcase voltage: 7.62e-01 V
Average voltage  : 7.69e-01 V
Average IR drop  : 7.75e-04 V
Worstcase IR drop: 7.52e-03 V
Percentage drop  : 0.98 %
######################################
[still running when I left my workstation]

oharboe avatar Mar 26 '24 14:03 oharboe

What is the issue now?

gadfort avatar Mar 26 '24 15:03 gadfort

What is the issue now?

Takes a long time to run. Didn't wait for it to complete.

oharboe avatar Mar 26 '24 15:03 oharboe

@oharboe processing the VSS network takes a little longer. It's 732.3s for VDD vs. 1160s for VSS. I'm not sure that exactly accounts for the slowdown, assuming they are somewhat symmetric.

########## IR report #################
Corner           : default
Supply voltage   : 7.70e-01 V
Worstcase voltage: 7.65e-01 V
Average voltage  : 7.69e-01 V
Average IR drop  : 5.69e-04 V
Worstcase IR drop: 4.98e-03 V
Percentage drop  : 0.65 %
######################################

########## IR report #################
Corner           : default
Supply voltage   : 0.00e+00 V
Worstcase voltage: 5.76e-03 V
Average voltage  : 5.40e-04 V
Average IR drop  : 5.40e-04 V
Worstcase IR drop: 5.76e-03 V
Percentage drop  : 0.75 %
######################################

gadfort avatar Mar 26 '24 16:03 gadfort

I see. Did you get some profile data to see where the time goes?

oharboe avatar Mar 26 '24 16:03 oharboe

There is some timing profiling code in there to help. It's not as detailed as vtune or something, but good enough for rough checks.

gadfort avatar Mar 26 '24 17:03 gadfort