The reporting stage takes longer than detailed routing for mock-array
Description
Using https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/1212
Create:
export DESIGN_CONFIG?=designs/asap7/mock-array/config.mk
export MOCK_ARRAY_DATAWIDTH?=8
export MOCK_ARRAY_TABLE?=8 8 4 4 5 5
export MOCK_ARRAY_SCALE?=640
Run make verilog and then make.
These are the run times reported on a 12-thread machine:
Still waiting for the report stage after about 1 hour...
Log                     Elapsed seconds
1_1_yosys               1
2_1_floorplan           1
2_2_floorplan_io        1
2_3_tdms_place          1
2_5_tapcell             1
2_6_pdn                 222
3_1_place_gp_skip_io    11
3_2_place_iop           3
3_3_place_gp            42
3_4_resizer             16
3_5_opendp              23
4_1_cts                 27
4_2_cts_fillcell        25
5_1_fastroute           42
5_2_TritonRoute         1065
6_1_merge               47
6_report                5665
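To put a number on how much of the flow the report stage accounts for, here is a small helper (hypothetical, not part of ORFS) that parses a "Log / Elapsed seconds" summary like the one above and ranks the stages. With the 12-thread numbers, 6_report alone is roughly 79% of the 7193 s total, about 5.3x the detailed-routing time.

```python
# Hypothetical helper (not part of ORFS): parse the "Log / Elapsed seconds"
# summary the flow prints and report each stage's share of the run time.


def parse_elapsed_table(text: str) -> dict[str, int]:
    """Return {stage: seconds}, skipping the header and any "Total" row."""
    stages: dict[str, int] = {}
    for line in text.strip().splitlines():
        parts = line.split()  # works for single-space or column-aligned rows
        # Data rows are exactly "<stage> <integer seconds>".
        if len(parts) != 2 or not parts[1].isdigit() or parts[0] == "Total":
            continue
        stages[parts[0]] = int(parts[1])
    return stages


def stage_shares(stages: dict[str, int]) -> list[tuple[str, int, float]]:
    """Stages sorted from slowest to fastest, each with its fraction of the total."""
    total = sum(stages.values())
    ranked = sorted(stages.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, secs, secs / total) for name, secs in ranked]


# Run times from the 12-thread machine above.
TWELVE_THREADS = """
Log Elapsed seconds
1_1_yosys 1
2_1_floorplan 1
2_2_floorplan_io 1
2_3_tdms_place 1
2_5_tapcell 1
2_6_pdn 222
3_1_place_gp_skip_io 11
3_2_place_iop 3
3_3_place_gp 42
3_4_resizer 16
3_5_opendp 23
4_1_cts 27
4_2_cts_fillcell 25
5_1_fastroute 42
5_2_TritonRoute 1065
6_1_merge 47
6_report 5665
"""

if __name__ == "__main__":
    # Print the three slowest stages with their share of the total.
    for name, secs, share in stage_shares(parse_elapsed_table(TWELVE_THREADS))[:3]:
        print(f"{name:<22} {secs:6d} s  {share:6.1%}")
```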
Tail of log:
[WARNING PSM-0030] VSRC location at (131.200um, 971.200um) and size 10.000um, is not located on an existing power stripe node. Moving to closest node at (129.882um, 965.404um).
[WARNING PSM-0030] VSRC location at (971.200um, 971.200um) and size 10.000um, is not located on an existing power stripe node. Moving to closest node at (967.818um, 965.404um).
[INFO PSM-0031] Number of PDN nodes on net VSS = 13251525.
[No further output for about 1 hour; then some more output, after which reporting completes relatively quickly.]
Suggested Solution
Find and fix some low-hanging fruit in the scaling of the reporting stage.
There are a number of different steps. Can you narrow it down?
Isn't the pause in the log indicative enough? Anyone working on this will have to run it locally anyway, at which point they can drill down...
Some quick profiling by suspending and resuming in a debugger.
This is the step that takes a long time.
(/usr/bin/time -f 'Elapsed time: %E[h:]min:sec. CPU time: user %U sys %S (%P). Peak memory: %MKB.' /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad -exit -no_init ./scripts/final_report.tcl -metrics ./logs/asap7/mock-array/base/6_report.json) 2>&1 | tee ./logs/asap7/mock-array/base/6_report.log
OpenROAD v2.0-9055-ge2044988a
This seems to be where the time goes. There are a LOT of these iterations; perhaps the algorithm is O(N^2)?
Each iteration above is nested inside another iteration:
@maliberty Is the above good enough to start working on the problem?
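One way to test the O(N^2) suspicion without a profiler is to time the report stage at two array sizes and fit the exponent k in t ≈ c·n^k. The sketch below is hypothetical (not an ORFS or OpenROAD tool); it assumes the problem size n is taken from the node count PSM already logs in its PSM-0031 message, as in the tail of the log in the issue description.

```python
import math
import re

# PSM logs its node count as, e.g.:
#   [INFO PSM-0031] Number of PDN nodes on net VSS = 13251525.
PSM_NODES = re.compile(r"PSM-0031\] Number of PDN nodes on net (\S+) = (\d+)")


def pdn_node_counts(log_text: str) -> dict[str, int]:
    """Extract {net: node_count} from PSM-0031 lines in a report log."""
    return {net: int(n) for net, n in PSM_NODES.findall(log_text)}


def scaling_exponent(n1: float, t1: float, n2: float, t2: float) -> float:
    """Estimate k in t ~ c * n**k from two (size, runtime) measurements.

    k near 1 means linear scaling; k near 2 would support the N^2 suspicion.
    """
    if min(n1, t1, n2, t2) <= 0 or n1 == n2:
        raise ValueError("need two positive measurements at different sizes")
    return math.log(t2 / t1) / math.log(n2 / n1)
```

With only two points the estimate is rough. A fixed overhead (file loading, extraction) makes the ratio t2/t1 smaller than the pure n^k term would, so it biases k downward; well-separated, large sizes give a more trustworthy number.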
PSM performs a matrix solve, and its run time will not be linear in design area, even though the matrix size is. Smarter gridding would help somewhat. You can skip that step if you don't need it (which I assume you don't for prototyping).
@maliberty I see the following on my 96-thread workstation, using https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/1212
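To illustrate why the matrix grows linearly with the grid while the solve does not, here is a toy nodal-analysis model. It is not PSM's formulation (the thread doesn't show that): n nodes in a 1-D strap, node 0 tied to an ideal supply through a pad conductance, neighbours joined by a wire conductance, and every node sinking the same load current. Kirchhoff's current law at each node gives G·v = b with one unknown per node, and a dense Gaussian elimination on that system costs O(n^3) time and O(n^2) memory.

```python
# Toy IR-drop model (not PSM's formulation): n nodes in a 1-D strap.
# Node 0 connects to an ideal supply through a pad; neighbours are joined by
# a wire conductance; every node sinks the same load current.


def build_strap(n, g_wire, g_pad, vdd, i_load):
    """Return (G, b) for the nodal equations G @ v = b (KCL at every node)."""
    G = [[0.0] * n for _ in range(n)]
    b = [-i_load] * n
    for k in range(n - 1):  # stamp the wire between nodes k and k+1
        G[k][k] += g_wire
        G[k + 1][k + 1] += g_wire
        G[k][k + 1] -= g_wire
        G[k + 1][k] -= g_wire
    G[0][0] += g_pad      # pad conductance to the supply
    b[0] += g_pad * vdd   # supply current injected through the pad
    return G, b


def solve_dense(G, b):
    """Gaussian elimination with partial pivoting: O(n^3) time, O(n^2) memory."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(G)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = A[r][n] - sum(A[r][c] * v[c] for c in range(r + 1, n))
        v[r] = acc / A[r][r]
    return v


if __name__ == "__main__":
    G, b = build_strap(n=3, g_wire=1.0, g_pad=10.0, vdd=1.0, i_load=0.1)
    v = solve_dense(G, b)
    print("node voltages:", [round(x, 3) for x in v])
    print("worst-case IR drop:", round(1.0 - min(v), 3))
```

For scale: with the 13,251,525 VSS nodes logged earlier, a dense matrix would have about 1.8 x 10^14 entries, so any practical solver has to exploit sparsity. Sparse direct solvers on 2-D grids do far better than O(n^3), but their cost typically still grows faster than linearly in n.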
Log                     Elapsed seconds
1_1_yosys               2
2_1_floorplan           1
2_2_floorplan_io        1
2_3_tdms_place          1
2_4_mplace              1
2_5_tapcell             1
2_6_pdn                 223
3_1_place_gp_skip_io    13
3_2_place_iop           5
3_3_place_gp            51
3_4_resizer             20
3_5_opendp              27
4_1_cts                 32
4_2_cts_fillcell        28
5_1_fastroute           52
5_2_TritonRoute         535
6_1_merge               64
6_report                6224
@maliberty I think this issue documents a performance problem reasonably well, but I don't need it urgently fixed. Mark as "help wanted"?
Yes, I see that PSM has quite a bit of inefficiency in building the matrix, not just in solving it. That can be improved.
@gadfort has offered to work on PSM (he has run into similar issues and has further concerns).
@oharboe would you be able to try out the changes in #4850 to see how they impact your design? Hopefully you only need to rerun the reporting stage.
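The thread doesn't say which part of PSM's matrix construction was slow. As a general illustration only (hypothetical code, not PSM's), assembly can stay essentially linear in the number of resistors: accumulate each conductance "stamp" in a hash map with O(1) updates, then bucket the entries by row into compressed sparse row (CSR) form instead of searching existing entries per stamp or sorting everything globally.

```python
from collections import defaultdict

# Hypothetical sketch: assemble a conductance matrix in CSR form with work
# proportional to the number of resistors (plus tiny per-row sorts).


def assemble_csr(n, resistors):
    """resistors: iterable of (a, b, g) with a != b and conductance g.

    Returns (indptr, indices, data) describing the n x n matrix in CSR form.
    """
    entries = defaultdict(float)
    for a, b, g in resistors:  # O(1) hash-map updates per stamp
        entries[(a, a)] += g
        entries[(b, b)] += g
        entries[(a, b)] -= g
        entries[(b, a)] -= g

    rows = [[] for _ in range(n)]  # bucket entries by row instead of a global sort
    for (r, c), g in entries.items():
        rows[r].append((c, g))

    indptr, indices, data = [0], [], []
    for row in rows:
        row.sort()  # a grid node has only a few neighbours, so this is cheap
        for c, g in row:
            indices.append(c)
            data.append(g)
        indptr.append(len(indices))
    return indptr, indices, data
```

Parallel resistors between the same pair of nodes simply accumulate into the same entries, which is the behaviour nodal analysis needs.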
Before:
./logs/asap7/mock-array/base
Log                     Elapsed seconds
1_1_yosys               3
2_1_floorplan           1
2_2_floorplan_io        1
2_4_floorplan_macro     1
2_5_floorplan_tapcell   1
2_6_floorplan_pdn       90
3_1_place_gp_skip_io    3
3_2_place_iop           1
3_3_place_gp            5
3_4_place_resized       4
3_5_place_dp            4
4_1_cts                 8
5_1_grt                 11
5_2_fillcell            2
5_3_route               267
6_1_merge               2
6_report                232
Total                   636
After:
$ make DESIGN_CONFIG=designs/asap7/mock-array/config.mk do-final
[INFO-FLOW] ASU ASAP7 - version 2
Default PVT selection: BC
[INFO][FLOW] Invoked hierarchical flow.
Block Element needs to be hardened.
mkdir -p ./logs/asap7/mock-array/base ./reports/asap7/mock-array/base
cp ./results/asap7/mock-array/base/5_route.odb ./results/asap7/mock-array/base/6_1_fill.odb
cp ./results/asap7/mock-array/base/5_route.sdc ./results/asap7/mock-array/base/6_1_fill.sdc
cp ./results/asap7/mock-array/base/5_route.sdc ./results/asap7/mock-array/base/6_final.sdc
Running final_report.tcl
[WARNING STA-0450] virtual clock clock_vir can not be propagated.
[INFO] Deleted 0 routing obstructions
[INFO RCX-0431] Defined process_corner X with ext_model_index 0
[INFO RCX-0029] Defined extraction corner X
[INFO RCX-0008] extracting parasitics of MockArray ...
[INFO RCX-0435] Reading extraction model file /home/oyvind/OpenROAD-flow-scripts/flow/platforms/asap7/rcx_patterns.rules ...
[INFO RCX-0436] RC segment generation MockArray (max_merge_res 50.0) ...
[INFO RCX-0040] Final 44262 rc segments
[INFO RCX-0439] Coupling Cap extraction MockArray ...
[INFO RCX-0440] Coupling threshhold is 0.1000 fF, coupling capacitance less than 0.1000 fF will be grounded.
[INFO RCX-0043] 105163 wires to be extracted
[INFO RCX-0442] 10% completion -- 10576 wires have been extracted
[INFO RCX-0442] 16% completion -- 17117 wires have been extracted
[INFO RCX-0442] 23% completion -- 24387 wires have been extracted
[INFO RCX-0442] 32% completion -- 34649 wires have been extracted
[INFO RCX-0442] 39% completion -- 41440 wires have been extracted
[INFO RCX-0442] 46% completion -- 48508 wires have been extracted
[INFO RCX-0442] 56% completion -- 58917 wires have been extracted
[INFO RCX-0442] 71% completion -- 75498 wires have been extracted
[INFO RCX-0442] 76% completion -- 80875 wires have been extracted
[INFO RCX-0442] 82% completion -- 87030 wires have been extracted
[INFO RCX-0442] 88% completion -- 92837 wires have been extracted
[INFO RCX-0442] 92% completion -- 97540 wires have been extracted
[INFO RCX-0442] 100% completion -- 105163 wires have been extracted
[INFO RCX-0045] Extract 23324 nets, 59969 rsegs, 59969 caps, 49108 ccs
[INFO RCX-0015] Finished extracting MockArray.
[INFO RCX-0016] Writing SPEF ...
[INFO RCX-0443] 23324 nets finished
[INFO RCX-0017] Finished writing SPEF ...
Signal 11 received
Stack trace:
0# 0x00005A77F41FE483 in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
1# 0x00007EB871A42990 in /lib/x86_64-linux-gnu/libc.so.6
2# odb::dbTechLayer::getRoutingLevel() in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
3# psm::IRNetwork::generatePolygonsFromITerms(std::vector<psm::TerminalNode*, std::allocator<psm::TerminalNode*> >&) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
4# psm::IRNetwork::generateRoutingLayerShapesAndNodes() in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
5# psm::IRNetwork::construct() in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
6# psm::IRNetwork::IRNetwork(odb::dbNet*, utl::Logger*, bool) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
7# psm::IRSolver::IRSolver(odb::dbNet*, bool, sta::dbSta*, rsz::Resizer*, utl::Logger*, std::map<odb::dbNet*, std::map<sta::Corner*, float, std::less<sta::Corner*>, std::allocator<std::pair<sta::Corner* const, float> > >, std::less<odb::dbNet*>, std::allocator<std::pair<odb::dbNet* const, std::map<sta::Corner*, float, std::less<sta::Corner*>, std::allocator<std::pair<sta::Corner* const, float> > > > > > const&, psm::PDNSim::GeneratedSourceSettings const&) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
8# psm::PDNSim::getIRSolver(odb::dbNet*, bool) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
9# psm::PDNSim::checkConnectivity(odb::dbNet*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
10# psm::PDNSim::analyzePowerGrid(odb::dbNet*, sta::Corner*, psm::GeneratedSourceType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
11# analyze_power_grid_cmd(odb::dbNet*, sta::Corner*, psm::GeneratedSourceType, char const*, bool, char const*, char const*, char const*) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
12# 0x00005A77F4EB1067 in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
13# TclNRRunCallbacks in /lib/x86_64-linux-gnu/libtcl8.6.so
14# 0x00007EB876385C43 in /lib/x86_64-linux-gnu/libtcl8.6.so
15# Tcl_EvalEx in /lib/x86_64-linux-gnu/libtcl8.6.so
16# Tcl_Eval in /lib/x86_64-linux-gnu/libtcl8.6.so
17# sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
18# ord::tclAppInit(Tcl_Interp*) in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
19# Tcl_MainEx in /lib/x86_64-linux-gnu/libtcl8.6.so
20# main in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
21# 0x00007EB871A28150 in /lib/x86_64-linux-gnu/libc.so.6
22# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6
23# _start in /home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
Command terminated by signal 11
Elapsed time: 0:03.72[h:]min:sec. CPU time: user 3.52 sys 0.10 (97%). Peak memory: 325088KB.
make[1]: *** [Makefile:845: do-6_report] Error 139
make: *** [Makefile:859: do-finish] Error 2
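The crash above happened only 3.72 s into the run, which is easy to miss in a long log. When comparing several report runs, a small parser for the line written by the /usr/bin/time wrapper (see the command quoted earlier in the thread) helps. This is a hypothetical helper, not part of ORFS; it relies on GNU time's %E format being [hours:]minutes:seconds.

```python
import re

# Matches the line produced by the /usr/bin/time -f format string used in
# the ORFS logs above, e.g.
#   Elapsed time: 0:03.72[h:]min:sec. CPU time: user 3.52 sys 0.10 (97%). Peak memory: 325088KB.
TIME_LINE = re.compile(
    r"Elapsed time: (?P<elapsed>[\d:.]+)\[h:\]min:sec\..*?"
    r"Peak memory: (?P<peak_kb>\d+)KB"
)


def elapsed_to_seconds(value: str) -> float:
    """Convert GNU time's %E value ([hours:]minutes:seconds) to seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line: str):
    """Return (elapsed_seconds, peak_memory_kb), or None if the line doesn't match."""
    m = TIME_LINE.search(line)
    if m is None:
        return None
    return elapsed_to_seconds(m.group("elapsed")), int(m.group("peak_kb"))
```

For reference, the original 5665 s report stage would appear in this format as 1:34:25.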
@oharboe can you upload the testcase for this?
Hmm... I could swear I did this morning. No matter, here is a make final_report test case:
@oharboe thanks. It was just the via-in-pin issue; I should have a fix for that shortly. In case you're wondering, the PDN analysis appears to take about 15 seconds per net, so this step takes about 38 seconds on my 32-core machine:
real 0m37.978s
user 0m34.706s
sys 0m0.803s
@gadfort 6_report is about 5x faster with #4850.
Before:
Log                     Elapsed seconds
1_1_yosys               3
2_1_floorplan           1
2_2_floorplan_io        1
2_4_floorplan_macro     1
2_5_floorplan_tapcell   1
2_6_floorplan_pdn       90
3_1_place_gp_skip_io    3
3_2_place_iop           1
3_3_place_gp            5
3_4_place_resized       4
3_5_place_dp            4
4_1_cts                 8
5_1_grt                 11
5_2_fillcell            2
5_3_route               267
6_1_merge               2
6_report                232
Total                   636
After:
Log                     Elapsed seconds
1_1_yosys               3
2_1_floorplan           1
2_2_floorplan_io        1
2_4_floorplan_macro     1
2_5_floorplan_tapcell   1
2_6_floorplan_pdn       90
3_1_place_gp_skip_io    3
3_2_place_iop           1
3_3_place_gp            5
3_4_place_resized       4
3_5_place_dp            4
4_1_cts                 8
5_1_grt                 11
5_2_fillcell            2
5_3_route               267
6_1_merge               2
6_report                43
Total                   447
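A quick check of the before/after tables confirms that 6_report is the only stage whose time changed: it drops from 232 s to 43 s (about 5.4x), and the 189 s saved there is exactly the drop in the total, from 636 s to 447 s (about 1.42x end to end). The comparison below uses the numbers copied from the tables; the helper itself is hypothetical.

```python
# Stage times copied from the before/after tables above (#4850).
STAGES = ["1_1_yosys", "2_1_floorplan", "2_2_floorplan_io", "2_4_floorplan_macro",
          "2_5_floorplan_tapcell", "2_6_floorplan_pdn", "3_1_place_gp_skip_io",
          "3_2_place_iop", "3_3_place_gp", "3_4_place_resized", "3_5_place_dp",
          "4_1_cts", "5_1_grt", "5_2_fillcell", "5_3_route", "6_1_merge", "6_report"]
BEFORE = dict(zip(STAGES, [3, 1, 1, 1, 1, 90, 3, 1, 5, 4, 4, 8, 11, 2, 267, 2, 232]))
AFTER = dict(zip(STAGES, [3, 1, 1, 1, 1, 90, 3, 1, 5, 4, 4, 8, 11, 2, 267, 2, 43]))


def compare(before, after):
    """Return {stage: (before_s, after_s, speedup)} for stages whose time changed."""
    changed = {}
    for stage, b in before.items():
        a = after.get(stage)
        if a is not None and a != b:
            changed[stage] = (b, a, b / a if a else float("inf"))
    return changed


if __name__ == "__main__":
    for stage, (b, a, s) in compare(BEFORE, AFTER).items():
        print(f"{stage}: {b} s -> {a} s ({s:.1f}x)")
    tb, ta = sum(BEFORE.values()), sum(AFTER.values())
    print(f"Total: {tb} s -> {ta} s ({tb / ta:.2f}x)")
```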
Here is a standalone test case for the original problem in this issue.
Untar https://drive.google.com/file/d/1PRU_nsiR0RuXkRl2A8EdG8ZQuhO0SoxX/view?usp=sharing and run:
$ ./run-me-mock-array-asap7-base.sh
[deleted]
[INFO RCX-0015] Finished extracting MockArray.
[INFO RCX-0016] Writing SPEF ...
[INFO RCX-0443] 21324 nets finished
[INFO RCX-0017] Finished writing SPEF ...
[INFO PSM-0040] All shapes on net VDD are connected.
[INFO PSM-0073] Using bump pattern with x-pitch 140.0000um, y-pitch 140.0000um, and size 70.0000um with an reduction factor of 3x.
########## IR report #################
Corner : default
Supply voltage : 7.70e-01 V
Worstcase voltage: 7.62e-01 V
Average voltage : 7.69e-01 V
Average IR drop : 7.75e-04 V
Worstcase IR drop: 7.52e-03 V
Percentage drop : 0.98 %
######################################
[still running when I left my workstation]
What is the issue now?
It takes a long time to run. I didn't wait for it to complete.
@oharboe processing the VSS network takes longer: 732.3 s for VDD vs. 1160 s for VSS, roughly 1.6x. I'm not sure what exactly accounts for the slowdown, assuming the two networks are somewhat symmetric.
########## IR report #################
Corner : default
Supply voltage : 7.70e-01 V
Worstcase voltage: 7.65e-01 V
Average voltage : 7.69e-01 V
Average IR drop : 5.69e-04 V
Worstcase IR drop: 4.98e-03 V
Percentage drop : 0.65 %
######################################
########## IR report #################
Corner : default
Supply voltage : 0.00e+00 V
Worstcase voltage: 5.76e-03 V
Average voltage : 5.40e-04 V
Average IR drop : 5.40e-04 V
Worstcase IR drop: 5.76e-03 V
Percentage drop : 0.75 %
######################################
I see. Did you get some profile data to see where the time goes?
There is some timing/profiling code in there to help. It's not as detailed as VTune or similar tools, but it's good enough for rough checks.
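The thread doesn't show PSM's built-in timing code, so this is only a sketch (in Python rather than PSM's C++; class and phase names are hypothetical) of the kind of coarse per-phase timing described above: wrap each phase, accumulate wall-clock time and call counts, and print the slowest phases first. Like the comment says, this is a rough check, not a sampling profiler.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:  # record the time even if the phase raises
            self.totals[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        """Print phases from slowest to fastest."""
        for name, total in sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{name:<16} {total:9.3f} s  {self.calls[name]:>6} calls")


if __name__ == "__main__":
    timer = PhaseTimer()
    for net in ("VDD", "VSS"):  # one analysis per supply net
        with timer.phase("build_matrix"):
            sum(i * i for i in range(200_000))  # stand-in for real work
        with timer.phase("solve"):
            sum(i for i in range(100_000))  # stand-in for real work
    timer.report()
```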