Repairing 1 endpoint in MegaBoom takes forever (40000 seconds) and sometimes crashes
Describe the bug
untar and run https://drive.google.com/file/d/1U_G1_G6m0huMM-uTx5yhC-nW28OyD7To/view?usp=sharing
This CTS test case used to run in a reasonable amount of time; now it crashes with a malloc error after ca. 40000 s.
Full log
Iterations against WNS in repair:
Expected Behavior
Finish in a reasonable amount of time and not crash
Environment
OpenROAD v2.0-21120-gc1c1e43ccf
To Reproduce
see above
Relevant log output
Screenshots
No response
Additional Context
No response
Is it cts or just the cts step of the flow which contains other commands. A log snippet would help clarify.
I am collecting data, but I suspect it will take days. If I can't make this actionable, I will close.
So repair_timing is taking 27000 seconds... This means you can probably get into the slow part quickly. It would be interesting to know why repair_timing is taking so long.
final | 0 | 182 | 0 | +0.0% | -199.845 | -251912.531 | frontend.bpd.banked_predictors_1.tage.t.table_ext/W0_data[24]
--------------------------------------------------------------------------------------
[INFO RSZ-0032] Inserted 182 hold buffers.
Took 26619 seconds: repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
Placement Analysis
---------------------------------
total displacement 480.8 u
average displacement 0.0 u
max displacement 18.9 u
original HPWL 13223460.3 u
legalized HPWL 13223653.8 u
delta HPWL 0 %
real 449m8,014s
user 448m43,595s
sys 0m16,404s
Is this amount of tns seen before the recent slow down? How many iterations did it take?
Don't have A/B information, still collecting information, it will be a few days. Waiting on a second sighting of that crash.
Crash log, not actionable yet...
Features included (+) or not (-): +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
[INFO ORD-0030] Using 48 thread(s).
source external/bazel-orfs~~orfs_repositories~docker_orfs/OpenROAD-flow-scripts/flow/platforms/asap7/setRC.tcl
clock_tree_synthesis -sink_clustering_enable -balance_levels
[INFO CTS-0050] Root buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0051] Sink buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0052] The following clock buffers will be used for CTS:
BUFx24_ASAP7_75t_R
[INFO CTS-0049] Characterization buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0007] Net "clock" found for clock "clock".
[INFO CTS-0011] Clock net "clock" for macros has 151 sinks.
[INFO CTS-0011] Clock net "clock_regs" for registers has 118443 sinks.
[INFO CTS-0008] TritonCTS found 2 clock nets.
[INFO CTS-0097] Characterization used 1 buffer(s) types.
[INFO CTS-0201] 72 blockages from hard placement blockages and placed macros will be used.
[INFO CTS-0027] Generating H-Tree topology for net clock.
[INFO CTS-0028] Total number of sinks: 151.
[INFO CTS-0090] Sinks will be clustered based on buffer max cap.
[INFO CTS-0030] Number of static layers: 0.
[INFO CTS-0020] Wire segment unit: 1350 dbu (1 um).
[INFO CTS-0023] Original sink region: [(68070, 45180), (1216583, 1227852)].
[INFO CTS-0024] Normalized sink region: [(50.4222, 33.4667), (901.173, 909.52)].
[INFO CTS-0025] Width: 850.7504.
[INFO CTS-0026] Height: 876.0533.
Level 1
Direction: Vertical
Sinks per sub-region: 76
Sub-region size: 850.7504 X 438.0267
[INFO CTS-0034] Segment length (rounded): 220.
Level 2
Direction: Horizontal
Sinks per sub-region: 38
Sub-region size: 425.3752 X 438.0267
[INFO CTS-0034] Segment length (rounded): 212.
Level 3
Direction: Vertical
Sinks per sub-region: 19
Sub-region size: 425.3752 X 219.0133
[INFO CTS-0034] Segment length (rounded): 110.
Level 4
Direction: Horizontal
Sinks per sub-region: 10
Sub-region size: 212.6876 X 219.0133
[INFO CTS-0034] Segment length (rounded): 106.
[INFO CTS-0032] Stop criterion found. Max number of sinks is 15.
[INFO CTS-0035] Number of sinks covered: 151.
[INFO CTS-0201] 72 blockages from hard placement blockages and placed macros will be used.
[INFO CTS-0027] Generating H-Tree topology for net clock_regs.
[INFO CTS-0028] Total number of sinks: 118443.
[INFO CTS-0090] Sinks will be clustered based on buffer max cap.
[INFO CTS-0030] Number of static layers: 0.
[INFO CTS-0020] Wire segment unit: 1350 dbu (1 um).
[INFO CTS-0206] Best clustering solution was found from clustering size of 10 and clustering diameter of 50.
[INFO CTS-0019] Total number of sinks after clustering: 5658.
[INFO CTS-0024] Normalized sink region: [(61.5519, 10.2067), (852.212, 772.82)].
[INFO CTS-0025] Width: 790.6600.
[INFO CTS-0026] Height: 762.6133.
Level 1
Direction: Horizontal
Sinks per sub-region: 2829
Sub-region size: 395.3300 X 762.6133
[INFO CTS-0034] Segment length (rounded): 198.
Level 2
Direction: Vertical
Sinks per sub-region: 1415
Sub-region size: 395.3300 X 381.3067
[INFO CTS-0034] Segment length (rounded): 190.
Level 3
Direction: Horizontal
Sinks per sub-region: 708
Sub-region size: 197.6650 X 381.3067
[INFO CTS-0034] Segment length (rounded): 98.
Level 4
Direction: Vertical
Sinks per sub-region: 354
Sub-region size: 197.6650 X 190.6533
[INFO CTS-0034] Segment length (rounded): 96.
Level 5
Direction: Horizontal
Sinks per sub-region: 177
Sub-region size: 98.8325 X 190.6533
[INFO CTS-0034] Segment length (rounded): 50.
Level 6
Direction: Vertical
Sinks per sub-region: 89
Sub-region size: 98.8325 X 95.3267
[INFO CTS-0034] Segment length (rounded): 48.
Level 7
Direction: Horizontal
Sinks per sub-region: 45
Sub-region size: 49.4162 X 95.3267
[INFO CTS-0034] Segment length (rounded): 24.
Level 8
Direction: Vertical
Sinks per sub-region: 23
Sub-region size: 49.4162 X 47.6633
[INFO CTS-0034] Segment length (rounded): 24.
Level 9
Direction: Horizontal
Sinks per sub-region: 12
Sub-region size: 24.7081 X 47.6633
[INFO CTS-0034] Segment length (rounded): 12.
[INFO CTS-0032] Stop criterion found. Max number of sinks is 15.
[INFO CTS-0035] Number of sinks covered: 5658.
[INFO CTS-0018] Created 119 clock buffers.
[INFO CTS-0012] Minimum number of buffers in the clock path: 21.
[INFO CTS-0013] Maximum number of buffers in the clock path: 21.
[INFO CTS-0015] Created 119 clock nets.
[INFO CTS-0016] Fanout distribution for the current clock = 4:2, 5:1, 6:2, 7:1, 9:1, 10:4, 11:1, 14:1, 15:3..
[INFO CTS-0017] Max level of the clock tree: 4.
[INFO CTS-0018] Created 6791 clock buffers.
[INFO CTS-0012] Minimum number of buffers in the clock path: 26.
[INFO CTS-0013] Maximum number of buffers in the clock path: 26.
[INFO CTS-0015] Created 6791 clock nets.
[INFO CTS-0016] Fanout distribution for the current clock = 1:3, 2:12, 3:29, 4:40, 5:36, 6:41, 7:34, 8:50, 9:62, 10:53, 11:62, 12:76, 13:101, 14:110, 15:132, 16:195, 17:257, 18:351, 19:434, 20:601, 21:686, 22:701, 23:698, 24:562, 25:433, 26:227, 27:116, 28:38, 29:22, 30:5, 31:2, 32:1..
[INFO CTS-0017] Max level of the clock tree: 9.
[INFO CTS-0098] Clock net "clock"
[INFO CTS-0099] Sinks 151
[INFO CTS-0100] Leaf buffers 0
[INFO CTS-0101] Average sink wire length 2164.53 um
[INFO CTS-0102] Path depth 21 - 21
[INFO CTS-0207] Leaf load cells 5322
[INFO CTS-0098] Clock net "clock_regs"
[INFO CTS-0099] Sinks 123765
[INFO CTS-0100] Leaf buffers 5658
[INFO CTS-0101] Average sink wire length 990.96 um
[INFO CTS-0102] Path depth 25 - 26
[INFO CTS-0207] Leaf load cells 5322
Took 22 seconds: clock_tree_synthesis -sink_clustering_enable -balance_levels
[INFO RSZ-0058] Using max wire length 162um.
[INFO RSZ-0047] Found 28 long wires.
[INFO RSZ-0048] Inserted 93 buffers in 28 nets.
Placement Analysis
---------------------------------
total displacement 34069.3 u
average displacement 0.0 u
max displacement 183.2 u
original HPWL 13067750.8 u
legalized HPWL 13221595.6 u
delta HPWL 1 %
repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
malloc(): corrupted top size
Signal 6 received
Stack trace:
I'm unsure how to proceed with debugging, because it crashes only in the bazel-orfs context. What is stranger is that bazel doesn't stop upon signal 6 from the underlying process; it keeps waiting for the job to complete, but no CPU is used.
bazel doesn't have some evil power to inject malloc(): corrupted top size, but that doesn't tell me how to make this reproducible and actionable...
This is a successful run from docker image v3.0-2863-g22e02291
It took ca. 45 minutes. Of course, the input to CTS will be different. I think it is worth having a look at why the test case is taking a long time.
Features included (+) or not (-): +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
[INFO ORD-0030] Using 44 thread(s).
mkdir -p bazel-out/k8-fastbuild/bin/objects/asap7/BoomTile/base
external/bazel-orfs~~orfs_repositories~docker_orfs/OpenROAD-flow-scripts/flow/scripts/flow.sh 4_1_cts cts
Running cts.tcl, stage 4_1_cts
source external/bazel-orfs~~orfs_repositories~docker_orfs/OpenROAD-flow-scripts/flow/platforms/asap7/setRC.tcl
clock_tree_synthesis -sink_clustering_enable -balance_levels
[INFO CTS-0050] Root buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0051] Sink buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0052] The following clock buffers will be used for CTS:
BUFx24_ASAP7_75t_R
[INFO CTS-0049] Characterization buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0007] Net "clock" found for clock "clock".
[INFO CTS-0011] Clock net "clock" for macros has 151 sinks.
[INFO CTS-0011] Clock net "clock_regs" for registers has 118443 sinks.
[INFO CTS-0008] TritonCTS found 2 clock nets.
[INFO CTS-0097] Characterization used 1 buffer(s) types.
[INFO CTS-0201] 72 blockages from hard placement blockages and placed macros will be used.
[INFO CTS-0027] Generating H-Tree topology for net clock.
[INFO CTS-0028] Total number of sinks: 151.
[INFO CTS-0090] Sinks will be clustered based on buffer max cap.
[INFO CTS-0030] Number of static layers: 0.
[INFO CTS-0020] Wire segment unit: 1350 dbu (1 um).
[INFO CTS-0023] Original sink region: [(33463, 45084), (1216583, 1217676)].
[INFO CTS-0024] Normalized sink region: [(24.7874, 33.3956), (901.173, 901.982)].
[INFO CTS-0025] Width: 876.3852.
[INFO CTS-0026] Height: 868.5867.
Level 1
Direction: Horizontal
Sinks per sub-region: 76
Sub-region size: 438.1926 X 868.5867
[INFO CTS-0034] Segment length (rounded): 220.
Level 2
Direction: Vertical
Sinks per sub-region: 38
Sub-region size: 438.1926 X 434.2933
[INFO CTS-0034] Segment length (rounded): 218.
Level 3
Direction: Horizontal
Sinks per sub-region: 19
Sub-region size: 219.0963 X 434.2933
[INFO CTS-0034] Segment length (rounded): 110.
Level 4
Direction: Vertical
Sinks per sub-region: 10
Sub-region size: 219.0963 X 217.1467
[INFO CTS-0034] Segment length (rounded): 108.
[INFO CTS-0032] Stop criterion found. Max number of sinks is 15.
[INFO CTS-0035] Number of sinks covered: 151.
[INFO CTS-0201] 72 blockages from hard placement blockages and placed macros will be used.
[INFO CTS-0027] Generating H-Tree topology for net clock_regs.
[INFO CTS-0028] Total number of sinks: 118443.
[INFO CTS-0090] Sinks will be clustered based on buffer max cap.
[INFO CTS-0030] Number of static layers: 0.
[INFO CTS-0020] Wire segment unit: 1350 dbu (1 um).
[INFO CTS-0206] Best clustering solution was found from clustering size of 20 and clustering diameter of 50.
[INFO CTS-0019] Total number of sinks after clustering: 5733.
[INFO CTS-0024] Normalized sink region: [(61.4185, 10.4667), (870.637, 837.767)].
[INFO CTS-0025] Width: 809.2189.
[INFO CTS-0026] Height: 827.3000.
Level 1
Direction: Vertical
Sinks per sub-region: 2867
Sub-region size: 809.2189 X 413.6500
[INFO CTS-0034] Segment length (rounded): 206.
Level 2
Direction: Horizontal
Sinks per sub-region: 1434
Sub-region size: 404.6095 X 413.6500
[INFO CTS-0034] Segment length (rounded): 202.
Level 3
Direction: Vertical
Sinks per sub-region: 717
Sub-region size: 404.6095 X 206.8250
[INFO CTS-0034] Segment length (rounded): 104.
Level 4
Direction: Horizontal
Sinks per sub-region: 359
Sub-region size: 202.3047 X 206.8250
[INFO CTS-0034] Segment length (rounded): 102.
Level 5
Direction: Vertical
Sinks per sub-region: 180
Sub-region size: 202.3047 X 103.4125
[INFO CTS-0034] Segment length (rounded): 52.
Level 6
Direction: Horizontal
Sinks per sub-region: 90
Sub-region size: 101.1524 X 103.4125
[INFO CTS-0034] Segment length (rounded): 50.
Level 7
Direction: Vertical
Sinks per sub-region: 45
Sub-region size: 101.1524 X 51.7062
[INFO CTS-0034] Segment length (rounded): 26.
Level 8
Direction: Horizontal
Sinks per sub-region: 23
Sub-region size: 50.5762 X 51.7062
[INFO CTS-0034] Segment length (rounded): 26.
Level 9
Direction: Vertical
Sinks per sub-region: 12
Sub-region size: 50.5762 X 25.8531
[INFO CTS-0034] Segment length (rounded): 12.
[INFO CTS-0032] Stop criterion found. Max number of sinks is 15.
[INFO CTS-0035] Number of sinks covered: 5733.
[INFO CTS-0018] Created 131 clock buffers.
[INFO CTS-0012] Minimum number of buffers in the clock path: 22.
[INFO CTS-0013] Maximum number of buffers in the clock path: 22.
[INFO CTS-0015] Created 131 clock nets.
[INFO CTS-0016] Fanout distribution for the current clock = 4:1, 6:3, 8:3, 9:1, 10:5, 12:1, 16:1, 18:1..
[INFO CTS-0017] Max level of the clock tree: 4.
[INFO CTS-0018] Created 6931 clock buffers.
[INFO CTS-0012] Minimum number of buffers in the clock path: 27.
[INFO CTS-0013] Maximum number of buffers in the clock path: 28.
[INFO CTS-0015] Created 6931 clock nets.
[INFO CTS-0016] Fanout distribution for the current clock = 2:5, 3:23, 4:40, 5:35, 6:44, 7:58, 8:71, 9:68, 10:61, 11:79, 12:85, 13:87, 14:95, 15:144, 16:166, 17:247, 18:356, 19:478, 20:615, 21:769, 22:799, 23:707, 24:561, 25:356, 26:177, 27:75, 28:32, 29:5, 30:3, 34:1..
[INFO CTS-0017] Max level of the clock tree: 9.
[INFO CTS-0098] Clock net "clock"
[INFO CTS-0099] Sinks 151
[INFO CTS-0100] Leaf buffers 0
[INFO CTS-0101] Average sink wire length 2134.07 um
[INFO CTS-0102] Path depth 22 - 22
[INFO CTS-0207] Leaf load cells 5418
[INFO CTS-0098] Clock net "clock_regs"
[INFO CTS-0099] Sinks 123861
[INFO CTS-0100] Leaf buffers 5730
[INFO CTS-0101] Average sink wire length 1111.96 um
[INFO CTS-0102] Path depth 27 - 28
[INFO CTS-0207] Leaf load cells 5418
Took 19 seconds: clock_tree_synthesis -sink_clustering_enable -balance_levels
[INFO RSZ-0058] Using max wire length 162um.
[INFO RSZ-0047] Found 23 long wires.
[INFO RSZ-0048] Inserted 58 buffers in 23 nets.
Placement Analysis
---------------------------------
total displacement 28324.3 u
average displacement 0.0 u
max displacement 20.9 u
original HPWL 12655780.5 u
legalized HPWL 12795445.5 u
delta HPWL 1 %
repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
[INFO RSZ-0094] Found 3266 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 3266 (0.00%) violating endpoints...
Iter | Removed | Resized | Inserted | Cloned | Pin | Area | WNS | TNS | Viol | Worst
| Buffers | Gates | Buffers | Gates | Swaps | | | | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
0 | 0 | 0 | 0 | 0 | 0 | +0.0% | -2080.066 | -49214404.0 | 3266 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
10 | 0 | 3 | 3 | 0 | 5 | +0.0% | -2051.164 | -48283988.0 | 3266 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
20 | 0 | 4 | 13 | 0 | 9 | +0.0% | -1979.664 | -46568356.0 | 3266 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
[deleted]
1103* | 11 | 405 | 235 | 0 | 64 | +0.0% | -1787.770 | -43571736.0 | 438 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
1104* | 11 | 405 | 235 | 0 | 64 | +0.0% | -1787.770 | -43571736.0 | 437 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
1105* | 11 | 405 | 235 | 0 | 64 | +0.0% | -1787.770 | -43571736.0 | 436 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
final | 11 | 405 | 235 | 0 | 64 | +0.0% | -1787.770 | -43571736.0 | 436 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
--------------------------------------------------------------------------------------------------------------
[INFO RSZ-0059] Removed 11 buffers.
[INFO RSZ-0045] Inserted 124 buffers, 1 to split loads.
[INFO RSZ-0041] Resized 405 instances.
[INFO RSZ-0043] Swapped pins on 64 instances.
[WARNING RSZ-0062] Unable to repair all setup violations.
[INFO RSZ-0033] No hold violations found.
Took 2493 seconds: repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
Placement Analysis
---------------------------------
total displacement 245.6 u
average displacement 0.0 u
max displacement 3.7 u
original HPWL 12795543.8 u
legalized HPWL 12795727.9 u
delta HPWL 0 %
Elapsed time: 45:43.31[h:]min:sec. CPU time: user 15183.48 sys 22898.96 (1388%). Peak memory: 30825912KB.
@maliberty Could it be that there are hopeless hold violations that make the test case take forever and ultimately crash (in some cases)?
I wouldn't expect a crash unless you are running out of memory.
futile hold repair running out of memory?
Possibly, but hard to say without knowing the job size, machine size, available swap, etc.
It would be nice if hold repair gave up quickly on lost causes. I think hold repair is susceptible to hopeless cases where hold timing in a dense design goes from easy to impossible.
Suggestion: examine this test case to see if it is an example of hold repair going from easy to impossible, and whether there is some heuristic to detect early that it is never going to work and error out.
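To make that concrete, here is a minimal sketch of one possible give-up heuristic, assuming a short probe pass is run first; the struct and function names are hypothetical and this is not the actual OpenROAD resizer API:

// Hypothetical sketch, not the actual resizer code: give up on hold repair
// early when a linear extrapolation of a short probe pass says closing the
// remaining hold violation would need far more buffers than a sane budget.
struct HoldProbe {
  double wns_before;     // worst hold slack before the probe pass (negative if violating)
  double wns_after;      // worst hold slack after the probe pass
  int buffers_inserted;  // hold buffers inserted by the probe pass
};

bool holdRepairLooksHopeless(const HoldProbe& probe, int buffer_budget)
{
  if (probe.wns_after >= 0.0)
    return false;  // hold already met; nothing to give up on
  const double gain = probe.wns_after - probe.wns_before;  // slack recovered by the probe
  if (probe.buffers_inserted == 0 || gain <= 0.0)
    return true;  // probe made no progress at all
  const double gain_per_buffer = gain / probe.buffers_inserted;
  const double buffers_needed = -probe.wns_after / gain_per_buffer;  // linear extrapolation
  return buffers_needed > buffer_budget;
}

Whether the budget should be a fixed buffer count or derived from free placement area is an open question; the point is only to detect the easy-to-impossible transition before spending hours on it.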
I'm running the test case. CTS took just 66 seconds, but setup fixing is taking more than 2 hours even with a 0% TNS target. This is a good test case for improving timing repair runtime for both setup and hold.
perf top output while repairing 1 endpoint:
repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
[INFO RSZ-0094] Found 24253 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 24253 (0.00%) violating endpoints...
Iter | Removed | Resized | Inserted | Cloned | Pin | Area | WNS | TNS | Viol | Worst
| Buffers | Gates | Buffers | Gates | Swaps | | | | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
0 | 0 | 0 | 0 | 0 | 0 | +0.0% | -2574.781 | -56408868.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
10 | 0 | 2 | 9 | 0 | 4 | +0.0% | -2517.836 | -55473472.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
20 | 0 | 2 | 21 | 0 | 8 | +0.0% | -2469.092 | -54803636.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
30 | 0 | 4 | 31 | 0 | 12 | +0.0% | -2404.652 | -53191576.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
40 | 0 | 4 | 38 | 0 | 19 | +0.0% | -2403.907 | -53097688.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
Overhead Shared Object Symbol
4,56% openroad [.] sta::Search::mutateTag(sta::Tag*, sta::Pin const*, sta::RiseFall const*, bool, sta::ClkInfo*, sta::Pin const*
4,46% openroad [.] sta::PathVisitor::visitFromPath(sta::Pin const*, sta::Vertex*, sta::RiseFall const*, sta::Path*, sta::Edge*,
3,68% openroad [.] sta::TagGroup::hasTag(sta::Tag*) const
3,12% openroad [.] sta::TagGroupBldr::tagMatchPath(sta::Tag*, sta::Path*&, unsigned long&)
3,02% openroad [.] sta::VertexPathIterator::next()
3,00% openroad [.] odb::dbInst::getMaster() const
2,69% openroad [.] odb::dbObject::getObjectType() const
2,63% [kernel] [k] 0xffffffffb53ccaaa
2,02% openroad [.] sta::PathVisitor::visitEdge(sta::Pin const*, sta::Vertex*, sta::Edge*, sta::Pin const*, sta::Vertex*)
1,94% [kernel] [k] 0xffffffffb5d2f71b
1,89% libstdc++.so.6.0.34 [.] std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
1,84% openroad [.] sta::RequiredVisitor::visitFromToPath(sta::Pin const*, sta::Vertex*, sta::RiseFall const*, sta::Tag*, sta::Pa
1,76% openroad [.] odb::dbObject::getId() const
1,63% openroad [.] sta::Debug::check(char const*, int) const
1,56% openroad [.] sta::TagGroup::pathIndex(sta::Tag*) const
1,55% openroad [.] sta::Graph::edge(unsigned int) const
1,40% openroad [.] std::__detail::_Map_base<sta::Tag*, std::pair<sta::Tag* const, unsigned long>, std::allocator<std::pair<sta::
1,37% openroad [.] sta::Search::findTag(sta::RiseFall const*, sta::PathAnalysisPt const*, sta::ClkInfo*, bool, sta::InputDelay*,
1,
@maliberty @precisionmoon Is this a smoking gun? It shouldn't take more than a few minutes to fix a single endpoint, right?
If there is a test case we can use here for deltaDebug, then we can whittle it down.
Perhaps it should never take more than, say, 100 iterations to repair a single endpoint?
repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
[INFO RSZ-0094] Found 24253 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 24253 (0.00%) violating endpoints...
Iter | Removed | Resized | Inserted | Cloned | Pin | Area | WNS | TNS | Viol | Worst
| Buffers | Gates | Buffers | Gates | Swaps | | | | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
0 | 0 | 0 | 0 | 0 | 0 | +0.0% | -2574.781 | -56408868.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
10 | 0 | 2 | 9 | 0 | 4 | +0.0% | -2517.836 | -55473472.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
20 | 0 | 2 | 21 | 0 | 8 | +0.0% | -2469.092 | -54803636.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
30 | 0 | 4 | 31 | 0 | 12 | +0.0% | -2404.652 | -53191576.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
40 | 0 | 4 | 38 | 0 | 19 | +0.0% | -2403.907 | -53097688.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
So if we're repairing a single endpoint and the worst endpoint changes, then we've made the original worst endpoint better than the second worst. Could that be a natural time to stop when repairing only the worst endpoint? (See the sketch after the log below.)
[INFO RSZ-0094] Found 24253 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 24253 (0.00%) violating endpoints...
Iter | Removed | Resized | Inserted | Cloned | Pin | Area | WNS | TNS | Viol | Worst
| Buffers | Gates | Buffers | Gates | Swaps | | | | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
0 | 0 | 0 | 0 | 0 | 0 | +0.0% | -2574.781 | -56408868.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
10 | 0 | 2 | 9 | 0 | 4 | +0.0% | -2517.836 | -55473472.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
20 | 0 | 2 | 21 | 0 | 8 | +0.0% | -2469.092 | -54803636.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
30 | 0 | 4 | 31 | 0 | 12 | +0.0% | -2404.652 | -53191576.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
40 | 0 | 4 | 38 | 0 | 19 | +0.0% | -2403.907 | -53097688.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
50 | 0 | 4 | 52 | 0 | 24 | +0.0% | -2398.125 | -52924164.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
60 | 0 | 4 | 67 | 0 | 29 | +0.0% | -2381.770 | -52846792.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
70 | 1 | 5 | 78 | 0 | 33 | +0.0% | -2385.023 | -53138332.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
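For illustration, a minimal sketch of the stop criterion suggested above, combined with a hard iteration cap along the lines of the earlier comment; the names are hypothetical and this is not the actual RepairSetup code:

#include <string>

// Hypothetical sketch: when only the worst endpoint is being repaired, stop
// once it is fixed, once it is no longer the worst endpoint (it has been
// pushed below the runner-up), or after a hard iteration cap.
struct RepairState {
  std::string worst_endpoint;  // name of the current worst endpoint
  double wns;                  // worst slack after this iteration (negative = violating)
};

bool stopSingleEndpointRepair(const RepairState& initial,
                              const RepairState& current,
                              int iteration,
                              int max_iterations)  // e.g. 100, per the suggestion above
{
  if (current.wns >= 0.0)
    return true;  // the violation has been repaired
  if (current.worst_endpoint != initial.worst_endpoint)
    return true;  // the original worst endpoint is no longer the worst
  return iteration >= max_iterations;  // give up after the cap
}

The caller would evaluate this after every iteration of the single-endpoint repair loop.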
@precisionmoon Could this just be an interesting edge case that is triggered by some unfortunate initial conditions? I'm bumping ORFS on MegaBoom to see if upgrading makes the problem magically go away.
https://github.com/The-OpenROAD-Project/megaboom/pull/244
Note! CI is out of commission on megaboom, so I'll post results to the PR above when I have them.
So what does it mean to repair the worst endpoint?
Does it mean repairing whichever endpoint is currently the worst, which can change during repair as seen below, until there is no further progress on it?
In that case, repairing the worst endpoint could, surprisingly, end up repairing all endpoints with negative slack in theory?
repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
[INFO RSZ-0094] Found 24253 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 24253 (0.00%) violating endpoints...
Iter | Removed | Resized | Inserted | Cloned | Pin | Area | WNS | TNS | Viol | Worst
| Buffers | Gates | Buffers | Gates | Swaps | | | | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
0 | 0 | 0 | 0 | 0 | 0 | +0.0% | -2574.781 | -56408868.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
10 | 0 | 2 | 9 | 0 | 4 | +0.0% | -2517.836 | -55473472.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
20 | 0 | 2 | 21 | 0 | 8 | +0.0% | -2469.092 | -54803636.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
30 | 0 | 4 | 31 | 0 | 12 | +0.0% | -2404.652 | -53191576.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
40 | 0 | 4 | 38 | 0 | 19 | +0.0% | -2403.907 | -53097688.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
50 | 0 | 4 | 52 | 0 | 24 | +0.0% | -2398.125 | -52924164.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
60 | 0 | 4 | 67 | 0 | 29 | +0.0% | -2381.770 | -52846792.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
70 | 1 | 5 | 78 | 0 | 33 | +0.0% | -2385.023 | -53138332.0 | 24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
80 | 1 | 7 | 92 | 0 | 35 | +0.0% | -2361.230 | -52884480.0 | 24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
@precisionmoon Looks like changing the initial conditions makes repair_timing finish quickly. I believe there is something special about this exact test case and its initial conditions that leads to repair_timing taking forever, not a change in the code as such.
Even with -repair_tns 0, we continue fixing because the WNS is not fully fixed. Currently, we terminate if there is no progress for 200 iterations, which may be too large. BTW, we have a -skip_last_gasp option for repair_timing that should help with some runtime reduction.
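For reference, a minimal sketch of that no-progress termination with the window exposed as a tunable parameter; the names are illustrative, not the actual resizer members:

// Illustrative sketch: track the best WNS seen so far and abort once it has
// not improved for max_no_progress iterations (currently on the order of 200).
struct ProgressTracker {
  double best_wns = -1e30;  // best (least negative) WNS seen so far
  int stagnant_iters = 0;   // iterations since the last improvement

  bool shouldAbort(double current_wns, int max_no_progress)
  {
    if (current_wns > best_wns) {
      best_wns = current_wns;
      stagnant_iters = 0;  // progress: reset the stall counter
      return false;
    }
    return ++stagnant_iters >= max_no_progress;
  }
};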
Your profile data is helpful. We want to reduce timer updates, especially required time updates.
Added @povik for any additional insight on timer runtime
@povik, here's the VTune data. Let's review it together to improve runtime.