
Repairing 1 endpoint in MegaBoom takes forever (40000 seconds) and sometimes crashes

Open oharboe opened this issue 7 months ago • 24 comments

Describe the bug

untar and run https://drive.google.com/file/d/1U_G1_G6m0huMM-uTx5yhC-nW28OyD7To/view?usp=sharing

This CTS stage used to run in a reasonable amount of time; now it crashes with a malloc error after ca. 40000s.

Full log

ctslog.zip

iterations against WNS in repair:

[Plot: WNS vs. repair iteration count]

Expected Behavior

Finish in a reasonable amount of time and not crash

Environment

OpenROAD v2.0-21120-gc1c1e43ccf

To Reproduce

see above

Relevant log output


Screenshots

No response

Additional Context

No response

oharboe avatar May 05 '25 07:05 oharboe

Is it CTS itself, or just the cts step of the flow, which contains other commands? A log snippet would help clarify.

maliberty avatar May 05 '25 13:05 maliberty

Is it CTS itself, or just the cts step of the flow, which contains other commands? A log snippet would help clarify.

I am collecting data, but I suspect it will take days. If I can't make this actionable, I will close.

oharboe avatar May 05 '25 14:05 oharboe

So repair_timing is taking 27000 seconds... This means you can probably get into the slow part quickly. It would be interesting to know why repair_timing is taking so long.

    final |       0 |     182 |            0 |    +0.0% | -199.845 | -251912.531 | frontend.bpd.banked_predictors_1.tage.t.table_ext/W0_data[24]
--------------------------------------------------------------------------------------
[INFO RSZ-0032] Inserted 182 hold buffers.
Took 26619 seconds: repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
Placement Analysis
---------------------------------
total displacement        480.8 u
average displacement        0.0 u
max displacement           18.9 u
original HPWL        13223460.3 u
legalized HPWL       13223653.8 u
delta HPWL                    0 %


real    449m8,014s
user    448m43,595s
sys     0m16,404s

oharboe avatar May 05 '25 14:05 oharboe

Is this amount of tns seen before the recent slow down? How many iterations did it take?

maliberty avatar May 05 '25 14:05 maliberty

Is this amount of tns seen before the recent slow down? How many iterations did it take?

I don't have A/B information yet; still collecting, and it will be a few days. Waiting on a second sighting of that crash.

oharboe avatar May 05 '25 15:05 oharboe

Crash log, not actionable yet...

Features included (+) or not (-): +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
[INFO ORD-0030] Using 48 thread(s).
source external/bazel-orfs~~orfs_repositories~docker_orfs/OpenROAD-flow-scripts/flow/platforms/asap7/setRC.tcl
clock_tree_synthesis -sink_clustering_enable -balance_levels
[INFO CTS-0050] Root buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0051] Sink buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0052] The following clock buffers will be used for CTS:
                    BUFx24_ASAP7_75t_R
[INFO CTS-0049] Characterization buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0007] Net "clock" found for clock "clock".
[INFO CTS-0011]  Clock net "clock" for macros has 151 sinks.
[INFO CTS-0011]  Clock net "clock_regs" for registers has 118443 sinks.
[INFO CTS-0008] TritonCTS found 2 clock nets.
[INFO CTS-0097] Characterization used 1 buffer(s) types.
[INFO CTS-0201] 72 blockages from hard placement blockages and placed macros will be used.
[INFO CTS-0027] Generating H-Tree topology for net clock.
[INFO CTS-0028]  Total number of sinks: 151.
[INFO CTS-0090]  Sinks will be clustered based on buffer max cap.
[INFO CTS-0030]  Number of static layers: 0.
[INFO CTS-0020]  Wire segment unit: 1350  dbu (1 um).
[INFO CTS-0023]  Original sink region: [(68070, 45180), (1216583, 1227852)].
[INFO CTS-0024]  Normalized sink region: [(50.4222, 33.4667), (901.173, 909.52)].
[INFO CTS-0025]     Width:  850.7504.
[INFO CTS-0026]     Height: 876.0533.
 Level 1
    Direction: Vertical
    Sinks per sub-region: 76
    Sub-region size: 850.7504 X 438.0267
[INFO CTS-0034]     Segment length (rounded): 220.
 Level 2
    Direction: Horizontal
    Sinks per sub-region: 38
    Sub-region size: 425.3752 X 438.0267
[INFO CTS-0034]     Segment length (rounded): 212.
 Level 3
    Direction: Vertical
    Sinks per sub-region: 19
    Sub-region size: 425.3752 X 219.0133
[INFO CTS-0034]     Segment length (rounded): 110.
 Level 4
    Direction: Horizontal
    Sinks per sub-region: 10
    Sub-region size: 212.6876 X 219.0133
[INFO CTS-0034]     Segment length (rounded): 106.
[INFO CTS-0032]  Stop criterion found. Max number of sinks is 15.
[INFO CTS-0035]  Number of sinks covered: 151.
[INFO CTS-0201] 72 blockages from hard placement blockages and placed macros will be used.
[INFO CTS-0027] Generating H-Tree topology for net clock_regs.
[INFO CTS-0028]  Total number of sinks: 118443.
[INFO CTS-0090]  Sinks will be clustered based on buffer max cap.
[INFO CTS-0030]  Number of static layers: 0.
[INFO CTS-0020]  Wire segment unit: 1350  dbu (1 um).
[INFO CTS-0206] Best clustering solution was found from clustering size of 10 and clustering diameter of 50.
[INFO CTS-0019]  Total number of sinks after clustering: 5658.
[INFO CTS-0024]  Normalized sink region: [(61.5519, 10.2067), (852.212, 772.82)].
[INFO CTS-0025]     Width:  790.6600.
[INFO CTS-0026]     Height: 762.6133.
 Level 1
    Direction: Horizontal
    Sinks per sub-region: 2829
    Sub-region size: 395.3300 X 762.6133
[INFO CTS-0034]     Segment length (rounded): 198.
 Level 2
    Direction: Vertical
    Sinks per sub-region: 1415
    Sub-region size: 395.3300 X 381.3067
[INFO CTS-0034]     Segment length (rounded): 190.
 Level 3
    Direction: Horizontal
    Sinks per sub-region: 708
    Sub-region size: 197.6650 X 381.3067
[INFO CTS-0034]     Segment length (rounded): 98.
 Level 4
    Direction: Vertical
    Sinks per sub-region: 354
    Sub-region size: 197.6650 X 190.6533
[INFO CTS-0034]     Segment length (rounded): 96.
 Level 5
    Direction: Horizontal
    Sinks per sub-region: 177
    Sub-region size: 98.8325 X 190.6533
[INFO CTS-0034]     Segment length (rounded): 50.
 Level 6
    Direction: Vertical
    Sinks per sub-region: 89
    Sub-region size: 98.8325 X 95.3267
[INFO CTS-0034]     Segment length (rounded): 48.
 Level 7
    Direction: Horizontal
    Sinks per sub-region: 45
    Sub-region size: 49.4162 X 95.3267
[INFO CTS-0034]     Segment length (rounded): 24.
 Level 8
    Direction: Vertical
    Sinks per sub-region: 23
    Sub-region size: 49.4162 X 47.6633
[INFO CTS-0034]     Segment length (rounded): 24.
 Level 9
    Direction: Horizontal
    Sinks per sub-region: 12
    Sub-region size: 24.7081 X 47.6633
[INFO CTS-0034]     Segment length (rounded): 12.
[INFO CTS-0032]  Stop criterion found. Max number of sinks is 15.
[INFO CTS-0035]  Number of sinks covered: 5658.
[INFO CTS-0018]     Created 119 clock buffers.
[INFO CTS-0012]     Minimum number of buffers in the clock path: 21.
[INFO CTS-0013]     Maximum number of buffers in the clock path: 21.
[INFO CTS-0015]     Created 119 clock nets.
[INFO CTS-0016]     Fanout distribution for the current clock = 4:2, 5:1, 6:2, 7:1, 9:1, 10:4, 11:1, 14:1, 15:3..
[INFO CTS-0017]     Max level of the clock tree: 4.
[INFO CTS-0018]     Created 6791 clock buffers.
[INFO CTS-0012]     Minimum number of buffers in the clock path: 26.
[INFO CTS-0013]     Maximum number of buffers in the clock path: 26.
[INFO CTS-0015]     Created 6791 clock nets.
[INFO CTS-0016]     Fanout distribution for the current clock = 1:3, 2:12, 3:29, 4:40, 5:36, 6:41, 7:34, 8:50, 9:62, 10:53, 11:62, 12:76, 13:101, 14:110, 15:132, 16:195, 17:257, 18:351, 19:434, 20:601, 21:686, 22:701, 23:698, 24:562, 25:433, 26:227, 27:116, 28:38, 29:22, 30:5, 31:2, 32:1..
[INFO CTS-0017]     Max level of the clock tree: 9.
[INFO CTS-0098] Clock net "clock"
[INFO CTS-0099]  Sinks 151
[INFO CTS-0100]  Leaf buffers 0
[INFO CTS-0101]  Average sink wire length 2164.53 um
[INFO CTS-0102]  Path depth 21 - 21
[INFO CTS-0207]  Leaf load cells 5322
[INFO CTS-0098] Clock net "clock_regs"
[INFO CTS-0099]  Sinks 123765
[INFO CTS-0100]  Leaf buffers 5658
[INFO CTS-0101]  Average sink wire length 990.96 um
[INFO CTS-0102]  Path depth 25 - 26
[INFO CTS-0207]  Leaf load cells 5322
Took 22 seconds: clock_tree_synthesis -sink_clustering_enable -balance_levels
[INFO RSZ-0058] Using max wire length 162um.
[INFO RSZ-0047] Found 28 long wires.
[INFO RSZ-0048] Inserted 93 buffers in 28 nets.
Placement Analysis
---------------------------------
total displacement      34069.3 u
average displacement        0.0 u
max displacement          183.2 u
original HPWL        13067750.8 u
legalized HPWL       13221595.6 u
delta HPWL                    1 %

repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
malloc(): corrupted top size
Signal 6 received
Stack trace:

oharboe avatar May 05 '25 17:05 oharboe

I'm unsure how to proceed with debugging, because it crashes only in the bazel-orfs context. Stranger still, bazel doesn't stop upon signal 6 from the underlying process; it keeps waiting for the job to complete, but no CPU is used.

bazel doesn't have some evil power to inject malloc(): corrupted top size, but that doesn't tell me how to make it reproducible and actionable...

oharboe avatar May 05 '25 17:05 oharboe

This is a successful run from docker image v3.0-2863-g22e02291

It took ca. 45 minutes. Of course, the input to CTS will be different. I think it is worth having a look at why the test case is taking a long time.

Features included (+) or not (-): +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
[INFO ORD-0030] Using 44 thread(s).
mkdir -p bazel-out/k8-fastbuild/bin/objects/asap7/BoomTile/base
external/bazel-orfs~~orfs_repositories~docker_orfs/OpenROAD-flow-scripts/flow/scripts/flow.sh 4_1_cts cts
Running cts.tcl, stage 4_1_cts
source external/bazel-orfs~~orfs_repositories~docker_orfs/OpenROAD-flow-scripts/flow/platforms/asap7/setRC.tcl
clock_tree_synthesis -sink_clustering_enable -balance_levels
[INFO CTS-0050] Root buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0051] Sink buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0052] The following clock buffers will be used for CTS:
                    BUFx24_ASAP7_75t_R
[INFO CTS-0049] Characterization buffer is BUFx24_ASAP7_75t_R.
[INFO CTS-0007] Net "clock" found for clock "clock".
[INFO CTS-0011]  Clock net "clock" for macros has 151 sinks.
[INFO CTS-0011]  Clock net "clock_regs" for registers has 118443 sinks.
[INFO CTS-0008] TritonCTS found 2 clock nets.
[INFO CTS-0097] Characterization used 1 buffer(s) types.
[INFO CTS-0201] 72 blockages from hard placement blockages and placed macros will be used.
[INFO CTS-0027] Generating H-Tree topology for net clock.
[INFO CTS-0028]  Total number of sinks: 151.
[INFO CTS-0090]  Sinks will be clustered based on buffer max cap.
[INFO CTS-0030]  Number of static layers: 0.
[INFO CTS-0020]  Wire segment unit: 1350  dbu (1 um).
[INFO CTS-0023]  Original sink region: [(33463, 45084), (1216583, 1217676)].
[INFO CTS-0024]  Normalized sink region: [(24.7874, 33.3956), (901.173, 901.982)].
[INFO CTS-0025]     Width:  876.3852.
[INFO CTS-0026]     Height: 868.5867.
 Level 1
    Direction: Horizontal
    Sinks per sub-region: 76
    Sub-region size: 438.1926 X 868.5867
[INFO CTS-0034]     Segment length (rounded): 220.
 Level 2
    Direction: Vertical
    Sinks per sub-region: 38
    Sub-region size: 438.1926 X 434.2933
[INFO CTS-0034]     Segment length (rounded): 218.
 Level 3
    Direction: Horizontal
    Sinks per sub-region: 19
    Sub-region size: 219.0963 X 434.2933
[INFO CTS-0034]     Segment length (rounded): 110.
 Level 4
    Direction: Vertical
    Sinks per sub-region: 10
    Sub-region size: 219.0963 X 217.1467
[INFO CTS-0034]     Segment length (rounded): 108.
[INFO CTS-0032]  Stop criterion found. Max number of sinks is 15.
[INFO CTS-0035]  Number of sinks covered: 151.
[INFO CTS-0201] 72 blockages from hard placement blockages and placed macros will be used.
[INFO CTS-0027] Generating H-Tree topology for net clock_regs.
[INFO CTS-0028]  Total number of sinks: 118443.
[INFO CTS-0090]  Sinks will be clustered based on buffer max cap.
[INFO CTS-0030]  Number of static layers: 0.
[INFO CTS-0020]  Wire segment unit: 1350  dbu (1 um).
[INFO CTS-0206] Best clustering solution was found from clustering size of 20 and clustering diameter of 50.
[INFO CTS-0019]  Total number of sinks after clustering: 5733.
[INFO CTS-0024]  Normalized sink region: [(61.4185, 10.4667), (870.637, 837.767)].
[INFO CTS-0025]     Width:  809.2189.
[INFO CTS-0026]     Height: 827.3000.
 Level 1
    Direction: Vertical
    Sinks per sub-region: 2867
    Sub-region size: 809.2189 X 413.6500
[INFO CTS-0034]     Segment length (rounded): 206.
 Level 2
    Direction: Horizontal
    Sinks per sub-region: 1434
    Sub-region size: 404.6095 X 413.6500
[INFO CTS-0034]     Segment length (rounded): 202.
 Level 3
    Direction: Vertical
    Sinks per sub-region: 717
    Sub-region size: 404.6095 X 206.8250
[INFO CTS-0034]     Segment length (rounded): 104.
 Level 4
    Direction: Horizontal
    Sinks per sub-region: 359
    Sub-region size: 202.3047 X 206.8250
[INFO CTS-0034]     Segment length (rounded): 102.
 Level 5
    Direction: Vertical
    Sinks per sub-region: 180
    Sub-region size: 202.3047 X 103.4125
[INFO CTS-0034]     Segment length (rounded): 52.
 Level 6
    Direction: Horizontal
    Sinks per sub-region: 90
    Sub-region size: 101.1524 X 103.4125
[INFO CTS-0034]     Segment length (rounded): 50.
 Level 7
    Direction: Vertical
    Sinks per sub-region: 45
    Sub-region size: 101.1524 X 51.7062
[INFO CTS-0034]     Segment length (rounded): 26.
 Level 8
    Direction: Horizontal
    Sinks per sub-region: 23
    Sub-region size: 50.5762 X 51.7062
[INFO CTS-0034]     Segment length (rounded): 26.
 Level 9
    Direction: Vertical
    Sinks per sub-region: 12
    Sub-region size: 50.5762 X 25.8531
[INFO CTS-0034]     Segment length (rounded): 12.
[INFO CTS-0032]  Stop criterion found. Max number of sinks is 15.
[INFO CTS-0035]  Number of sinks covered: 5733.
[INFO CTS-0018]     Created 131 clock buffers.
[INFO CTS-0012]     Minimum number of buffers in the clock path: 22.
[INFO CTS-0013]     Maximum number of buffers in the clock path: 22.
[INFO CTS-0015]     Created 131 clock nets.
[INFO CTS-0016]     Fanout distribution for the current clock = 4:1, 6:3, 8:3, 9:1, 10:5, 12:1, 16:1, 18:1..
[INFO CTS-0017]     Max level of the clock tree: 4.
[INFO CTS-0018]     Created 6931 clock buffers.
[INFO CTS-0012]     Minimum number of buffers in the clock path: 27.
[INFO CTS-0013]     Maximum number of buffers in the clock path: 28.
[INFO CTS-0015]     Created 6931 clock nets.
[INFO CTS-0016]     Fanout distribution for the current clock = 2:5, 3:23, 4:40, 5:35, 6:44, 7:58, 8:71, 9:68, 10:61, 11:79, 12:85, 13:87, 14:95, 15:144, 16:166, 17:247, 18:356, 19:478, 20:615, 21:769, 22:799, 23:707, 24:561, 25:356, 26:177, 27:75, 28:32, 29:5, 30:3, 34:1..
[INFO CTS-0017]     Max level of the clock tree: 9.
[INFO CTS-0098] Clock net "clock"
[INFO CTS-0099]  Sinks 151
[INFO CTS-0100]  Leaf buffers 0
[INFO CTS-0101]  Average sink wire length 2134.07 um
[INFO CTS-0102]  Path depth 22 - 22
[INFO CTS-0207]  Leaf load cells 5418
[INFO CTS-0098] Clock net "clock_regs"
[INFO CTS-0099]  Sinks 123861
[INFO CTS-0100]  Leaf buffers 5730
[INFO CTS-0101]  Average sink wire length 1111.96 um
[INFO CTS-0102]  Path depth 27 - 28
[INFO CTS-0207]  Leaf load cells 5418
Took 19 seconds: clock_tree_synthesis -sink_clustering_enable -balance_levels
[INFO RSZ-0058] Using max wire length 162um.
[INFO RSZ-0047] Found 23 long wires.
[INFO RSZ-0048] Inserted 58 buffers in 23 nets.
Placement Analysis
---------------------------------
total displacement      28324.3 u
average displacement        0.0 u
max displacement           20.9 u
original HPWL        12655780.5 u
legalized HPWL       12795445.5 u
delta HPWL                    1 %

repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
[INFO RSZ-0094] Found 3266 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 3266 (0.00%) violating endpoints...
   Iter   | Removed | Resized | Inserted | Cloned |  Pin  |   Area   |    WNS   |   TNS      |  Viol  | Worst
          | Buffers |  Gates  | Buffers  |  Gates | Swaps |          |          |            | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
        0 |       0 |       0 |        0 |      0 |     0 |    +0.0% | -2080.066 | -49214404.0 |   3266 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       10 |       0 |       3 |        3 |      0 |     5 |    +0.0% | -2051.164 | -48283988.0 |   3266 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       20 |       0 |       4 |       13 |      0 |     9 |    +0.0% | -1979.664 | -46568356.0 |   3266 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
[deleted]
    1103* |      11 |     405 |      235 |      0 |    64 |    +0.0% | -1787.770 | -43571736.0 |    438 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
    1104* |      11 |     405 |      235 |      0 |    64 |    +0.0% | -1787.770 | -43571736.0 |    437 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
    1105* |      11 |     405 |      235 |      0 |    64 |    +0.0% | -1787.770 | -43571736.0 |    436 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
    final |      11 |     405 |      235 |      0 |    64 |    +0.0% | -1787.770 | -43571736.0 |    436 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
--------------------------------------------------------------------------------------------------------------
[INFO RSZ-0059] Removed 11 buffers.
[INFO RSZ-0045] Inserted 124 buffers, 1 to split loads.
[INFO RSZ-0041] Resized 405 instances.
[INFO RSZ-0043] Swapped pins on 64 instances.
[WARNING RSZ-0062] Unable to repair all setup violations.
[INFO RSZ-0033] No hold violations found.
Took 2493 seconds: repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
Placement Analysis
---------------------------------
total displacement        245.6 u
average displacement        0.0 u
max displacement            3.7 u
original HPWL        12795543.8 u
legalized HPWL       12795727.9 u
delta HPWL                    0 %

Elapsed time: 45:43.31[h:]min:sec. CPU time: user 15183.48 sys 22898.96 (1388%). Peak memory: 30825912KB.

oharboe avatar May 05 '25 20:05 oharboe

@maliberty Could it be that there are hopeless hold violations that make the test case take forever and ultimately crash (in some cases)?

oharboe avatar May 05 '25 20:05 oharboe

I wouldn't expect a crash unless you are running out of memory.

maliberty avatar May 05 '25 20:05 maliberty

I wouldn't expect a crash unless you are running out of memory.

futile hold repair running out of memory?

oharboe avatar May 05 '25 21:05 oharboe

Possibly, but hard to say without knowing the job size, machine size, available swap, etc.

maliberty avatar May 05 '25 21:05 maliberty

Possibly, but hard to say without knowing the job size, machine size, available swap, etc.

It would be nice if hold repair gave up quickly on lost causes. I think hold repair is susceptible to hopeless cases in a dense design, where hold timing goes from easy to impossible.

Suggestion: examine this test case to see if it is an example of hold repair going from easy to impossible, and see if there is some heuristic to error out early when the repair is never going to succeed.
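To make the suggestion concrete, here is a minimal standalone sketch of one possible "give up early" check: crudely extrapolate how many iterations would be needed to close the violation at the current rate of progress, and bail out if that is clearly out of reach. None of this is OpenROAD code; worthContinuing and the numbers are hypothetical, only loosely modeled on the logs in this issue.

// Standalone sketch of an "error out early on lost causes" check.
// Not OpenROAD code; all names and thresholds are placeholders.
#include <cstdio>

// Estimate whether the worst slack can plausibly reach zero given the
// average gain per iteration so far; refuse to continue if it cannot.
bool worthContinuing(double start_wns, double wns, int iters_done,
                     int max_total_iters) {
  if (wns >= 0.0) return false;                 // already met timing
  double gain_per_iter = (wns - start_wns) / iters_done;
  if (gain_per_iter <= 0.0) return false;       // no progress at all
  double iters_needed = -wns / gain_per_iter;   // crude linear extrapolation
  return iters_needed <= max_total_iters - iters_done;
}

int main() {
  // Numbers loosely modeled on the log above: WNS moved from -2574.781
  // to -2403.907 over the first 40 iterations.
  double start = -2574.781, now = -2403.907;
  int done = 40;
  bool keep = worthContinuing(start, now, done, /*max_total_iters=*/1000);
  std::printf("projected as %s\n", keep ? "worth continuing" : "hopeless");
  return 0;
}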

oharboe avatar May 06 '25 05:05 oharboe

I'm running the test case. CTS took just 66 sec, but setup fixing is taking more than 2 hours even with a 0% TNS target. This is a good test case for improving timing repair runtime for both setup and hold.

precisionmoon avatar May 07 '25 21:05 precisionmoon

perf top output while repairing 1 endpoint:

repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
[INFO RSZ-0094] Found 24253 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 24253 (0.00%) violating endpoints...
   Iter   | Removed | Resized | Inserted | Cloned |  Pin  |   Area   |    WNS   |   TNS      |  Viol  | Worst
          | Buffers |  Gates  | Buffers  |  Gates | Swaps |          |          |            | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
        0 |       0 |       0 |        0 |      0 |     0 |    +0.0% | -2574.781 | -56408868.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       10 |       0 |       2 |        9 |      0 |     4 |    +0.0% | -2517.836 | -55473472.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       20 |       0 |       2 |       21 |      0 |     8 |    +0.0% | -2469.092 | -54803636.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       30 |       0 |       4 |       31 |      0 |    12 |    +0.0% | -2404.652 | -53191576.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       40 |       0 |       4 |       38 |      0 |    19 |    +0.0% | -2403.907 | -53097688.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
Overhead  Shared Object                           Symbol
   4,56%  openroad                                [.] sta::Search::mutateTag(sta::Tag*, sta::Pin const*, sta::RiseFall const*, bool, sta::ClkInfo*, sta::Pin const*
   4,46%  openroad                                [.] sta::PathVisitor::visitFromPath(sta::Pin const*, sta::Vertex*, sta::RiseFall const*, sta::Path*, sta::Edge*,
   3,68%  openroad                                [.] sta::TagGroup::hasTag(sta::Tag*) const
   3,12%  openroad                                [.] sta::TagGroupBldr::tagMatchPath(sta::Tag*, sta::Path*&, unsigned long&)
   3,02%  openroad                                [.] sta::VertexPathIterator::next()
   3,00%  openroad                                [.] odb::dbInst::getMaster() const
   2,69%  openroad                                [.] odb::dbObject::getObjectType() const
   2,63%  [kernel]                                [k] 0xffffffffb53ccaaa
   2,02%  openroad                                [.] sta::PathVisitor::visitEdge(sta::Pin const*, sta::Vertex*, sta::Edge*, sta::Pin const*, sta::Vertex*)
   1,94%  [kernel]                                [k] 0xffffffffb5d2f71b
   1,89%  libstdc++.so.6.0.34                     [.] std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
   1,84%  openroad                                [.] sta::RequiredVisitor::visitFromToPath(sta::Pin const*, sta::Vertex*, sta::RiseFall const*, sta::Tag*, sta::Pa
   1,76%  openroad                                [.] odb::dbObject::getId() const
   1,63%  openroad                                [.] sta::Debug::check(char const*, int) const
   1,56%  openroad                                [.] sta::TagGroup::pathIndex(sta::Tag*) const
   1,55%  openroad                                [.] sta::Graph::edge(unsigned int) const
   1,40%  openroad                                [.] std::__detail::_Map_base<sta::Tag*, std::pair<sta::Tag* const, unsigned long>, std::allocator<std::pair<sta::
   1,37%  openroad                                [.] sta::Search::findTag(sta::RiseFall const*, sta::PathAnalysisPt const*, sta::ClkInfo*, bool, sta::InputDelay*,
   1,

oharboe avatar May 08 '25 05:05 oharboe

@maliberty @precisionmoon Is this a smoking gun? It shouldn't take more than minutes to fix a single endpoint, right?

oharboe avatar May 08 '25 05:05 oharboe

If there is a test we can use here for deltaDebug, then we can whittle it down.

Perhaps it should never take more than, say, 100 iterations to repair a single endpoint? A minimal sketch of such a cap follows the log below.

repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
[INFO RSZ-0094] Found 24253 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 24253 (0.00%) violating endpoints...
   Iter   | Removed | Resized | Inserted | Cloned |  Pin  |   Area   |    WNS   |   TNS      |  Viol  | Worst
          | Buffers |  Gates  | Buffers  |  Gates | Swaps |          |          |            | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
        0 |       0 |       0 |        0 |      0 |     0 |    +0.0% | -2574.781 | -56408868.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       10 |       0 |       2 |        9 |      0 |     4 |    +0.0% | -2517.836 | -55473472.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       20 |       0 |       2 |       21 |      0 |     8 |    +0.0% | -2469.092 | -54803636.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       30 |       0 |       4 |       31 |      0 |    12 |    +0.0% | -2404.652 | -53191576.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       40 |       0 |       4 |       38 |      0 |    19 |    +0.0% | -2403.907 | -53097688.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
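For illustration, a minimal standalone sketch of the per-endpoint iteration budget suggested above. Endpoint, tryRepairMove and the toy gain model are hypothetical stand-ins, not the resizer's actual API.

// Standalone sketch of a per-endpoint iteration cap. Illustrative only.
#include <cstdio>

struct Endpoint { const char* name; double slack; };

// One optimization attempt; returns the slack improvement (toy model).
double tryRepairMove(Endpoint& ep) {
  double gain = (ep.slack < -1800.0) ? 3.0 : 0.0;
  ep.slack += gain;
  return gain;
}

int main() {
  Endpoint ep = {"core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D",
                 -2574.781};
  const int kMaxIters = 100;  // the proposed per-endpoint budget
  int iter = 0;
  // Stop at the budget, at zero slack, or when a move makes no progress.
  while (iter < kMaxIters && ep.slack < 0.0 && tryRepairMove(ep) > 0.0)
    ++iter;
  std::printf("stopped after %d iterations, slack %.3f\n", iter, ep.slack);
  return 0;
}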

oharboe avatar May 08 '25 05:05 oharboe

So if we're repairing a single endpoint and the worst endpoint changes, then we've made the worst endpoint better than the second worst. Could that be a natural stopping point when repairing the worst endpoint? A sketch of that criterion follows the log below.

[INFO RSZ-0094] Found 24253 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 24253 (0.00%) violating endpoints...
   Iter   | Removed | Resized | Inserted | Cloned |  Pin  |   Area   |    WNS   |   TNS      |  Viol  | Worst
          | Buffers |  Gates  | Buffers  |  Gates | Swaps |          |          |            | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
        0 |       0 |       0 |        0 |      0 |     0 |    +0.0% | -2574.781 | -56408868.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       10 |       0 |       2 |        9 |      0 |     4 |    +0.0% | -2517.836 | -55473472.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       20 |       0 |       2 |       21 |      0 |     8 |    +0.0% | -2469.092 | -54803636.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       30 |       0 |       4 |       31 |      0 |    12 |    +0.0% | -2404.652 | -53191576.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       40 |       0 |       4 |       38 |      0 |    19 |    +0.0% | -2403.907 | -53097688.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       50 |       0 |       4 |       52 |      0 |    24 |    +0.0% | -2398.125 | -52924164.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       60 |       0 |       4 |       67 |      0 |    29 |    +0.0% | -2381.770 | -52846792.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       70 |       1 |       5 |       78 |      0 |    33 |    +0.0% | -2385.023 | -53138332.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
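A minimal standalone sketch of that stop criterion, assuming the repair loop can see the slacks of the violating endpoints; the data structures are placeholders, not OpenROAD internals.

// Standalone sketch: stop repairing the worst endpoint once it is no
// longer the worst. Illustrative only, not OpenROAD code.
#include <cstdio>
#include <vector>

// Returns true once the endpoint we started on is better than the runner-up.
bool worstEndpointBeaten(const std::vector<double>& slacks, size_t target) {
  bool first = true;
  double second_worst = 0.0;
  for (size_t i = 0; i < slacks.size(); i++) {
    if (i == target) continue;
    if (first || slacks[i] < second_worst) {
      second_worst = slacks[i];
      first = false;
    }
  }
  return slacks[target] > second_worst;
}

int main() {
  // Slacks loosely modeled on the log: the int_issue_unit endpoint (index 0)
  // has been improved past the fp_issue_unit endpoint (index 1).
  std::vector<double> slacks = {-2398.0, -2403.9, -2100.0};
  std::printf("stop now? %s\n", worstEndpointBeaten(slacks, 0) ? "yes" : "no");
  return 0;
}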

oharboe avatar May 08 '25 05:05 oharboe

@precisionmoon Could this just be an interesting edge case that is triggered by some unfortunate initial conditions? I'm bumping ORFS on MegaBoom to see if upgrading makes the problem magically go away.

https://github.com/The-OpenROAD-Project/megaboom/pull/244

Note! CI is out of commission on megaboom, so I will post results to the PR above when I have them.

oharboe avatar May 08 '25 06:05 oharboe

So what does it mean to repair the worst endpoint?

Does it mean repairing whichever endpoint is currently the worst, which can change during repair (as seen below), until there is no further progress on it?

In that case, repairing "the worst endpoint" could, surprisingly, in theory repair all endpoints with negative slack.

repair_timing -verbose -setup_margin -1300 -hold_margin -200 -repair_tns 0
[INFO RSZ-0094] Found 24253 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 24253 (0.00%) violating endpoints...
   Iter   | Removed | Resized | Inserted | Cloned |  Pin  |   Area   |    WNS   |   TNS      |  Viol  | Worst
          | Buffers |  Gates  | Buffers  |  Gates | Swaps |          |          |            | Endpts | Endpt
--------------------------------------------------------------------------------------------------------------
        0 |       0 |       0 |        0 |      0 |     0 |    +0.0% | -2574.781 | -56408868.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       10 |       0 |       2 |        9 |      0 |     4 |    +0.0% | -2517.836 | -55473472.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       20 |       0 |       2 |       21 |      0 |     8 |    +0.0% | -2469.092 | -54803636.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       30 |       0 |       4 |       31 |      0 |    12 |    +0.0% | -2404.652 | -53191576.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       40 |       0 |       4 |       38 |      0 |    19 |    +0.0% | -2403.907 | -53097688.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       50 |       0 |       4 |       52 |      0 |    24 |    +0.0% | -2398.125 | -52924164.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       60 |       0 |       4 |       67 |      0 |    29 |    +0.0% | -2381.770 | -52846792.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       70 |       1 |       5 |       78 |      0 |    33 |    +0.0% | -2385.023 | -53138332.0 |  24253 | core.FpPipeline.fp_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D
       80 |       1 |       7 |       92 |      0 |    35 |    +0.0% | -2361.230 | -52884480.0 |  24253 | core.int_issue_unit.io_dis_uops_0_ready_REG$_DFF_P_/D

oharboe avatar May 08 '25 06:05 oharboe

@precisionmoon Looks like changing the initial conditions makes repair_timing finish quickly. I believe there is something special about this exact test case and its initial conditions that leads to repair_timing taking forever, rather than a code change as such.

oharboe avatar May 08 '25 14:05 oharboe

Even with -repair_tns 0, we continue fixing since the WNS is not fully fixed. Currently, we terminate if there is no progress for 200 iterations, which may be too large. BTW, we have the -skip_last_gasp option for repair_timing, which should help reduce runtime somewhat.
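For reference, a minimal standalone sketch of that kind of no-progress termination, with the 200-iteration limit made a parameter so it could be tightened; this is illustrative only, not the resizer's actual code.

// Standalone sketch of a "no progress for N iterations" stall detector.
// Illustrative only; names and numbers are hypothetical.
#include <cstdio>

struct StallDetector {
  double best_wns;
  int stalled_iters = 0;
  int limit;

  StallDetector(double initial_wns, int limit)
      : best_wns(initial_wns), limit(limit) {}

  // Feed the WNS after each iteration; returns true when it is time to stop.
  bool shouldStop(double wns) {
    if (wns > best_wns) {
      best_wns = wns;
      stalled_iters = 0;   // progress resets the counter
    } else if (++stalled_iters >= limit) {
      return true;
    }
    return false;
  }
};

int main() {
  StallDetector stall(-2574.781, /*limit=*/200);  // 200 mirrors the comment
  double wns = -2574.781;
  for (int iter = 1; iter < 100000; iter++) {
    if (iter <= 300) wns += 0.5;  // toy: progress for 300 iters, then stall
    if (stall.shouldStop(wns)) {
      std::printf("terminating at iteration %d, WNS %.3f\n", iter, wns);
      break;
    }
  }
  return 0;
}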

Your profile data is helpful. We want to reduce timer updates, especially required time updates.

precisionmoon avatar May 09 '25 23:05 precisionmoon

Added @povik for any additional insight on timer runtime

precisionmoon avatar May 12 '25 18:05 precisionmoon

@povik, here's the VTune data. Let's review it together to improve runtime.

precisionmoon avatar May 28 '25 20:05 precisionmoon