Improve incremental repair running times so that megaboom can be done in 24 hours
Description
repair_timing breaks the 24-hour budget for megaboom, so it has been disabled.
Note that repair_timing is not the biggest problem in megaboom (the clock period is ca. 3000 ps now, and I believe repair timing accounts for only a small fraction of that), so this is a forward-looking feature request for when macro placement has been addressed.
To reproduce, untar https://drive.google.com/file/d/14-UYL5iUD1GWlL10sYCOPIB0jOpVYDzd/view?usp=sharing and run:
./run-me-BoomTile-asap7-base.sh
[deleted]
[INFO GRT-0018] Total wirelength: 40146727 um
[INFO GRT-0014] Routed nets: 1976482
Perform buffer insertion...
[INFO RSZ-0058] Using max wire length 162um.
[after hours, still running]
[deleted]
repair_timing -verbose -repair_tns 0
[INFO RSZ-0094] Found 189600 endpoints with setup violations.
[INFO RSZ-0099] Repairing 1 out of 189600 (0.00%) violating endpoints...
Iter | Removed | Resized | Inserted | Cloned |   Pin |        WNS |           TNS |   Viol | Worst
     | Buffers |   Gates |  Buffers |  Gates | Swaps |            |               | Endpts | Endpt
---------------------------------------------------------------------------------------------------
   0 |       0 |       0 |        0 |      0 |     0 |  -4099.470 |  -410877536.0 | 189600 | core/iregister_read/exe_reg_rs1_data_0\[1\]$_DFF_P_/D
[deleted]
185* |       5 |      22 |       39 |     15 |    38 |  -3812.590 |  -359985248.0 | 189577 | lsu/ldq_31_bits_uop_debug_inst\[28\]$_DFFE_PP_/D
[still running after 10 hours or so, stopped]
Suggested Solution
Make repair_timing faster
Additional Context
No response
After 8 hours or so, the stack trace is as below.
Hmm... throw is used to abandon candidates when searching for a solution; could this be done without exceptions to improve performance?
__cxa_throw (@__cxa_throw:3)
sta::DmpPi::evalDmpEqns() (.cold) (@sta::DmpPi::evalDmpEqns() (.cold):20)
sta::DmpAlg::findDriverParams(double) (@sta::DmpAlg::findDriverParams(double):543)
sta::DmpPi::gateDelaySlew(double&, double&) (@sta::DmpPi::gateDelaySlew(double&, double&):21)
sta::DmpCeffDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*) (@sta::DmpCeffDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*):92)
sta::GraphDelayCalc::findDriverArcDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::TimingArc const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&, sta::DcalcAnalysisPt const*, sta::ArcDelayCalc*) (@sta::GraphDelayCalc::findDriverArcDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::TimingArc const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&, sta::DcalcAnalysisPt const*, sta::ArcDelayCalc*):111)
sta::GraphDelayCalc::findDriverEdgeDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::ArcDelayCalc*, std::array<bool, 2ul>&) (@sta::GraphDelayCalc::findDriverEdgeDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::ArcDelayCalc*, std::array<bool, 2ul>&):71)
sta::GraphDelayCalc::findDriverDelays1(sta::Vertex*, sta::MultiDrvrNet*, sta::ArcDelayCalc*) (@sta::GraphDelayCalc::findDriverDelays1(sta::Vertex*, sta::MultiDrvrNet*, sta::ArcDelayCalc*):108)
sta::GraphDelayCalc::findDriverDelays(sta::Vertex*, sta::ArcDelayCalc*) (@sta::GraphDelayCalc::findDriverDelays(sta::Vertex*, sta::ArcDelayCalc*):31)
sta::FindVertexDelays::visit(sta::Vertex*) (@sta::FindVertexDelays::visit(sta::Vertex*):72)
sta::BfsIterator::visit(int, sta::VertexVisitor*) (@sta::BfsIterator::visit(int, sta::VertexVisitor*):72)
sta::GraphDelayCalc::findDelays(int) (@sta::GraphDelayCalc::findDelays(int):48)
sta::Sta::searchPreamble() (@sta::Sta::searchPreamble():25)
sta::Sta::findRequireds() (@sta::Sta::findRequireds():7)
rsz::RepairSetup::repairSetupLastGasp(rsz::OptoParams const&, int&) (@rsz::RepairSetup::repairSetupLastGasp(rsz::OptoParams const&, int&):437)
rsz::RepairSetup::repairSetup(float, double, int, bool, bool, bool, bool, bool) (@rsz::RepairSetup::repairSetup(float, double, int, bool, bool, bool, bool, bool):739)
_wrap_repair_setup (@_wrap_repair_setup:124)
TclNRRunCallbacks (@TclNRRunCallbacks:38)
___lldb_unnamed_symbol1507 (@___lldb_unnamed_symbol1507:348)
Tcl_EvalEx (@Tcl_EvalEx:11)
@jeffng-or @precisionmoon @maliberty In addition to improving performance, can we add some logic to automatically abandon futile timing repair?
WNS is -3812.590 ps for a 3000 ps clock period. Isn't repair_timing an exercise in futility at that point?
It continues as long as there is some progress; it is not based on the clock period.
Add an option to stop repairing at a certain threshold?
TIMING_REPAIR_UNTIL=-1300
(If TIMING_REPAIR_UNTIL is blank or not set, which would be the default, behavior is unchanged compared to today.)
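If such a knob existed, its semantics might look like this sketch. To be clear, TIMING_REPAIR_UNTIL is only the proposed variable, not an existing one, and the per-iteration improvement below is invented purely for illustration:

```shell
# Sketch of the proposed early-exit semantics. TIMING_REPAIR_UNTIL is
# hypothetical (it does not exist in OpenROAD-flow-scripts today), and the
# +5000 ps per iteration is a made-up stand-in for one repair iteration's
# improvement.
TIMING_REPAIR_UNTIL=-1300
wns=-20000
iter=0
# Keep iterating while WNS is worse than the threshold, then stop.
while [ "$wns" -lt "$TIMING_REPAIR_UNTIL" ]; do
  iter=$((iter + 1))
  wns=$((wns + 5000))
done
echo "stopped after $iter iterations at WNS=${wns}ps"
```

The point is only the stopping rule: repair runs until the threshold is crossed, not until progress stalls.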
For exploring placement & global route issues in megaboom, running repair for 10 minutes to get from -20000 to -1300 might be more than good enough to work on those issues.
Running until -1000 ps might take many more hours... Eventually, for a final run when all issues are resolved, it is probably worth doing the last-gasp repair timing.
Without timing repair, WNS=-20000. Even after a few iterations, a few minutes, it is enormously improved to -1300 or so.
A "Seconds" column would be useful, unless the number of iterations is roughly proportional to time? (I don't have a sense of whether that is the case.)
Iter | Removed | Resized | Inserted | Cloned |   Pin |        WNS |           TNS |   Viol | Worst
     | Buffers |   Gates |  Buffers |  Gates | Swaps |            |               | Endpts | Endpt
---------------------------------------------------------------------------------------------------
   0 |       0 |       0 |        0 |      0 |     0 |  -2195.436 |  -114376600.0 | 165887 | core/int_issue_unit/io_dis_uops_0_ready_REG$_DFF_P_/D
  10 |       0 |       7 |        0 |      1 |     1 |  -1346.351 |   -80619768.0 | 165887 | core/int_issue_unit/io_dis_uops_0_ready_REG$_DFF_P_/D
  20 |       0 |       9 |        2 |      2 |     7 |  -1300.067 |   -77720360.0 | 165887 | core/int_issue_unit/io_dis_uops_0_ready_REG$_DFF_P_/D
  30 |       0 |      11 |        3 |      3 |    13 |  -1288.946 |   -77026352.0 | 165887 | core/iregister_read/exe_reg_rs1_data_0\[55\]$_DFF_P_/D
SETUP_SLACK_MARGIN/HOLD_SLACK_MARGIN already do this. They were intended for over-fixing but nothing prevents you from using them for under-fixing afaik.
Nice! I just need to learn how to use them then. Any examples?
microwatt, though with a positive value
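For concreteness, a minimal sketch of the under-fixing idea. SETUP_SLACK_MARGIN and HOLD_SLACK_MARGIN are real OpenROAD-flow-scripts variables (normally used with positive values for over-fixing); the -1300 value is just the threshold discussed in this thread, not a recommendation:

```shell
# Under-fixing sketch: a negative margin makes repair_timing treat slack
# better than the margin as acceptable, so it stops short of closing timing.
# Variable names are real ORFS knobs; the values here are illustrative only.
export SETUP_SLACK_MARGIN=-1300
export HOLD_SLACK_MARGIN=-1300
echo "margins: setup=${SETUP_SLACK_MARGIN} hold=${HOLD_SLACK_MARGIN}"
```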
Would setting a clock period that gives us a positive slack quickly in timing repair have the same effect?
megaboom has a 1200ps clock period and gets to negative slack -1200 relatively quickly, so I should see timing repair complete quickly if I set clock period to 1200+1200 = 2400ps?
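The arithmetic behind that guess, as a sketch (assuming WNS simply adds to the required period):

```shell
# If WNS settles near -1200 ps quickly at a 1200 ps clock, a clock relaxed
# by that amount should let repair_timing finish quickly on the same netlist.
clock_ps=1200
wns_ps=-1200
relaxed_ps=$((clock_ps - wns_ps))  # 1200 - (-1200) = 2400
echo "relaxed clock period: ${relaxed_ps} ps"
```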
Typo... https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/2497
Though... keeping ABC_CLOCK_PERIOD stable has the advantage that we don't have to redo synthesis; we could then sweep SETUP_SLACK_MARGIN/HOLD_SLACK_MARGIN more easily.
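A hypothetical sweep along those lines. The make invocation is a placeholder (echoed rather than executed, since the thread doesn't give the target), so this is only a sketch of the idea:

```shell
# Hypothetical margin sweep: synthesis (ABC_CLOCK_PERIOD) stays fixed, so
# only the post-synthesis stages would be rerun per point. The make command
# below is a placeholder, not a real invocation from this flow.
for margin in -2000 -1500 -1300; do
  echo "would run: make SETUP_SLACK_MARGIN=${margin}"
done
```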
https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/2498
Would setting a clock period that gives us a positive slack quickly in timing repair have the same effect?
Yes.
Moot; use SETUP_SLACK_MARGIN with a negative value to terminate repair_timing early.