OpenROAD icon indicating copy to clipboard operation
OpenROAD copied to clipboard

Speed up opening of large designs grt::have_routes and sta::Sta::findPathEnds()

Open oharboe opened this issue 8 months ago • 12 comments

Description

untar and run https://drive.google.com/file/d/1J59T6hj0Bc7tRdL7XFq5bjGwDrNfEjJ0/view?usp=sharing

$ time ./run-me-BoomTile-asap7-base.sh
real	2m57,217s
user	2m51,280s
sys	0m5,876s

Some perf top snapshots:

Overhead  Shared Object                            Symbol
   8,89%  openroad                                 [.] std::__detail::_Map_base<std::tuple<int, int, int>, std::pair<std::tuple<int, int, int> const, boos
   8,44%  openroad                                 [.] grt::GlobalRouter::adjustTransitionLayers(std::vector<int, std::allocator<int> > const&, std::map<i
   6,67%  openroad                                 [.] odb::dbITerm::getMTerm() const
   6,61%  [kernel]                                 [k] 0xffffffff9c038f80
   3,81%  openroad                                 [.] grt::FastRouteCore::initBlockedIntervals(std::vector<int, std::allocator<int> >&)
   3,16%  openroad                                 [.] grt::FastRouteCore::getEdgeCapacity(int, int, int, int, int)
   2,54%  libstdc++.so.6.0.33                      [.] std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
   2,45%  openroad                                 [.] odb::dbObject::getObjectType() const
   1,92%  libc.so.6                                [.] _int_malloc
   1,88%  openroad                                 [.] odb::dbBox::getViaBoxes(std::vector<odb::dbShape, std::allocator<odb::dbShape> >&)
   1,83%  openroad                                 [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&, odb::dbModITerm*&, odb::db
   1,80%  [kernel]                                 [k] 0xffffffff9b6813aa
   1,37%  openroad                                 [.] sta::Graph::makeEdge(sta::Vertex*, sta::Vertex*, sta::TimingArcSet*)
   1,35%  openroad                                 [.] grt::FastRouteCore::addAdjustment(int, int, int, int, int, unsigned short, bool)
   1,29%  openroad                                 [.] boost::icl::interval_base_set<boost::icl::interval_set<int, std::less, boost::icl::discrete_interva
   1,21%  openroad                                 [.] grt::FastRouteCore::addAdjustment(int, int, int, int, int, unsigned short, bool) [clone .constprop.
   1,13%  openroad                                 [.] grt::GlobalRouter::computeUserLayerAdjustments(int)
Overhead  Shared Object                            Symbol
   6,80%  [kernel]                                 [k] 0xffffffff9c038f80
   5,60%  openroad                                 [.] odb::dbObject::getId() const
   5,21%  openroad                                 [.] odb::dbITerm::getMTerm() const
   5,17%  openroad                                 [.] grt::GlobalRouter::mergeSegments(std::vector<grt::Pin, std::allocator<grt::Pin> > const&, std::vect
   3,92%  openroad                                 [.] odb::dbObject::getObjectType() const
   2,88%  openroad                                 [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&, odb::dbModITerm*&, odb::db
   1,61%  openroad                                 [.] odb::dbITerm::getInst() const
   1,58%  openroad                                 [.] odb::compare_by_id(odb::dbObject const*, odb::dbObject const*)
   1,58%  openroad                                 [.] sta::Table2::findValue(float, float, float) const
   1,48%  openroad                                 [.] std::map<grt::RoutePt, int, std::less<grt::RoutePt>, std::allocator<std::pair<grt::RoutePt const, i
   1,28%  openroad                                 [.] sta::LibertyPort::capacitance(sta::RiseFall const*, sta::MinMax const*, sta::OperatingConditions co
   1,25%  openroad                                 [.] sta::VertexOutEdgeIterator::next()
   1,07%  libstdc++.so.6.0.33                      [.] std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_
   1,05%  libc.so.6                                [.] malloc
Overhead  Shared Object                            Symbol
   5,19%  [kernel]                                 [k] 0xffffffff9c038f80
   4,22%  openroad                                 [.] odb::dbObject::getObjectType() const
   3,89%  openroad                                 [.] odb::dbObject::getId() const
   3,83%  openroad                                 [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&, odb::dbModITerm*&, odb::db
   3,62%  openroad                                 [.] sta::TagGroupBldr::tagMatchArrival(sta::Tag*, sta::Tag*&, float&, int&) const
   3,19%  openroad                                 [.] sta::Search::mutateTag(sta::Tag*, sta::Pin const*, sta::RiseFall const*, bool, sta::ClkInfo*, sta::
   2,76%  openroad                                 [.] sta::PathVisitor::visitFromPath(sta::Pin const*, sta::Vertex*, sta::RiseFall const*, sta::PathVerte
   2,67%  openroad                                 [.] sta::dbNetwork::id(sta::Pin const*) const
   2,30%  openroad                                 [.] odb::dbInst::getMaster() const
   1,94%  openroad                                 [.] odb::dbITerm::getMTerm() const
   1,63%  openroad                                 [.] sta::Debug::check(char const*, int) const

Some stack traces, grt::have_routes takes quite a bit of time:

odb::dbBox::getViaBoxes(std::vector<odb::dbShape, std::allocator<odb::dbShape>>&) (@odb::dbBox::getViaBoxes(std::vector<odb::dbShape, std::allocator<odb::dbShape>>&):68)
grt::GlobalRouter::findNetsObstructions(odb::Rect&) (@grt::GlobalRouter::findNetsObstructions(odb::Rect&):189)
grt::GlobalRouter::computeObstructionsAdjustments() (@grt::GlobalRouter::computeObstructionsAdjustments():47)
grt::GlobalRouter::applyAdjustments(int, int) (@grt::GlobalRouter::applyAdjustments(int, int):28)
grt::GlobalRouter::initGridAndNets() (@grt::GlobalRouter::initGridAndNets():129)
grt::GlobalRouter::loadGuidesFromDB() (.part.0) (@grt::GlobalRouter::loadGuidesFromDB() (.part.0):16)
grt::GlobalRouter::haveRoutes() (@grt::GlobalRouter::haveRoutes():26)
_wrap_have_routes (@_wrap_have_routes:25)
TclNRRunCallbacks (@TclNRRunCallbacks:38)
___lldb_unnamed_symbol1507 (@___lldb_unnamed_symbol1507:348)
Tcl_EvalEx (@Tcl_EvalEx:11)
Tcl_Eval (@Tcl_Eval:12)
sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*) (@sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*):32)
tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0) (@tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0):303)
Tcl_MainEx (@Tcl_MainEx:113)
main (@main:729)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main_impl (@__libc_start_main@@GLIBC_2.34:44)
_start (@_start:15)

There are some paths with many levels of logic since synthesis doesn't do retiming:

sta::Edge::isDisabledConstraint() const (@sta::Edge::isDisabledConstraint() const:16)
sta::SearchPredNonLatch2::searchThru(sta::Edge*) (@sta::SearchPredNonLatch2::searchThru(sta::Edge*):18)
sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&) (@sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&):113)
sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&) (@sta::Levelize::visit(sta::Vertex*, int, int, 
sta::Vector<sta::Edge*>&):144)
[hundreds of recursions]
sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&) (@sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&):144)
sta::Levelize::levelize() (@sta::Levelize::levelize():88)
sta::Sta::ensureClkNetwork() (@sta::Sta::ensureClkNetwork():17)
sta::dbSta::findClkNets() (@sta::dbSta::findClkNets():18)
grt::GlobalRouter::initClockNets() (@grt::GlobalRouter::initClockNets():77)
grt::GlobalRouter::findNets() (@grt::GlobalRouter::findNets():18)
grt::GlobalRouter::initGridAndNets() (@grt::GlobalRouter::initGridAndNets():77)
grt::GlobalRouter::loadGuidesFromDB() (.part.0) (@grt::GlobalRouter::loadGuidesFromDB() (.part.0):16)
grt::GlobalRouter::haveRoutes() (@grt::GlobalRouter::haveRoutes():26)
_wrap_have_routes (@_wrap_have_routes:25)
TclNRRunCallbacks (@TclNRRunCallbacks:38)
___lldb_unnamed_symbol1507 (@___lldb_unnamed_symbol1507:348)
Tcl_EvalEx (@Tcl_EvalEx:11)
Tcl_Eval (@Tcl_Eval:12)
sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*) (@sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*):32)
tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0) (@tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0):303)
Tcl_MainEx (@Tcl_MainEx:113)
main (@main:729)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main_impl (@__libc_start_main@@GLIBC_2.34:44)
_start (@_start:15)

After grt::have_routes,

odb::dbITerm::getMTerm() const (@odb::dbITerm::getMTerm() const:49)
sta::dbNetwork::port(sta::Pin const*) const (@sta::dbNetwork::port(sta::Pin const*) const:21)
sta::Network::libertyPort(sta::Pin const*) const (@sta::Network::libertyPort(sta::Pin const*) const:8)
sta::DelayCalcBase::thresholdAdjust(sta::Pin const*, sta::LibertyLibrary const*, sta::RiseFall const*, float&, float&) (@sta::DelayCalcBase::thresholdAdjust(sta::Pin const*, sta::LibertyLibrary const*, sta::RiseFall const*, float&, float&):25)
sta::LumpedCapDelayCalc::makeResult(sta::LibertyLibrary const*, sta::RiseFall const*, float, float, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&) (@sta::LumpedCapDelayCalc::makeResult(sta::LibertyLibrary const*, sta::RiseFall const*, float, float, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&):47)
sta::LumpedCapDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*) (@sta::LumpedCapDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*):70)
sta::DmpCeffDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*) (@sta::DmpCeffDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*):167)
sta::GraphDelayCalc::findDriverArcDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::TimingArc const*, sta::DcalcAnalysisPt const*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&) (@sta::GraphDelayCalc::findDriverArcDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::TimingArc const*, sta::DcalcAnalysisPt const*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&):111)
sta::GraphDelayCalc::findDriverEdgeDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&, std::array<bool, 2ul>&) (@sta::GraphDelayCalc::findDriverEdgeDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&, std::array<bool, 2ul>&):60)
sta::GraphDelayCalc::findDriverDelays1(sta::Vertex*, sta::MultiDrvrNet*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&) (@sta::GraphDelayCalc::findDriverDelays1(sta::Vertex*, sta::MultiDrvrNet*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&):145)
sta::GraphDelayCalc::findDriverDelays(sta::Vertex*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&) (@sta::GraphDelayCalc::findDriverDelays(sta::Vertex*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&):42)
sta::GraphDelayCalc::findVertexDelay(sta::Vertex*, sta::ArcDelayCalc*, bool) (@sta::GraphDelayCalc::findVertexDelay(sta::Vertex*, sta::ArcDelayCalc*, bool):110)
sta::BfsIterator::visit(int, sta::VertexVisitor*) (@sta::BfsIterator::visit(int, sta::VertexVisitor*):72)
sta::GraphDelayCalc::findDelays(int) (@sta::GraphDelayCalc::findDelays(int):48)
sta::Sta::searchPreamble() (@sta::Sta::searchPreamble():27)
sta::Sta::findPathEnds(sta::ExceptionFrom*, sta::Vector<sta::ExceptionThru*>*, sta::ExceptionTo*, bool, sta::Corner const*, sta::MinMaxAll const*, int, int, bool, float, float, bool, sta::Set<char const*, sta::CharPtrLess>*, bool, bool, bool, bool, bool, bool) (@sta::Sta::findPathEnds(sta::ExceptionFrom*, sta::Vector<sta::ExceptionThru*>*, sta::ExceptionTo*, bool, sta::Corner const*, sta::MinMaxAll const*, int, int, bool, float, float, bool, sta::Set<char const*, sta::CharPtrLess>*, bool, bool, bool, bool, bool, bool):46)
_wrap_find_path_ends (@_wrap_find_path_ends:288)
TclNRRunCallbacks (@TclNRRunCallbacks:38)
___lldb_unnamed_symbol1507 (@___lldb_unnamed_symbol1507:348)
Tcl_EvalEx (@Tcl_EvalEx:11)
Tcl_Eval (@Tcl_Eval:12)
sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*) (@sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*):32)
tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0) (@tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0):303)
Tcl_MainEx (@Tcl_MainEx:113)
main (@main:729)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main_impl (@__libc_start_main@@GLIBC_2.34:44)
_start (@_start:15)

Suggested Solution

Look for some low hanging fruit to optimize by gdb suspend.

Additional Context

No response

oharboe avatar Apr 11 '25 17:04 oharboe

There are two different problems rolled into this one issue. @eder-matheus to look at have_routes.

maliberty avatar Apr 11 '25 19:04 maliberty

@maliberty FYI, before and after sta::Levelize() improvement, no change. I had this test case ready to go so I thought I would try it after sta::Levelize() improvement.

oyvind@corona:~/OpenROAD-flow-scripts/foo$ time ./run-me-BoomTile-asap7-base.sh
OpenROAD v2.0-20670-g4939290b5 
Features included (+) or not (-): +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
read_db results/asap7/BoomTile/base/5_1_grt.odb
Took 8 seconds: read_db results/asap7/BoomTile/base/5_1_grt.odb
GUI_TIMING=1 reading timing, takes a little while for large designs...
read_sdc /home/oyvind/OpenROAD-flow-scripts/foo/results/asap7/BoomTile/base/5_1_grt.sdc
Took 10 seconds: read_sdc /home/oyvind/OpenROAD-flow-scripts/foo/results/asap7/BoomTile/base/5_1_grt.sdc
grt::have_routes
Took 48 seconds: grt::have_routes
No global routing results available, skipping estimate_parasitics
Load .//reports/asap7/BoomTile/base/congestion.rpt for details
find_timing_paths
Took 71 seconds: find_timing_paths

real	2m27,613s
user	2m23,664s
sys	0m3,944s
oyvind@corona:~/OpenROAD-flow-scripts/foo$ which openroad
/home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
oyvind@corona:~/OpenROAD-flow-scripts/foo$ time ./run-me-BoomTile-asap7-base.sh
OpenROAD v2.0-20763-g3a77ad118 
Features included (+) or not (-): +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
read_db results/asap7/BoomTile/base/5_1_grt.odb
Took 9 seconds: read_db results/asap7/BoomTile/base/5_1_grt.odb
GUI_TIMING=1 reading timing, takes a little while for large designs...
read_sdc /home/oyvind/OpenROAD-flow-scripts/foo/results/asap7/BoomTile/base/5_1_grt.sdc
Took 9 seconds: read_sdc /home/oyvind/OpenROAD-flow-scripts/foo/results/asap7/BoomTile/base/5_1_grt.sdc
grt::have_routes
Took 47 seconds: grt::have_routes
No global routing results available, skipping estimate_parasitics
Load .//reports/asap7/BoomTile/base/congestion.rpt for details
find_timing_paths
Took 71 seconds: find_timing_paths

real	2m26,012s
user	2m21,920s
sys	0m4,081s

oharboe avatar Apr 19 '25 07:04 oharboe

This a the grt::have_routes perf top:

  31,31%  openroad                                 [.] std::__detail::_Map_base<std::tuple<int, int, int>, std::pair<std::tup
   7,65%  openroad                                 [.] std::_Hashtable<std::tuple<int, int, int>, std::pair<std::tuple<int, i
   3,75%  openroad                                 [.] odb::dbBox::getViaBoxes(std::vector<odb::dbShape, std::allocator<odb::
   2,17%  [kernel]                                 [k] 0xffffffff9c280f0a
   2,09%  openroad                                 [.] odb::_dbBox::getTechLayer() const
   1,75%  [kernel]                                 [k] 0xffffffff9cc37b52
   1,71%  openroad                                 [.] boost::icl::interval_base_set<boost::icl::interval_set<int, std::less,
   1,70%  libc.so.6                                [.] _int_malloc
   1,06%  libc.so.6                                [.] __strcmp_avx2
   0,95%  openroad                                 [.] grt::Grid::getBlockedTiles(odb::Rect const&, odb::Rect&, odb::Rect&, o
   0,92%  [kernel]                                 [k] 0xffffffff9cc38768
   0,91%  openroad                                 [.] odb::_dbBlock::getTech()
   0,62%  openroad                                 [.] odb::dbHashTable<odb::_dbInst>::find(char const*)
   0,58%  libc.so.6                                [.] malloc
   0,57%  [kernel]                                 [k] 0xffffffff9cbb68e7
   0,55%  libtcl8.6.so                             [.] TclpFree
   0,51%  openroad                                 [.] odb::dbObject::getId() const
   0,50%  openroad                                 [.] grt::Grid::computeTileReduceInterval(odb::Rect const&, odb::Rect const
   0,47%  openroad                                 [.] odb::_dbLib::getTech()
   0,47%  openroad                                 [.] grt::FastRouteCore::addAdjustment(int, int, int, int, int, unsigned sh
   0,47%  [kernel]                                 [k] 0xffffffff9ce0128c

oharboe avatar Apr 19 '25 07:04 oharboe

Two find_timing_paths pref top snapshots:

Overhead  Shared Object                            Symbol
   8,35%  openroad                                 [.] odb::dbITerm::getMTerm() const
   5,90%  openroad                                 [.] odb::dbObject::getObjectType() const
   4,67%  openroad                                 [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&,
   3,85%  [kernel]                                 [k] 0xffffffff9c280f0a
   3,72%  openroad                                 [.] odb::dbObject::getId() const
   2,73%  openroad                                 [.] sta::Levelize::findToplologicalOrder()
   2,71%  [kernel]                                 [k] 0xffffffff9cc37b52
   2,37%  openroad                                 [.] odb::dbITerm::getInst() const
   2,12%  openroad                                 [.] sta::Table2::findValue(float, float, float) const
   1,93%  openroad                                 [.] sta::VertexOutEdgeIterator::next()
   1,91%  openroad                                 [.] sta::LibertyPort::capacitance(sta::RiseFall const*, sta::MinMax const*,
   1,65%  openroad                                 [.] grt::GlobalRouter::mergeSegments(std::vector<grt::Pin, std::allocator<gr
   1,56%  [kernel]                                 [k] 0xffffffff9cc38768
   1,30%  openroad                                 [.] sta::dbNetwork::id(sta::Pin const*) const
   1,27%  openroad                                 [.] sta::GraphDelayCalc::annotateLoadDelays(sta::Vertex*, sta::RiseFall cons
   1,19%  openroad                                 [.] sta::DelayCalcBase::thresholdAdjust(sta::Pin const*, sta::LibertyLibrary
   1,13%  openroad                                 [.] sta::findValueIndex(float, sta::Vector<float> const*)
   1,08%  openroad                                 [.] sta::GraphDelayCalc::findDriverArcDelays(sta::Vertex*, sta::MultiDrvrNet
Overhead  Shared Object                            Symbol
   4,61%  openroad                                 [.] odb::dbObject::getObjectType() const
   4,34%  [kernel]                                 [k] 0xffffffff9c280f0a
   3,77%  openroad                                 [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&,
   3,75%  openroad                                 [.] odb::dbObject::getId() const
   3,17%  openroad                                 [.] sta::TagGroupBldr::tagMatchPath(sta::Tag*, sta::Path*&, unsigned long&)
   2,84%  [kernel]                                 [k] 0xffffffff9cc37b52
   2,62%  openroad                                 [.] sta::PathVisitor::visitFromPath(sta::Pin const*, sta::Vertex*, sta::Rise
   2,58%  openroad                                 [.] sta::Search::mutateTag(sta::Tag*, sta::Pin const*, sta::RiseFall const*,
   2,49%  openroad                                 [.] sta::dbNetwork::id(sta::Pin const*) const
   1,99%  openroad                                 [.] odb::dbInst::getMaster() const
   1,98%  openroad                                 [.] odb::dbITerm::getMTerm() const
   1,81%  [kernel]                                 [k] 0xffffffff9cc38768
   1,73%  openroad                                 [.] std::__detail::_Map_base<sta::Tag*, std::pair<sta::Tag* const, unsigned
   1,52%  openroad                                 [.] odb::dbITerm::getInst() const
   1,33%  openroad                                 [.] sta::ArrivalVisitor::visitFromToPath(sta::Pin const*, sta::Vertex*, sta:
   1,33%  openroad                                 [.] sta::Debug::check(char const*, int) const
   1,29%  openroad                                 [.] sta::PathVisitor::visitEdge(sta::Pin const*, sta::Vertex*, sta::Edge*, s
   1,18%  openroad                                 [.] sta::VertexPathIterator::next()
   1,14%  openroad                                 [.] sta::TagGroupBldr::copyPaths(sta::TagGroup*, sta::Path*)
   0,97%  openroad                                 [.] sta::TagGroupEqual::operator()(sta::TagGroup const*, sta::TagGroup const

oharboe avatar Apr 19 '25 07:04 oharboe

Is there something here that still requires action. I'm not sure what to make of all these stacks.

maliberty avatar Apr 19 '25 16:04 maliberty

Is there something here that still requires action. I'm not sure what to make of all these stacks.

I think the test case is evidence that with moderate effort this could be made significantly faster.

However, unless something leaps out as actionable then it means that this "merits further study" but that does not mean that it should be a priority.

I would close pending a decision that this should take priority

oharboe avatar Apr 19 '25 16:04 oharboe

The only one that stands out is from earlier: grt::GlobalRouter::computeObstructionsAdjustments() (@grt::GlobalRouter::computeObstructionsAdjustments():47) grt::GlobalRouter::applyAdjustments(int, int) (@grt::GlobalRouter::applyAdjustments(int, int):28) grt::GlobalRouter::initGridAndNets() (@grt::GlobalRouter::initGridAndNets():129) grt::GlobalRouter::loadGuidesFromDB() (.part.0) (@grt::GlobalRouter::loadGuidesFromDB() (.part.0):16) grt::GlobalRouter::haveRoutes() (@grt::GlobalRouter::haveRoutes():26)

where we could skip some of this work but I'm not sure it takes much time.

maliberty avatar Apr 19 '25 17:04 maliberty

A bit of spelunking in odb::dbObject::getObjectType() const reveals lots of pointer chasing. I think what is needed is a deep think about the datastructures, access patterns and an epihany about how to reorganize datastructures. Doesn't sound like it will be quick or easy or that it is the way it is today for lack of effort earlier...

oharboe avatar Apr 19 '25 18:04 oharboe

I did a pref report.

find_timing_paths, which takes most of the time, is spending all its time navigating data structures, it seems.

getMTerm() is a particularly nasty piece of work, it is pointer chasing hell.

https://github.com/The-OpenROAD-Project/OpenROAD/blob/3c8674f9a610d38db25105cafa8ef6a2f9184802/src/odb/src/db/dbITerm.cpp#L154-L166

Overhead  Command   Shared Object            Symbol
   6,30%  openroad  openroad                 [.] odb::dbObject::getId() const
   4,67%  openroad  openroad                 [.] odb::dbITerm::getMTerm() const
   4,58%  openroad  openroad                 [.] odb::dbObject::getObjectType() const
   4,27%  openroad  openroad                 [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&, odb::dbModI
   2,53%  openroad  openroad                 [.] sta::dbNetwork::id(sta::Pin const*) const
   1,76%  openroad  openroad                 [.] std::__detail::_Map_base<std::tuple<int, int, int>, std::pair<std::tuple<int, int, i
   1,60%  openroad  openroad                 [.] sta::TagGroupBldr::tagMatchPath(sta::Tag*, sta::Path*&, unsigned long&)
   1,59%  openroad  openroad                 [.] sta::Levelize::findToplologicalOrder()
   1,54%  openroad  openroad                 [.] odb::dbITerm::getInst() const
   1,50%  openroad  openroad                 [.] grt::Net::getName[abi:cxx11]() const
   1,40%  openroad  openroad                 [.] grt::GlobalRouter::mergeSegments(std::vector<grt::Pin, std::allocator<grt::Pin> > co
   1,39%  openroad  openroad                 [.] sta::PathVisitor::visitFromPath(sta::Pin const*, sta::Vertex*, sta::RiseFall const*,
   1,35%  openroad  openroad                 [.] sta::Search::mutateTag(sta::Tag*, sta::Pin const*, sta::RiseFall const*, bool, sta::
   1,33%  openroad  libc.so.6                [.] malloc
   1,19%  openroad  libc.so.6                [.] _int_free
   1,19%  openroad  openroad                 [.] odb::dbInst::getMaster() const
   1,18%  openroad  openroad                 [.] sta::VertexOutEdgeIterator::next()
   0,90%  openroad  libc.so.6                [.] malloc_consolidate
   0,87%  openroad  libc.so.6                [.] cfree@GLIBC_2.2.5
   0,84%  openroad  libstdc++.so.6.0.33      [.] std::_Rb_tree_increment(std::_Rb_tree_node_base const*)

oharboe avatar Apr 20 '25 06:04 oharboe

The root of the problem if you look inside OpenSTA's code is that in order to provide determinism STA uses ordered maps to keep track of verticies, and used getId() as the key so we're paying n*log(n) cost to getId.

QuantamHD avatar Apr 20 '25 07:04 QuantamHD

The root of the problem if you look inside OpenSTA's code is that in order to provide determinism STA uses ordered maps to keep track of verticies, and used getId() as the key so we're paying n*log(n) cost to getId.

Sounds like the problem is well understood. This makes me optimistic. Naively I would hope something keeping two maps: one for iteration on members and one for lookup, could do the trick.

Pointer to code?

oharboe avatar Apr 20 '25 08:04 oharboe

The root of the problem if you look inside OpenSTA's code is that in order to provide determinism STA uses ordered maps to keep track of verticies, and used getId() as the key so we're paying n*log(n) cost to getId.

@maliberty @QuantamHD

If sta::Set() and sta::Map() were rewritten to maintain a KEY to ID map in addition to a KEY set, instead of taking a std::less<KEY> operator, I speculate that it could result in a significant speedup.

However, there may be more oportunities in the code that is missed: ordering may not matter very often?

In that case, it would be better to explicitly sort as needed in the user code as the set implementation can't know if ordering is needed or not. Sorting should probably create a KEY to ID map as a vector, sort on ID, rather than take a comparison operator, assuming comparison operator is defined for ID.

On the other hand, this opens up for a protracted scenario of hard to catch bugs. Perhaps better to introduce lazy sorting and an explicit unordered iteration interface?

That way the few (?) cases where ordering doesn't matter and performance is important could be fixed and the rest could use a more CPU intensive conservative assumptions.

Anyway... some deep understanding of the use-cases and profiling seems to be needed here.

https://github.com/parallaxsw/OpenSTA/blob/400c473fe384773a4788ee8378238462b4291fe3/include/sta/Set.hh#L32-L33

https://github.com/parallaxsw/OpenSTA/blob/400c473fe384773a4788ee8378238462b4291fe3/include/sta/Map.hh#L33-L34

oharboe avatar Apr 21 '25 09:04 oharboe