Speed up opening of large designs grt::have_routes and sta::Sta::findPathEnds()
Description
untar and run https://drive.google.com/file/d/1J59T6hj0Bc7tRdL7XFq5bjGwDrNfEjJ0/view?usp=sharing
$ time ./run-me-BoomTile-asap7-base.sh
real 2m57,217s
user 2m51,280s
sys 0m5,876s
Some perf top snapshots:
Overhead Shared Object Symbol
8,89% openroad [.] std::__detail::_Map_base<std::tuple<int, int, int>, std::pair<std::tuple<int, int, int> const, boos
8,44% openroad [.] grt::GlobalRouter::adjustTransitionLayers(std::vector<int, std::allocator<int> > const&, std::map<i
6,67% openroad [.] odb::dbITerm::getMTerm() const
6,61% [kernel] [k] 0xffffffff9c038f80
3,81% openroad [.] grt::FastRouteCore::initBlockedIntervals(std::vector<int, std::allocator<int> >&)
3,16% openroad [.] grt::FastRouteCore::getEdgeCapacity(int, int, int, int, int)
2,54% libstdc++.so.6.0.33 [.] std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
2,45% openroad [.] odb::dbObject::getObjectType() const
1,92% libc.so.6 [.] _int_malloc
1,88% openroad [.] odb::dbBox::getViaBoxes(std::vector<odb::dbShape, std::allocator<odb::dbShape> >&)
1,83% openroad [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&, odb::dbModITerm*&, odb::db
1,80% [kernel] [k] 0xffffffff9b6813aa
1,37% openroad [.] sta::Graph::makeEdge(sta::Vertex*, sta::Vertex*, sta::TimingArcSet*)
1,35% openroad [.] grt::FastRouteCore::addAdjustment(int, int, int, int, int, unsigned short, bool)
1,29% openroad [.] boost::icl::interval_base_set<boost::icl::interval_set<int, std::less, boost::icl::discrete_interva
1,21% openroad [.] grt::FastRouteCore::addAdjustment(int, int, int, int, int, unsigned short, bool) [clone .constprop.
1,13% openroad [.] grt::GlobalRouter::computeUserLayerAdjustments(int)
Overhead Shared Object Symbol
6,80% [kernel] [k] 0xffffffff9c038f80
5,60% openroad [.] odb::dbObject::getId() const
5,21% openroad [.] odb::dbITerm::getMTerm() const
5,17% openroad [.] grt::GlobalRouter::mergeSegments(std::vector<grt::Pin, std::allocator<grt::Pin> > const&, std::vect
3,92% openroad [.] odb::dbObject::getObjectType() const
2,88% openroad [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&, odb::dbModITerm*&, odb::db
1,61% openroad [.] odb::dbITerm::getInst() const
1,58% openroad [.] odb::compare_by_id(odb::dbObject const*, odb::dbObject const*)
1,58% openroad [.] sta::Table2::findValue(float, float, float) const
1,48% openroad [.] std::map<grt::RoutePt, int, std::less<grt::RoutePt>, std::allocator<std::pair<grt::RoutePt const, i
1,28% openroad [.] sta::LibertyPort::capacitance(sta::RiseFall const*, sta::MinMax const*, sta::OperatingConditions co
1,25% openroad [.] sta::VertexOutEdgeIterator::next()
1,07% libstdc++.so.6.0.33 [.] std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_
1,05% libc.so.6 [.] malloc
Overhead Shared Object Symbol
5,19% [kernel] [k] 0xffffffff9c038f80
4,22% openroad [.] odb::dbObject::getObjectType() const
3,89% openroad [.] odb::dbObject::getId() const
3,83% openroad [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&, odb::dbModITerm*&, odb::db
3,62% openroad [.] sta::TagGroupBldr::tagMatchArrival(sta::Tag*, sta::Tag*&, float&, int&) const
3,19% openroad [.] sta::Search::mutateTag(sta::Tag*, sta::Pin const*, sta::RiseFall const*, bool, sta::ClkInfo*, sta::
2,76% openroad [.] sta::PathVisitor::visitFromPath(sta::Pin const*, sta::Vertex*, sta::RiseFall const*, sta::PathVerte
2,67% openroad [.] sta::dbNetwork::id(sta::Pin const*) const
2,30% openroad [.] odb::dbInst::getMaster() const
1,94% openroad [.] odb::dbITerm::getMTerm() const
1,63% openroad [.] sta::Debug::check(char const*, int) const
Some stack traces, grt::have_routes takes quite a bit of time:
odb::dbBox::getViaBoxes(std::vector<odb::dbShape, std::allocator<odb::dbShape>>&) (@odb::dbBox::getViaBoxes(std::vector<odb::dbShape, std::allocator<odb::dbShape>>&):68)
grt::GlobalRouter::findNetsObstructions(odb::Rect&) (@grt::GlobalRouter::findNetsObstructions(odb::Rect&):189)
grt::GlobalRouter::computeObstructionsAdjustments() (@grt::GlobalRouter::computeObstructionsAdjustments():47)
grt::GlobalRouter::applyAdjustments(int, int) (@grt::GlobalRouter::applyAdjustments(int, int):28)
grt::GlobalRouter::initGridAndNets() (@grt::GlobalRouter::initGridAndNets():129)
grt::GlobalRouter::loadGuidesFromDB() (.part.0) (@grt::GlobalRouter::loadGuidesFromDB() (.part.0):16)
grt::GlobalRouter::haveRoutes() (@grt::GlobalRouter::haveRoutes():26)
_wrap_have_routes (@_wrap_have_routes:25)
TclNRRunCallbacks (@TclNRRunCallbacks:38)
___lldb_unnamed_symbol1507 (@___lldb_unnamed_symbol1507:348)
Tcl_EvalEx (@Tcl_EvalEx:11)
Tcl_Eval (@Tcl_Eval:12)
sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*) (@sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*):32)
tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0) (@tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0):303)
Tcl_MainEx (@Tcl_MainEx:113)
main (@main:729)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main_impl (@__libc_start_main@@GLIBC_2.34:44)
_start (@_start:15)
There are some paths with many levels of logic since synthesis doesn't do retiming:
sta::Edge::isDisabledConstraint() const (@sta::Edge::isDisabledConstraint() const:16)
sta::SearchPredNonLatch2::searchThru(sta::Edge*) (@sta::SearchPredNonLatch2::searchThru(sta::Edge*):18)
sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&) (@sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&):113)
sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&) (@sta::Levelize::visit(sta::Vertex*, int, int,
sta::Vector<sta::Edge*>&):144)
[hundreds of recursions]
sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&) (@sta::Levelize::visit(sta::Vertex*, int, int, sta::Vector<sta::Edge*>&):144)
sta::Levelize::levelize() (@sta::Levelize::levelize():88)
sta::Sta::ensureClkNetwork() (@sta::Sta::ensureClkNetwork():17)
sta::dbSta::findClkNets() (@sta::dbSta::findClkNets():18)
grt::GlobalRouter::initClockNets() (@grt::GlobalRouter::initClockNets():77)
grt::GlobalRouter::findNets() (@grt::GlobalRouter::findNets():18)
grt::GlobalRouter::initGridAndNets() (@grt::GlobalRouter::initGridAndNets():77)
grt::GlobalRouter::loadGuidesFromDB() (.part.0) (@grt::GlobalRouter::loadGuidesFromDB() (.part.0):16)
grt::GlobalRouter::haveRoutes() (@grt::GlobalRouter::haveRoutes():26)
_wrap_have_routes (@_wrap_have_routes:25)
TclNRRunCallbacks (@TclNRRunCallbacks:38)
___lldb_unnamed_symbol1507 (@___lldb_unnamed_symbol1507:348)
Tcl_EvalEx (@Tcl_EvalEx:11)
Tcl_Eval (@Tcl_Eval:12)
sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*) (@sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*):32)
tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0) (@tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0):303)
Tcl_MainEx (@Tcl_MainEx:113)
main (@main:729)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main_impl (@__libc_start_main@@GLIBC_2.34:44)
_start (@_start:15)
After grt::have_routes,
odb::dbITerm::getMTerm() const (@odb::dbITerm::getMTerm() const:49)
sta::dbNetwork::port(sta::Pin const*) const (@sta::dbNetwork::port(sta::Pin const*) const:21)
sta::Network::libertyPort(sta::Pin const*) const (@sta::Network::libertyPort(sta::Pin const*) const:8)
sta::DelayCalcBase::thresholdAdjust(sta::Pin const*, sta::LibertyLibrary const*, sta::RiseFall const*, float&, float&) (@sta::DelayCalcBase::thresholdAdjust(sta::Pin const*, sta::LibertyLibrary const*, sta::RiseFall const*, float&, float&):25)
sta::LumpedCapDelayCalc::makeResult(sta::LibertyLibrary const*, sta::RiseFall const*, float, float, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&) (@sta::LumpedCapDelayCalc::makeResult(sta::LibertyLibrary const*, sta::RiseFall const*, float, float, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&):47)
sta::LumpedCapDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*) (@sta::LumpedCapDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*):70)
sta::DmpCeffDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*) (@sta::DmpCeffDelayCalc::gateDelay(sta::Pin const*, sta::TimingArc const*, float const&, float, sta::Parasitic const*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>> const&, sta::DcalcAnalysisPt const*):167)
sta::GraphDelayCalc::findDriverArcDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::TimingArc const*, sta::DcalcAnalysisPt const*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&) (@sta::GraphDelayCalc::findDriverArcDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::TimingArc const*, sta::DcalcAnalysisPt const*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&):111)
sta::GraphDelayCalc::findDriverEdgeDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&, std::array<bool, 2ul>&) (@sta::GraphDelayCalc::findDriverEdgeDelays(sta::Vertex*, sta::MultiDrvrNet const*, sta::Edge*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&, std::array<bool, 2ul>&):60)
sta::GraphDelayCalc::findDriverDelays1(sta::Vertex*, sta::MultiDrvrNet*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&) (@sta::GraphDelayCalc::findDriverDelays1(sta::Vertex*, sta::MultiDrvrNet*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&):145)
sta::GraphDelayCalc::findDriverDelays(sta::Vertex*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&) (@sta::GraphDelayCalc::findDriverDelays(sta::Vertex*, sta::ArcDelayCalc*, std::map<sta::Pin const*, unsigned long, sta::PinIdLess, std::allocator<std::pair<sta::Pin const* const, unsigned long>>>&):42)
sta::GraphDelayCalc::findVertexDelay(sta::Vertex*, sta::ArcDelayCalc*, bool) (@sta::GraphDelayCalc::findVertexDelay(sta::Vertex*, sta::ArcDelayCalc*, bool):110)
sta::BfsIterator::visit(int, sta::VertexVisitor*) (@sta::BfsIterator::visit(int, sta::VertexVisitor*):72)
sta::GraphDelayCalc::findDelays(int) (@sta::GraphDelayCalc::findDelays(int):48)
sta::Sta::searchPreamble() (@sta::Sta::searchPreamble():27)
sta::Sta::findPathEnds(sta::ExceptionFrom*, sta::Vector<sta::ExceptionThru*>*, sta::ExceptionTo*, bool, sta::Corner const*, sta::MinMaxAll const*, int, int, bool, float, float, bool, sta::Set<char const*, sta::CharPtrLess>*, bool, bool, bool, bool, bool, bool) (@sta::Sta::findPathEnds(sta::ExceptionFrom*, sta::Vector<sta::ExceptionThru*>*, sta::ExceptionTo*, bool, sta::Corner const*, sta::MinMaxAll const*, int, int, bool, float, float, bool, sta::Set<char const*, sta::CharPtrLess>*, bool, bool, bool, bool, bool, bool):46)
_wrap_find_path_ends (@_wrap_find_path_ends:288)
TclNRRunCallbacks (@TclNRRunCallbacks:38)
___lldb_unnamed_symbol1507 (@___lldb_unnamed_symbol1507:348)
Tcl_EvalEx (@Tcl_EvalEx:11)
Tcl_Eval (@Tcl_Eval:12)
sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*) (@sta::sourceTclFile(char const*, bool, bool, Tcl_Interp*):32)
tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0) (@tclAppInit(int&, char**, char const*, Tcl_Interp*) (.constprop.0):303)
Tcl_MainEx (@Tcl_MainEx:113)
main (@main:729)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main_impl (@__libc_start_main@@GLIBC_2.34:44)
_start (@_start:15)
Suggested Solution
Look for some low hanging fruit to optimize by gdb suspend.
Additional Context
No response
There are two different problems rolled into this one issue. @eder-matheus to look at have_routes.
@maliberty FYI, before and after sta::Levelize() improvement, no change. I had this test case ready to go so I thought I would try it after sta::Levelize() improvement.
oyvind@corona:~/OpenROAD-flow-scripts/foo$ time ./run-me-BoomTile-asap7-base.sh
OpenROAD v2.0-20670-g4939290b5
Features included (+) or not (-): +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
read_db results/asap7/BoomTile/base/5_1_grt.odb
Took 8 seconds: read_db results/asap7/BoomTile/base/5_1_grt.odb
GUI_TIMING=1 reading timing, takes a little while for large designs...
read_sdc /home/oyvind/OpenROAD-flow-scripts/foo/results/asap7/BoomTile/base/5_1_grt.sdc
Took 10 seconds: read_sdc /home/oyvind/OpenROAD-flow-scripts/foo/results/asap7/BoomTile/base/5_1_grt.sdc
grt::have_routes
Took 48 seconds: grt::have_routes
No global routing results available, skipping estimate_parasitics
Load .//reports/asap7/BoomTile/base/congestion.rpt for details
find_timing_paths
Took 71 seconds: find_timing_paths
real 2m27,613s
user 2m23,664s
sys 0m3,944s
oyvind@corona:~/OpenROAD-flow-scripts/foo$ which openroad
/home/oyvind/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
oyvind@corona:~/OpenROAD-flow-scripts/foo$ time ./run-me-BoomTile-asap7-base.sh
OpenROAD v2.0-20763-g3a77ad118
Features included (+) or not (-): +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
read_db results/asap7/BoomTile/base/5_1_grt.odb
Took 9 seconds: read_db results/asap7/BoomTile/base/5_1_grt.odb
GUI_TIMING=1 reading timing, takes a little while for large designs...
read_sdc /home/oyvind/OpenROAD-flow-scripts/foo/results/asap7/BoomTile/base/5_1_grt.sdc
Took 9 seconds: read_sdc /home/oyvind/OpenROAD-flow-scripts/foo/results/asap7/BoomTile/base/5_1_grt.sdc
grt::have_routes
Took 47 seconds: grt::have_routes
No global routing results available, skipping estimate_parasitics
Load .//reports/asap7/BoomTile/base/congestion.rpt for details
find_timing_paths
Took 71 seconds: find_timing_paths
real 2m26,012s
user 2m21,920s
sys 0m4,081s
This a the grt::have_routes perf top:
31,31% openroad [.] std::__detail::_Map_base<std::tuple<int, int, int>, std::pair<std::tup
7,65% openroad [.] std::_Hashtable<std::tuple<int, int, int>, std::pair<std::tuple<int, i
3,75% openroad [.] odb::dbBox::getViaBoxes(std::vector<odb::dbShape, std::allocator<odb::
2,17% [kernel] [k] 0xffffffff9c280f0a
2,09% openroad [.] odb::_dbBox::getTechLayer() const
1,75% [kernel] [k] 0xffffffff9cc37b52
1,71% openroad [.] boost::icl::interval_base_set<boost::icl::interval_set<int, std::less,
1,70% libc.so.6 [.] _int_malloc
1,06% libc.so.6 [.] __strcmp_avx2
0,95% openroad [.] grt::Grid::getBlockedTiles(odb::Rect const&, odb::Rect&, odb::Rect&, o
0,92% [kernel] [k] 0xffffffff9cc38768
0,91% openroad [.] odb::_dbBlock::getTech()
0,62% openroad [.] odb::dbHashTable<odb::_dbInst>::find(char const*)
0,58% libc.so.6 [.] malloc
0,57% [kernel] [k] 0xffffffff9cbb68e7
0,55% libtcl8.6.so [.] TclpFree
0,51% openroad [.] odb::dbObject::getId() const
0,50% openroad [.] grt::Grid::computeTileReduceInterval(odb::Rect const&, odb::Rect const
0,47% openroad [.] odb::_dbLib::getTech()
0,47% openroad [.] grt::FastRouteCore::addAdjustment(int, int, int, int, int, unsigned sh
0,47% [kernel] [k] 0xffffffff9ce0128c
Two find_timing_paths pref top snapshots:
Overhead Shared Object Symbol
8,35% openroad [.] odb::dbITerm::getMTerm() const
5,90% openroad [.] odb::dbObject::getObjectType() const
4,67% openroad [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&,
3,85% [kernel] [k] 0xffffffff9c280f0a
3,72% openroad [.] odb::dbObject::getId() const
2,73% openroad [.] sta::Levelize::findToplologicalOrder()
2,71% [kernel] [k] 0xffffffff9cc37b52
2,37% openroad [.] odb::dbITerm::getInst() const
2,12% openroad [.] sta::Table2::findValue(float, float, float) const
1,93% openroad [.] sta::VertexOutEdgeIterator::next()
1,91% openroad [.] sta::LibertyPort::capacitance(sta::RiseFall const*, sta::MinMax const*,
1,65% openroad [.] grt::GlobalRouter::mergeSegments(std::vector<grt::Pin, std::allocator<gr
1,56% [kernel] [k] 0xffffffff9cc38768
1,30% openroad [.] sta::dbNetwork::id(sta::Pin const*) const
1,27% openroad [.] sta::GraphDelayCalc::annotateLoadDelays(sta::Vertex*, sta::RiseFall cons
1,19% openroad [.] sta::DelayCalcBase::thresholdAdjust(sta::Pin const*, sta::LibertyLibrary
1,13% openroad [.] sta::findValueIndex(float, sta::Vector<float> const*)
1,08% openroad [.] sta::GraphDelayCalc::findDriverArcDelays(sta::Vertex*, sta::MultiDrvrNet
Overhead Shared Object Symbol
4,61% openroad [.] odb::dbObject::getObjectType() const
4,34% [kernel] [k] 0xffffffff9c280f0a
3,77% openroad [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&,
3,75% openroad [.] odb::dbObject::getId() const
3,17% openroad [.] sta::TagGroupBldr::tagMatchPath(sta::Tag*, sta::Path*&, unsigned long&)
2,84% [kernel] [k] 0xffffffff9cc37b52
2,62% openroad [.] sta::PathVisitor::visitFromPath(sta::Pin const*, sta::Vertex*, sta::Rise
2,58% openroad [.] sta::Search::mutateTag(sta::Tag*, sta::Pin const*, sta::RiseFall const*,
2,49% openroad [.] sta::dbNetwork::id(sta::Pin const*) const
1,99% openroad [.] odb::dbInst::getMaster() const
1,98% openroad [.] odb::dbITerm::getMTerm() const
1,81% [kernel] [k] 0xffffffff9cc38768
1,73% openroad [.] std::__detail::_Map_base<sta::Tag*, std::pair<sta::Tag* const, unsigned
1,52% openroad [.] odb::dbITerm::getInst() const
1,33% openroad [.] sta::ArrivalVisitor::visitFromToPath(sta::Pin const*, sta::Vertex*, sta:
1,33% openroad [.] sta::Debug::check(char const*, int) const
1,29% openroad [.] sta::PathVisitor::visitEdge(sta::Pin const*, sta::Vertex*, sta::Edge*, s
1,18% openroad [.] sta::VertexPathIterator::next()
1,14% openroad [.] sta::TagGroupBldr::copyPaths(sta::TagGroup*, sta::Path*)
0,97% openroad [.] sta::TagGroupEqual::operator()(sta::TagGroup const*, sta::TagGroup const
Is there something here that still requires action. I'm not sure what to make of all these stacks.
Is there something here that still requires action. I'm not sure what to make of all these stacks.
I think the test case is evidence that with moderate effort this could be made significantly faster.
However, unless something leaps out as actionable then it means that this "merits further study" but that does not mean that it should be a priority.
I would close pending a decision that this should take priority
The only one that stands out is from earlier: grt::GlobalRouter::computeObstructionsAdjustments() (@grt::GlobalRouter::computeObstructionsAdjustments():47) grt::GlobalRouter::applyAdjustments(int, int) (@grt::GlobalRouter::applyAdjustments(int, int):28) grt::GlobalRouter::initGridAndNets() (@grt::GlobalRouter::initGridAndNets():129) grt::GlobalRouter::loadGuidesFromDB() (.part.0) (@grt::GlobalRouter::loadGuidesFromDB() (.part.0):16) grt::GlobalRouter::haveRoutes() (@grt::GlobalRouter::haveRoutes():26)
where we could skip some of this work but I'm not sure it takes much time.
A bit of spelunking in odb::dbObject::getObjectType() const reveals lots of pointer chasing. I think what is needed is a deep think about the datastructures, access patterns and an epihany about how to reorganize datastructures. Doesn't sound like it will be quick or easy or that it is the way it is today for lack of effort earlier...
I did a pref report.
find_timing_paths, which takes most of the time, is spending all its time navigating data structures, it seems.
getMTerm() is a particularly nasty piece of work, it is pointer chasing hell.
https://github.com/The-OpenROAD-Project/OpenROAD/blob/3c8674f9a610d38db25105cafa8ef6a2f9184802/src/odb/src/db/dbITerm.cpp#L154-L166
Overhead Command Shared Object Symbol
6,30% openroad openroad [.] odb::dbObject::getId() const
4,67% openroad openroad [.] odb::dbITerm::getMTerm() const
4,58% openroad openroad [.] odb::dbObject::getObjectType() const
4,27% openroad openroad [.] sta::dbNetwork::staToDb(sta::Pin const*, odb::dbITerm*&, odb::dbBTerm*&, odb::dbModI
2,53% openroad openroad [.] sta::dbNetwork::id(sta::Pin const*) const
1,76% openroad openroad [.] std::__detail::_Map_base<std::tuple<int, int, int>, std::pair<std::tuple<int, int, i
1,60% openroad openroad [.] sta::TagGroupBldr::tagMatchPath(sta::Tag*, sta::Path*&, unsigned long&)
1,59% openroad openroad [.] sta::Levelize::findToplologicalOrder()
1,54% openroad openroad [.] odb::dbITerm::getInst() const
1,50% openroad openroad [.] grt::Net::getName[abi:cxx11]() const
1,40% openroad openroad [.] grt::GlobalRouter::mergeSegments(std::vector<grt::Pin, std::allocator<grt::Pin> > co
1,39% openroad openroad [.] sta::PathVisitor::visitFromPath(sta::Pin const*, sta::Vertex*, sta::RiseFall const*,
1,35% openroad openroad [.] sta::Search::mutateTag(sta::Tag*, sta::Pin const*, sta::RiseFall const*, bool, sta::
1,33% openroad libc.so.6 [.] malloc
1,19% openroad libc.so.6 [.] _int_free
1,19% openroad openroad [.] odb::dbInst::getMaster() const
1,18% openroad openroad [.] sta::VertexOutEdgeIterator::next()
0,90% openroad libc.so.6 [.] malloc_consolidate
0,87% openroad libc.so.6 [.] cfree@GLIBC_2.2.5
0,84% openroad libstdc++.so.6.0.33 [.] std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
The root of the problem if you look inside OpenSTA's code is that in order to provide determinism STA uses ordered maps to keep track of verticies, and used getId() as the key so we're paying n*log(n) cost to getId.
The root of the problem if you look inside OpenSTA's code is that in order to provide determinism STA uses ordered maps to keep track of verticies, and used getId() as the key so we're paying n*log(n) cost to getId.
Sounds like the problem is well understood. This makes me optimistic. Naively I would hope something keeping two maps: one for iteration on members and one for lookup, could do the trick.
Pointer to code?
The root of the problem if you look inside OpenSTA's code is that in order to provide determinism STA uses ordered maps to keep track of verticies, and used getId() as the key so we're paying n*log(n) cost to getId.
@maliberty @QuantamHD
If sta::Set() and sta::Map() were rewritten to maintain a KEY to ID map in addition to a KEY set, instead of taking a std::less<KEY> operator, I speculate that it could result in a significant speedup.
However, there may be more oportunities in the code that is missed: ordering may not matter very often?
In that case, it would be better to explicitly sort as needed in the user code as the set implementation can't know if ordering is needed or not. Sorting should probably create a KEY to ID map as a vector, sort on ID, rather than take a comparison operator, assuming comparison operator is defined for ID.
On the other hand, this opens up for a protracted scenario of hard to catch bugs. Perhaps better to introduce lazy sorting and an explicit unordered iteration interface?
That way the few (?) cases where ordering doesn't matter and performance is important could be fixed and the rest could use a more CPU intensive conservative assumptions.
Anyway... some deep understanding of the use-cases and profiling seems to be needed here.
https://github.com/parallaxsw/OpenSTA/blob/400c473fe384773a4788ee8378238462b4291fe3/include/sta/Set.hh#L32-L33
https://github.com/parallaxsw/OpenSTA/blob/400c473fe384773a4788ee8378238462b4291fe3/include/sta/Map.hh#L33-L34