ConflictsNetworkMatcher::_seedEdgeScores() is a major bottleneck for the Network Roads alg against large datasets
See network-tests.child/gaalkacyo.child/gaalkacyo_large.child. I'm sure other large road datasets reveal the bottleneck as well. The previous Network alg bottleneck was in the removal of duplicate edges. That has since been handled, and now this seems to be the culprit.
See if anything can be done to speed things up. If the alg can't be improved, then maybe it can be parallelized.
see #3534 and #3530
Short of reworking the algorithm itself, which is unlikely at this time, I can only think of looking into caching and maybe some basic efficiency checks for this. Maybe there's an opportunity for parallelization?...
Not really seeing much that can be optimized here. EdgeMatchSetFinder::_addEdgeMatches specifically is where most of the time is spent. Since it's a recursive algorithm, I'm not seeing much that can be optimized. No one part of it is very slow, but its iterations balloon for large datasets.
Found this setting: network.edge.match.set.finder.max.iterations, which controls the recursion in EdgeMatchSetFinder::_addEdgeMatches. It doesn't look like an optimal value for it has ever been found with optimize-network-conf, so I'll do that and see if we can lower the default from 20. Also, will make it configurable from the UI, so it can be adjusted for larger datasets if desired.
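For illustration, here's a minimal sketch (not the actual EdgeMatchSetFinder code) of how an option like this typically bounds a recursive match-extension search. The settings map below is just a stand-in for Hootenanny's configuration, and everything except the option key itself is a hypothetical name.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Stand-in for the real configuration mechanism; only the option key
// itself comes from this issue.
std::map<std::string, int> settings =
{
  { "network.edge.match.set.finder.max.iterations", 20 }
};

// Hypothetical sketch of a recursive match-extension search: each partial
// edge match fans out to a couple of neighboring edges per iteration, and
// recursion stops once the configured iteration limit is reached. The
// returned count shows how the amount of work grows with the limit.
std::uint64_t expandMatches(int iteration, int maxIterations)
{
  std::uint64_t visited = 1; // the current partial match
  if (iteration >= maxIterations)
    return visited;
  const int neighborBranching = 2; // toy branching factor per iteration
  for (int i = 0; i < neighborBranching; ++i)
    visited += expandMatches(iteration + 1, maxIterations);
  return visited;
}

int main()
{
  // Compare the configured default (20) against a couple of lower limits.
  for (int limit : { 1, 11,
       settings.at("network.edge.match.set.finder.max.iterations") })
  {
    std::cout << "max iterations " << limit << ": "
              << expandMatches(0, limit) << " candidate match sets\n";
  }
  return 0;
}
```

Even with this toy branching factor of 2, the candidate count grows geometrically with the limit, which is consistent with the ballooning recursion described above.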
Dropping network.edge.match.set.finder.max.iterations drastically from 20 to 1 speeds up the gaal large conflation by roughly 300%. This, of course, says nothing about what effect it has on conflation output quality... so I'll determine the best value.
Unfortunately, reducing the default for network.edge.match.set.finder.max.iterations isn't possible. One test, pap-008, requires it to be at its current value of 20 to conflate the input data properly. All other tests pass with it as low as 11. Running at 11 lowers the gaal large conflate runtime by ~17%... not a ton, but it would be helpful. Regardless, lowering the default isn't an option.
Going to look at parallelization opportunities...
Parallelized it but am getting beat up by OsmSchema singleton access. ~~It's buried so deep in the code, not sure what I can do about it.~~ Have an idea how to fix this...
I tried using std::call_once to limit creation of OsmSchema. I think that's the right thing to do, but it doesn't matter because there's enough other thread-unsafe code in OsmSchema and elsewhere that I'm throwing in the towel on parallelization here. Will leave the issue open in case we think of anything else in the future.
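For reference, a minimal sketch of the std::call_once approach that was tried, with a generic Schema class standing in for OsmSchema (all names here are hypothetical):

```cpp
#include <memory>
#include <mutex>

// Generic stand-in for OsmSchema: construction is guarded with
// std::call_once, so concurrent callers of getInstance() cannot create the
// singleton twice. Note this only protects creation; if the instance's
// methods later mutate shared internal state, they still need their own
// synchronization.
class Schema
{
public:
  static Schema& getInstance()
  {
    std::call_once(_onceFlag, []() { _instance.reset(new Schema()); });
    return *_instance;
  }

private:
  Schema() = default;

  static std::once_flag _onceFlag;
  static std::unique_ptr<Schema> _instance;
};

std::once_flag Schema::_onceFlag;
std::unique_ptr<Schema> Schema::_instance;

int main()
{
  // Safe to call from multiple threads; construction happens exactly once.
  Schema& schema = Schema::getInstance();
  (void)schema;
  return 0;
}
```

A function-local static (Meyers singleton) would give the same thread-safe initialization in C++11 and later; either way, only construction is protected, which matches the conclusion above that the other thread-unsafe code is the real blocker.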
The data from #3923 is also very slow to conflate with Network.