LightGraphs.jl icon indicating copy to clipboard operation
LightGraphs.jl copied to clipboard

Reduced time & memory footprint for Tarjans algorithm, fixed a bug where it was O(E^2) on star graphs.

Open saolof opened this issue 4 years ago • 12 comments

I worked on improving the performance of the strongly_connected_components function, and on fixing #1560 .

This PR eliminates several of the arrays that were used for auxiliary data using various tricks that were directly inspired by David J. Pearce's preprint over at https://homepages.ecs.vuw.ac.nz/~djp/files/IPL15-preprint.pdf , which reduces memory usage & allocations and improves cache efficiency. In particular, this lead to a significant speedup for dense random graphs in benchmarking. I would also subjectively claim that this version of the algorithm is more easily readable than Tarjan's original version.

In addition to this, it adds a new stack to keep track of the iteration state for nodes with a large number of outneighbours, specifically to fix #1560 . Interestingly, benchmarking revealed that the size threshold above which saving the iteration state is worth doing is surprisingly large (on the order of a thousand outneighbours), though of course that still applies to many real-world graphs.

I also put up a gist that anyone can use for some quick benchmarks: https://gist.github.com/saolof/7b5d26a41e6a34ff2b3e76d3fc5541da

Furthermore, since it also computes a component table at the same time, adding a flag to return the component table would allow saving some work in other functions that compute one for the SCCs such as https://github.com/JuliaGraphs/LightGraphs.jl/blob/2a644c2b15b444e7f32f73021ec276aa9fc8ba30/src/connectivity.jl#L541 and places where that is called such as when computing transitive closures.

saolof avatar Apr 11 '21 23:04 saolof

Codecov Report

Merging #1559 (09683df) into master (5d2b80a) will decrease coverage by 0.12%. The diff coverage is 88.33%.

:exclamation: Current head 09683df differs from pull request most recent head ebb392b. Consider uploading reports for the commit ebb392b to get more accurate results

@@            Coverage Diff             @@
##           master    #1559      +/-   ##
==========================================
- Coverage   99.44%   99.31%   -0.13%     
==========================================
  Files         106      106              
  Lines        5551     5568      +17     
==========================================
+ Hits         5520     5530      +10     
- Misses         31       38       +7     

codecov[bot] avatar Apr 12 '21 00:04 codecov[bot]

Hi, thanks for your contribution - It's been a few years since I last looked at Tarjans algorithm, so I will first have to figure out again how it works to do some complete code review. In the meantime, could you add the code you used for benchmarking and the results that you got as a comment to this PR?

simonschoelly avatar Apr 12 '21 12:04 simonschoelly

Okay, here's some benchmark results. The first one is the previous library function. The second is what I currently have in this PR. Third one saves the iteration state when vertices become large enough, at the cost of a few extra operations for small vertices. Should I commit the third version since it is still quite competitive but solves the Star graph problem?


testing function strongly_connected_components:
path_digraph(100)
Trial(11.700 μs)
random_regular_digraph(500,5)
Trial(35.200 μs)
 random_tournament_digraph(200)
Trial(88.000 μs)
 random_orientation_dag(complete_graph(100))
Trial(18.500 μs)
 star_digraph(10000)
Trial(31.387 ms)

testing function strongly_connected_components_downward:
path_digraph(100)
Trial(6.360 μs)
random_regular_digraph(500,5)
Trial(32.200 μs)
 random_tournament_digraph(200)
Trial(40.100 μs)
 random_orientation_dag(complete_graph(100))
Trial(14.500 μs)
 star_digraph(10000)
Trial(42.072 ms)

testing function strongly_connected_components_2:
path_digraph(100)
Trial(7.075 μs)
random_regular_digraph(500,5)
Trial(34.600 μs)
 random_tournament_digraph(200)
Trial(38.400 μs)
 random_orientation_dag(complete_graph(100))
Trial(15.600 μs)
 star_digraph(10000)
Trial(778.800 μs)


saolof avatar Apr 12 '21 16:04 saolof

Ah, I hadn't reenabled @inbounds in the third version. Here's some benchmarks with slightly bigger inputs and @inbounds enabled:

testing function strongly_connected_components:
path_digraph(1000)
Trial(110.500 μs)
random_regular_digraph(5000,8)
Trial(537.400 μs)
 random_tournament_digraph(1000)
Trial(2.216 ms)
 random_orientation_dag(complete_graph(1000))
Trial(918.600 μs)
 star_digraph(10000)
Trial(30.834 ms)

testing function strongly_connected_components_downward:
path_digraph(1000)
Trial(61.500 μs)
random_regular_digraph(5000,8)
Trial(517.800 μs)
 random_tournament_digraph(1000)
Trial(815.900 μs)
 random_orientation_dag(complete_graph(1000))
Trial(894.300 μs)
 star_digraph(10000)
Trial(33.027 ms)

testing function strongly_connected_components_2:
path_digraph(1000)
Trial(65.500 μs)
random_regular_digraph(5000,8)
Trial(532.900 μs)
 random_tournament_digraph(1000)
Trial(834.200 μs)
 random_orientation_dag(complete_graph(1000))
Trial(909.300 μs)
 star_digraph(10000)
Trial(756.500 μs)

saolof avatar Apr 12 '21 16:04 saolof

Okay, seems fixed. The only non-passing test in the last run is the diffusion simulation at https://github.com/JuliaGraphs/LightGraphs.jl/blob/master/test/traversals/diffusion.jl#L155 which seems to give false negatives occasionally according to comment.

saolof avatar Apr 13 '21 09:04 saolof

https://gist.github.com/saolof/7b5d26a41e6a34ff2b3e76d3fc5541da Updated Gist with benchmarks. Here's some quick results for the latest version in the current commit vs the once currently in master, after tuning the large node cutoff parameter to ~1024 after a first round of benchmarks:

testing function strongly_connected_components:
path_digraph(10000)
  4.615 ms (20049 allocations: 2.21 MiB)
random_regular_digraph(50000,3)
  33.385 ms (5983 allocations: 4.12 MiB)
random_regular_digraph(50000,8)
  38.742 ms (99 allocations: 4.20 MiB)
random_regular_digraph(50000,200)
  92.663 ms (60 allocations: 4.19 MiB)
random_regular_digraph(50000,2000)
  515.990 ms (60 allocations: 4.19 MiB)
 random_tournament_digraph(10000)
  232.991 ms (51 allocations: 1014.59 KiB)
 random_orientation_dag(complete_graph(10000))
  69.689 ms (20031 allocations: 1.71 MiB)
 star_digraph(100000)
  3.515 s (200029 allocations: 16.59 MiB)
 ------------------------------
testing function strongly_connected_components_2:
path_digraph(10000)
  2.926 ms (10033 allocations: 1.50 MiB)
random_regular_digraph(50000,3)
  36.592 ms (3022 allocations: 3.27 MiB)
random_regular_digraph(50000,8)
  41.791 ms (77 allocations: 3.43 MiB)
random_regular_digraph(50000,200)
  78.061 ms (56 allocations: 3.43 MiB)
random_regular_digraph(50000,2000)
  207.933 ms (71 allocations: 5.43 MiB)
 random_tournament_digraph(10000)
  75.131 ms (61 allocations: 1.34 MiB)
 random_orientation_dag(complete_graph(10000))
  75.588 ms (10028 allocations: 1.25 MiB)
 star_digraph(100000)
  13.889 ms (100026 allocations: 12.01 MiB)

In general, the speedup seems to be particularly big whenever the weak connectivity of the digraph is large (i.e. depending on width vs height of the DFS forest).

saolof avatar Apr 14 '21 07:04 saolof

Some more benchmarks with more test cases (updated gist with them):

testing function strongly_connected_components:
path_digraph(10000)
  3.778 ms (20049 allocations: 2.21 MiB)
random_regular_digraph(50000,3)
  29.751 ms (6063 allocations: 4.12 MiB)
random_regular_digraph(50000,8)
  34.748 ms (84 allocations: 4.20 MiB)
random_regular_digraph(50000,200)     
  88.052 ms (60 allocations: 4.19 MiB)
random_orientation_dag(random_regular_graph(50000,200))
  46.974 ms (100034 allocations: 8.30 MiB)
random_regular_digraph(50000,2000)
  516.017 ms (60 allocations: 4.19 MiB)
random_tournament_digraph(10000)
  239.401 ms (51 allocations: 1014.59 KiB)
random_orientation_dag(complete_graph(10000))
  69.948 ms (20031 allocations: 1.71 MiB)
star_digraph(100000)
  3.585 s (200029 allocations: 16.59 MiB)
 ------------------------------
testing function strongly_connected_components_2:
path_digraph(10000)
  2.274 ms (10033 allocations: 1.50 MiB)
random_regular_digraph(50000,3)
  32.453 ms (3062 allocations: 3.27 MiB)
random_regular_digraph(50000,8)
  37.415 ms (69 allocations: 3.43 MiB)
random_regular_digraph(50000,200)
  74.465 ms (56 allocations: 3.43 MiB)
random_orientation_dag(random_regular_graph(50000,200))
  41.914 ms (50027 allocations: 6.01 MiB)
random_regular_digraph(50000,2000)
  208.503 ms (71 allocations: 5.43 MiB)
random_tournament_digraph(10000)
  71.482 ms (61 allocations: 1.34 MiB)
random_orientation_dag(complete_graph(10000))
  71.993 ms (10028 allocations: 1.25 MiB)
star_digraph(100000)
  11.344 ms (100026 allocations: 12.01 MiB)

saolof avatar Apr 14 '21 11:04 saolof

I am very ok with this as long as it passes @simonschoelly 's muster. Thank you.

sbromberger avatar May 13 '21 11:05 sbromberger

I stopped reviewing this PR, as you where pushing quite a lot of commits, so I wanted to wait until you get that done. I will try to continue the review then this week.

simonschoelly avatar May 17 '21 22:05 simonschoelly

Hi all,

Thanks for the PR and for the comprehensive review! I'm tracking - let me know when it's ready for a final review and merge.

sbromberger avatar Jun 04 '21 11:06 sbromberger

Hi all,

Just wanted to ping on this. Is it ready for final review?

sbromberger avatar Jul 06 '21 12:07 sbromberger

Yes.

Any further additions I may want to suggest (possibly including making it stable by doing the output in the exterior loop instead of the interior one, or adding some flag to return the rootindex array) would be in a separate PR. This is just a performance improvement that strictly avoids making any user-visible changes other than improving performance in the expensive cases, while enabling later PR's by ensuring that the data structures used internally are also useful to return.

saolof avatar Jul 06 '21 13:07 saolof