Graphs.jl icon indicating copy to clipboard operation
Graphs.jl copied to clipboard

Count number of connected components more efficiently than `length(connected_components(g))`

Open thchr opened this issue 1 year ago • 5 comments

This adds a new function count_connected_components, which returns the same value as length(connected_components(g)) but substantially faster by avoiding unnecessary allocations. In particular, connected_components materializes component vectors that are not actually necessary for determining the number of components. Similar reasoning also lets one optimize is_connected a bit: did that also.

While I was there, I also improved connected_components! slightly: previously, it was allocating a new queue for every new "starting vertex" in the search; but the queue is always empty when it's time to add a new vertex at that point, so there's no point in instantiating a new vector. To enable users who might want to call connected_components! many times in a row to reduce allocations further (I am one such user), I also made it possible to pass this queue as an optimization.

Finally, connected_components! is very useful and would make sense to export; so I've done that here.

Cc @gdalle, if you have time to review.

thchr avatar Nov 14 '24 11:11 thchr

For the doctest example of g = Graph(Edge.([1=>2, 2=>3, 3=>1, 4=>5, 5=>6, 6=>4, 7=>8])), count_connected_components is about twice as fast as length∘connected_components (179 ns vs. 290 ns). Using the buffers, it is faster still (105 ns).

thchr avatar Nov 14 '24 11:11 thchr

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 97.31%. Comparing base (24539fd) to head (d5e0e37). Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #407   +/-   ##
=======================================
  Coverage   97.30%   97.31%           
=======================================
  Files         117      117           
  Lines        6948     6963   +15     
=======================================
+ Hits         6761     6776   +15     
  Misses        187      187           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Nov 14 '24 12:11 codecov[bot]

For the doctest example of g = Graph(Edge.([1=>2, 2=>3, 3=>1, 4=>5, 5=>6, 6=>4, 7=>8])), count_connected_components is about twice as fast as length∘connected_components (179 ns vs. 290 ns). Using the buffers, it is faster still (105 ns).

We should not do benchmarks on such small graphs unless the algorithm has a huge complexity and is slow even on very small graphs. Otherwise the benchmark is way too noisy and also does not really reflect the situations where this library is used.

simonschoelly avatar Nov 20 '24 13:11 simonschoelly

We should not do benchmarks on such small graphs unless the algorithm has a huge complexity and is slow even on very small graphs. Otherwise the benchmark is way too noisy and also does not really reflect the situations where this library is used.

What are some good go-to defaults for testing? This is a thing I'm running up against frequently, I feel: I am not sure which graphs to test against, and anything beyond small toy examples are not easily accessible via convenience constructors in Graphs.

As context, in my situation the graphs are rarely larger than 50-100 vertices; my challenge is that I need to consider a huge number of permutations of such graphs, so performance in the small-graph case is relevant to me.

thchr avatar Nov 21 '24 08:11 thchr

What are some good go-to defaults for testing? This is a thing I'm running up against frequently, I feel: I am not sure which graphs to test against, and anything beyond small toy examples are not easily accessible via convenience constructors in Graphs.

I have opened this issue to discuss further:

  • #411

gdalle avatar Nov 21 '24 11:11 gdalle