Count number of connected components more efficiently than `length(connected_components(g))`
This adds a new function count_connected_components, which returns the same value as length(connected_components(g)) but substantially faster by avoiding unnecessary allocations. In particular, connected_components materializes component vectors that are not actually necessary for determining the number of components.
Similar reasoning also lets one optimize is_connected a bit: did that also.
While I was there, I also improved connected_components! slightly: previously, it was allocating a new queue for every new "starting vertex" in the search; but the queue is always empty when it's time to add a new vertex at that point, so there's no point in instantiating a new vector.
To enable users who might want to call connected_components! many times in a row to reduce allocations further (I am one such user), I also made it possible to pass this queue as an optimization.
Finally, connected_components! is very useful and would make sense to export; so I've done that here.
Cc @gdalle, if you have time to review.
For the doctest example of g = Graph(Edge.([1=>2, 2=>3, 3=>1, 4=>5, 5=>6, 6=>4, 7=>8])), count_connected_components is about twice as fast as length∘connected_components (179 ns vs. 290 ns). Using the buffers, it is faster still (105 ns).
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 97.31%. Comparing base (
24539fd) to head (d5e0e37). Report is 4 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #407 +/- ##
=======================================
Coverage 97.30% 97.31%
=======================================
Files 117 117
Lines 6948 6963 +15
=======================================
+ Hits 6761 6776 +15
Misses 187 187
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
For the doctest example of
g = Graph(Edge.([1=>2, 2=>3, 3=>1, 4=>5, 5=>6, 6=>4, 7=>8])),count_connected_componentsis about twice as fast aslength∘connected_components(179 ns vs. 290 ns). Using the buffers, it is faster still (105 ns).
We should not do benchmarks on such small graphs unless the algorithm has a huge complexity and is slow even on very small graphs. Otherwise the benchmark is way too noisy and also does not really reflect the situations where this library is used.
We should not do benchmarks on such small graphs unless the algorithm has a huge complexity and is slow even on very small graphs. Otherwise the benchmark is way too noisy and also does not really reflect the situations where this library is used.
What are some good go-to defaults for testing? This is a thing I'm running up against frequently, I feel: I am not sure which graphs to test against, and anything beyond small toy examples are not easily accessible via convenience constructors in Graphs.
As context, in my situation the graphs are rarely larger than 50-100 vertices; my challenge is that I need to consider a huge number of permutations of such graphs, so performance in the small-graph case is relevant to me.
What are some good go-to defaults for testing? This is a thing I'm running up against frequently, I feel: I am not sure which graphs to test against, and anything beyond small toy examples are not easily accessible via convenience constructors in Graphs.
I have opened this issue to discuss further:
- #411