rustworkx
Add Stoer Wagner min cut algorithm.
This commit adds a new function `stoer_wagner_min_cut` that computes a minimum cut for weighted undirected graphs.
Pull Request Test Coverage Report for Build 1660468577
- 180 of 203 (88.67%) changed or added relevant lines in 5 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage decreased (-0.2%) to 98.337%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| retworkx-core/src/connectivity/min_cut.rs | 66 | 67 | 98.51% |
| retworkx-core/src/disjoint_set.rs | 75 | 86 | 87.21% |
| src/score.rs | 9 | 20 | 45.0% |
| Total: | 180 | 203 | |

| Totals | |
|---|---|
| Change from base Build 1651339449: | -0.2% |
| Covered Lines: | 11293 |
| Relevant Lines: | 11484 |
💛 - Coveralls
Heh, I started a review earlier this morning, and the pending comment that I left before I had to step away was asking if we could make it generic for retworkx-core. You were reading my mind :)

@mtreinish Yeah, it was a bit of extra work and I was forced to readjust the core logic of the function, but eventually it worked out. Since the retworkx-core PR is now merged, I'm planning to move the generic version there with some extra docs.
@mtreinish I'm not sure what the performance difference between safe and unsafe code is here; we can do some benchmarks if you wish. Personally, I'm fine using unsafe blocks here to avoid unnecessary bounds checks, since we have safety guarantees if we implement this right internally, i.e. the APIs are actually safe. In any case, if you feel really strongly against using unsafe, I can replace the unchecked methods.
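To illustrate the pattern being discussed (this is a generic sketch, not the PR's actual code): an internal invariant can make an unchecked read sound while the public method stays safe, eliding only a redundant bounds check. The `Parents` type here is hypothetical.

```rust
// Illustrative sketch of a safe API over an unchecked access. The invariant
// "every stored value is a valid index into `parent`" is established by the
// constructor and preserved by all mutations.
struct Parents {
    parent: Vec<usize>,
}

impl Parents {
    fn new(n: usize) -> Self {
        Parents { parent: (0..n).collect() }
    }

    /// Safe wrapper: `i` is bounds-checked once up front, so the unchecked
    /// read below elides only a second, redundant check.
    fn parent_of(&self, i: usize) -> usize {
        assert!(i < self.parent.len());
        // SAFETY: `i < self.parent.len()` was just asserted.
        unsafe { *self.parent.get_unchecked(i) }
    }
}
```

Whether eliding these checks is measurable in practice is exactly what a benchmark would have to show.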
I'm fine with that. I don't think this is critical for the 0.11.0 release; nothing is blocked. It'd be nice to have (especially considering it was opened > 4 months ago), but let's defer it to early 0.12.0 so we can discuss your concerns then.
Pull Request Test Coverage Report for Build 3178013230
- 126 of 138 (91.3%) changed or added relevant lines in 4 files are covered.
- 2 unchanged lines in 1 file lost coverage.
- Overall coverage decreased (-0.03%) to 96.999%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| rustworkx-core/src/connectivity/min_cut.rs | 94 | 95 | 98.95% |
| src/score.rs | 9 | 20 | 45.0% |
| Total: | 126 | 138 | |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/shortest_path/all_pairs_bellman_ford.rs | 2 | 98.87% |
| Total: | 2 | |

| Totals | |
|---|---|
| Change from base Build 3175701022: | -0.03% |
| Covered Lines: | 13448 |
| Relevant Lines: | 13864 |
💛 - Coveralls
Just out of curiosity, why are we going to use the DisjointSet data structure (with a slight extension) for implementing the Stoer–Wagner min cut algorithm? What are the advantages over another approach, e.g. copying the original graph and contracting it while storing the corresponding original nodes in each (super) node?
Performance. You avoid cloning the graph, and it's (at least theoretically) faster to merge two super nodes using a DSU. But I haven't tested this (or I might have, but it was a while back and I can't remember). I'll benchmark the approach you suggested, and if there aren't any significant performance differences I'll switch to it.
That sounds good to me. I'm very interested in the benchmark results. I like DisjointSet/UnionFind (and want to have it if necessary). I know it's very fast in general, and the current implementation would already be fast. However, I'm not sure this is a case where UnionFind can do the best job. I'm thinking the min-cut algorithm needs a `union` function but may not need a `find` function, so another approach may be comparable. I agree it's worth benchmarking.
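For readers following along, the structure under discussion looks roughly like the classic union-find with path compression and union by rank. This is a minimal illustrative sketch, not the PR's `retworkx-core/src/disjoint_set.rs` (which adds extensions on top of this idea):

```rust
/// Minimal disjoint-set (union-find) sketch. `union` is the operation that
/// merges two "super nodes" during Stoer-Wagner contraction.
struct DisjointSet {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl DisjointSet {
    fn new(n: usize) -> Self {
        DisjointSet { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Find the representative of `x`'s set, compressing paths on the way up.
    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root;
        }
        self.parent[x]
    }

    /// Merge the sets containing `a` and `b` (union by rank).
    /// Returns false if they were already in the same set.
    fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        match self.rank[ra].cmp(&self.rank[rb]) {
            std::cmp::Ordering::Less => self.parent[ra] = rb,
            std::cmp::Ordering::Greater => self.parent[rb] = ra,
            std::cmp::Ordering::Equal => {
                self.parent[rb] = ra;
                self.rank[ra] += 1;
            }
        }
        true
    }
}
```

With both optimizations, a sequence of `union`/`find` calls runs in near-constant amortized time per operation, which is why merging super nodes this way is attractive.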
I based b81ffa0 on the NetworkX implementation [1], which doesn't use a DSU. From the limited benchmarks I did, performance looks similar in most cases (especially if there are many nodes with low degree), with a single exception where the DSU was significantly faster. I can't see a clear winner given the code complexity of the DSU structure, so I'll leave it to the reviewers to decide which one we pick.
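For background on what the NetworkX-style version does: the classic Stoer–Wagner scheme repeats a maximum-adjacency search to find a "cut of the phase", then contracts the last two vertices added, keeping the best phase cut seen. A compact stdlib-only sketch of the dense O(V³) variant (illustrative only; the implementation under review operates on graphs, not an adjacency matrix):

```rust
/// Classic O(V^3) Stoer-Wagner on a symmetric adjacency matrix `w` with
/// non-negative weights (w[i][j] == 0.0 means "no edge"). Returns the
/// weight of a global minimum cut.
fn stoer_wagner(mut w: Vec<Vec<f64>>) -> f64 {
    let mut active: Vec<usize> = (0..w.len()).collect();
    let mut best = f64::INFINITY;
    while active.len() > 1 {
        // Maximum-adjacency search: grow a set A by repeatedly adding the
        // vertex most tightly connected to A.
        let m = active.len();
        let mut weights = vec![0.0_f64; m];
        let mut in_a = vec![false; m];
        let mut order = Vec::with_capacity(m);
        for _ in 0..m {
            let mut sel = 0;
            let mut sel_w = f64::NEG_INFINITY;
            for i in 0..m {
                if !in_a[i] && weights[i] > sel_w {
                    sel = i;
                    sel_w = weights[i];
                }
            }
            in_a[sel] = true;
            order.push(sel);
            for i in 0..m {
                if !in_a[i] {
                    weights[i] += w[active[sel]][active[i]];
                }
            }
        }
        // The vertex added last, t, defines the cut of the phase: its total
        // connection weight to all other vertices is a candidate minimum cut.
        let t = order[m - 1];
        let s = order[m - 2];
        best = best.min(weights[t]);
        // Contract t into s, summing parallel edge weights.
        let (vs, vt) = (active[s], active[t]);
        for &v in &active {
            if v != vs && v != vt {
                w[vs][v] += w[vt][v];
                w[v][vs] = w[vs][v];
            }
        }
        active.retain(|&v| v != vt);
    }
    best
}
```

Note that contracting on a dense matrix merges parallel edges for free (weights are summed into one cell); on an edge-list graph representation that merging has to be done explicitly, which is exactly the subtlety discussed below.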
For reference, the following benchmark script was used, with arbitrarily chosen graphs of medium sizes:
```python
import time

import rustworkx
import networkx

benchmarks = [
    ("Ring of Cliques - 10 cliques", networkx.ring_of_cliques(10, 100)),
    ("Ring of Cliques - 100 cliques", networkx.ring_of_cliques(100, 10)),
    ("Mycielski Graph", networkx.mycielski_graph(10)),
    ("Margulis-Gabber-Galil graph", networkx.margulis_gabber_galil_graph(50)),
    ("Hexagonal graph", rustworkx.generators.hexagonal_lattice_graph(40, 40)),
    (
        "G_nm graph - 1_000 nodes",
        rustworkx.undirected_gnm_random_graph(1_000, 5_000, seed=2022),
    ),
    (
        "G_nm graph - 5_000 nodes",
        rustworkx.undirected_gnm_random_graph(5_000, 25_000, seed=2022),
    ),
]

for graph_name, graph in benchmarks:
    if isinstance(graph, networkx.Graph):
        graph = rustworkx.networkx_converter(graph)
    start = time.time()
    res = rustworkx.stoer_wagner_min_cut(graph)
    stop = time.time()
    run_time = stop - start
    print(f"{graph_name}: \n {run_time:.3} sec")
```
resulting in:

| Graphs | c83ad1f | b81ffa0 |
|---|---|---|
| Ring of Cliques - 10 cliques | 1.33 sec | 2.52 sec |
| Ring of Cliques - 100 cliques | 0.22 sec | 0.15 sec |
| Mycielski graph | 0.96 sec | 6.59 sec |
| Margulis-Gabber-Galil graph | 2.59 sec | 2.36 sec |
| Hexagonal graph | 2.12 sec | 1.39 sec |
| G_nm graph - 1_000 nodes | 0.44 sec | 0.42 sec |
| G_nm graph - 5_000 nodes | 15.6 sec | 17.8 sec |
Thank you for the quick benchmarking, @georgios-ts. I ran the above code in my local environment and got similar results. I was curious about the surge in the Mycielski Graph case of b81ffa0 (networkx-like). I investigated it a bit and found out that it is caused by parallel edges being added during the contraction of triangle subgraph structures. I examined the following change to b81ffa0:
```diff
@@ -165,7 +165,13 @@ where
             .map(|edge| (s, edge.target(), *edge.weight()))
             .collect::<Vec<_>>();
         for (source, target, cost) in edges {
-            graph_with_super_nodes.add_edge(source, target, cost);
+            if let Some(edge_index) = graph_with_super_nodes.find_edge(source, target) {
+                if let Some(edge_weight) = graph_with_super_nodes.edge_weight_mut(edge_index) {
+                    *edge_weight += cost;
+                }
+            } else {
+                graph_with_super_nodes.add_edge(source, target, cost);
+            }
```
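The idea in the diff, stated independently of the graph library's API: when contracting node `t` into node `s`, each of `t`'s incident edges should be folded into an existing `(s, v)` edge by summing weights, rather than adding a parallel edge. A stdlib-only sketch over a hypothetical adjacency-map representation (not the PR's data structures):

```rust
use std::collections::HashMap;

/// Hypothetical weighted-graph representation: nested adjacency maps keyed
/// by neighbor id, so there is at most one edge per node pair.
type Adj = HashMap<usize, HashMap<usize, f64>>;

/// Contract `t` into `s`: every edge (t, v) becomes (s, v), and if an
/// (s, v) edge already exists its weight is increased instead of creating
/// a parallel edge -- the same merging the diff above performs.
fn contract(adj: &mut Adj, s: usize, t: usize) {
    if let Some(t_edges) = adj.remove(&t) {
        for (v, cost) in t_edges {
            if v == s {
                continue; // drop the self-loop created by merging s and t
            }
            // Merge into s's side, summing if the edge already exists.
            *adj.entry(s).or_default().entry(v).or_insert(0.0) += cost;
            // Mirror the update on v's side and forget the old edge to t.
            let v_edges = adj.entry(v).or_default();
            v_edges.remove(&t);
            *v_edges.entry(s).or_insert(0.0) += cost;
        }
        adj.entry(s).or_default().remove(&t);
    }
}
```

Keeping the contracted graph simple this way bounds the number of edges (and hence priority-queue updates in the maximum-adjacency search) by the number of node pairs.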
and observed a performance improvement, as shown below.

NetworkX-like implementation (b81ffa0 + the change above):
Ring of Cliques - 10 cliques: 0.734 sec
Ring of Cliques - 100 cliques: 0.128 sec
Mycielski Graph: 0.512 sec
Margulis-Gabber-Galil graph: 1.49 sec
Hexagonal graph: 1.06 sec
G_nm graph - 1_000 nodes: 0.261 sec
G_nm graph - 5_000 nodes: 8.87 sec
DisjointSet implementation (c83ad1f):
Ring of Cliques - 10 cliques: 1.22 sec
Ring of Cliques - 100 cliques: 0.195 sec
Mycielski Graph: 0.814 sec
Margulis-Gabber-Galil graph: 2.27 sec
Hexagonal graph: 1.8 sec
G_nm graph - 1_000 nodes: 0.398 sec
G_nm graph - 5_000 nodes: 13.2 sec
I think now we can pick the networkx-like implementation with some confidence. What do you think?
Thanks for going deep into this. Good point! Parallel edges increase the number of updates in the priority queue, hence the slowdown. I applied your suggestion in 98c5f84.