rustworkx
Add Stoer Wagner min cut algorithm.
This commit adds a new function `stoer_wagner_min_cut` that computes a minimum cut for weighted undirected graphs.
Pull Request Test Coverage Report for Build 1660468577
- 180 of 203 (88.67%) changed or added relevant lines in 5 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage decreased (-0.2%) to 98.337%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| retworkx-core/src/connectivity/min_cut.rs | 66 | 67 | 98.51% |
| retworkx-core/src/disjoint_set.rs | 75 | 86 | 87.21% |
| src/score.rs | 9 | 20 | 45.0% |
| Total: | 180 | 203 | |

| Totals | |
|---|---|
| Change from base Build 1651339449: | -0.2% |
| Covered Lines: | 11293 |
| Relevant Lines: | 11484 |
💛 - Coveralls
Heh, I started a review earlier this morning, and the pending comment that I left before I had to step away was asking if we could make it generic for retworkx-core. You were reading my mind :)

@mtreinish Yeah, it was a bit of extra work and I was forced to readjust the core logic of the function, but eventually it worked out. Since the retworkx-core PR is now merged, I'm planning to move the generic version there with some extra docs.
@mtreinish I'm not sure what the performance difference between safe and unsafe code is here; we can do some benchmarks if you wish. Personally, I'm fine using unsafe blocks here to avoid unnecessary bounds checks, since we have safety guarantees if we implement this right internally, i.e. the APIs are actually safe. In any case, if you feel really strongly against using unsafe, I can replace the unchecked methods.
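To illustrate the pattern being discussed (this is a generic sketch, not the PR's actual code): an internal invariant can make an unchecked read sound while the public method stays safe, eliding only a redundant bounds check. The `Parents` type here is hypothetical.

```rust
// Illustrative sketch of a safe API over an unchecked access. The invariant
// "every stored value is a valid index into `parent`" is established by the
// constructor and preserved by all mutations.
struct Parents {
    parent: Vec<usize>,
}

impl Parents {
    fn new(n: usize) -> Self {
        Parents { parent: (0..n).collect() }
    }

    /// Safe wrapper: `i` is bounds-checked once up front, so the unchecked
    /// read below elides only a second, redundant check.
    fn parent_of(&self, i: usize) -> usize {
        assert!(i < self.parent.len());
        // SAFETY: `i < self.parent.len()` was just asserted.
        unsafe { *self.parent.get_unchecked(i) }
    }
}
```

Whether eliding these checks is measurable in practice is exactly what a benchmark would have to show.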
I'm fine with that. I don't think this is critical for the 0.11.0 release; nothing is blocked. It'd be nice to have (especially considering it was opened > 4 months ago), but let's defer it to early 0.12.0 so we can discuss your concerns then.
Pull Request Test Coverage Report for Build 3178013230
- 126 of 138 (91.3%) changed or added relevant lines in 4 files are covered.
- 2 unchanged lines in 1 file lost coverage.
- Overall coverage decreased (-0.03%) to 96.999%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| rustworkx-core/src/connectivity/min_cut.rs | 94 | 95 | 98.95% |
| src/score.rs | 9 | 20 | 45.0% |
| Total: | 126 | 138 | |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/shortest_path/all_pairs_bellman_ford.rs | 2 | 98.87% |
| Total: | 2 | |

| Totals | |
|---|---|
| Change from base Build 3175701022: | -0.03% |
| Covered Lines: | 13448 |
| Relevant Lines: | 13864 |
💛 - Coveralls
Just out of curiosity, why are we going to use the DisjointSet data structure (with a slight extension) for implementing the Stoer–Wagner min cut algorithm? What are the advantages over another approach, e.g. copying the original graph and contracting it while storing the corresponding original nodes in each (super) node?
Performance. You avoid cloning the graph, and it's (at least theoretically) faster to merge two super nodes using a DSU. But I haven't tested this (or I might have, but it was a while back and I can't remember). I'll benchmark the approach you suggested, and if there aren't any significant performance differences I'll switch to it.
That sounds good to me. I'm very interested in the benchmark results. I like DisjointSet/UnionFind (and want to have it if necessary). I know it's very fast in general, and the current implementation would already be fast. However, I'm not sure this is a case where UnionFind can do the best job. I'm thinking the min-cut algorithm needs a `union` function but may not need a `find` function, so another approach may be comparable. I agree it's worth benchmarking.
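For readers following along, the structure under discussion looks roughly like the classic union-find with path compression and union by rank. This is a minimal illustrative sketch, not the PR's `retworkx-core/src/disjoint_set.rs` (which adds extensions on top of this idea):

```rust
/// Minimal disjoint-set (union-find) sketch. `union` is the operation that
/// merges two "super nodes" during Stoer-Wagner contraction.
struct DisjointSet {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl DisjointSet {
    fn new(n: usize) -> Self {
        DisjointSet { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Find the representative of `x`'s set, compressing paths on the way up.
    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root;
        }
        self.parent[x]
    }

    /// Merge the sets containing `a` and `b` (union by rank).
    /// Returns false if they were already in the same set.
    fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        match self.rank[ra].cmp(&self.rank[rb]) {
            std::cmp::Ordering::Less => self.parent[ra] = rb,
            std::cmp::Ordering::Greater => self.parent[rb] = ra,
            std::cmp::Ordering::Equal => {
                self.parent[rb] = ra;
                self.rank[ra] += 1;
            }
        }
        true
    }
}
```

With both optimizations, a sequence of `union`/`find` calls runs in near-constant amortized time per operation, which is why merging super nodes this way is attractive.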
I based b81ffa0 on the NetworkX implementation [1], which doesn't use a DSU. From the limited benchmarks I did, performance looks similar in most cases (especially if there are many nodes with low degree), with a single exception where the DSU was significantly faster. I can't see a clear winner given the code complexity of the DSU structure, so I'll leave it to the reviewers to decide which one we pick.
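For background on what the NetworkX-style version does: the classic Stoer–Wagner scheme repeats a maximum-adjacency search to find a "cut of the phase", then contracts the last two vertices added, keeping the best phase cut seen. A compact stdlib-only sketch of the dense O(V³) variant (illustrative only; the implementation under review operates on graphs, not an adjacency matrix):

```rust
/// Classic O(V^3) Stoer-Wagner on a symmetric adjacency matrix `w` with
/// non-negative weights (w[i][j] == 0.0 means "no edge"). Returns the
/// weight of a global minimum cut.
fn stoer_wagner(mut w: Vec<Vec<f64>>) -> f64 {
    let mut active: Vec<usize> = (0..w.len()).collect();
    let mut best = f64::INFINITY;
    while active.len() > 1 {
        // Maximum-adjacency search: grow a set A by repeatedly adding the
        // vertex most tightly connected to A.
        let m = active.len();
        let mut weights = vec![0.0_f64; m];
        let mut in_a = vec![false; m];
        let mut order = Vec::with_capacity(m);
        for _ in 0..m {
            let mut sel = 0;
            let mut sel_w = f64::NEG_INFINITY;
            for i in 0..m {
                if !in_a[i] && weights[i] > sel_w {
                    sel = i;
                    sel_w = weights[i];
                }
            }
            in_a[sel] = true;
            order.push(sel);
            for i in 0..m {
                if !in_a[i] {
                    weights[i] += w[active[sel]][active[i]];
                }
            }
        }
        // The vertex added last, t, defines the cut of the phase: its total
        // connection weight to all other vertices is a candidate minimum cut.
        let t = order[m - 1];
        let s = order[m - 2];
        best = best.min(weights[t]);
        // Contract t into s, summing parallel edge weights.
        let (vs, vt) = (active[s], active[t]);
        for &v in &active {
            if v != vs && v != vt {
                w[vs][v] += w[vt][v];
                w[v][vs] = w[vs][v];
            }
        }
        active.retain(|&v| v != vt);
    }
    best
}
```

Note that contracting on a dense matrix merges parallel edges for free (weights are summed into one cell); on an edge-list graph representation that merging has to be done explicitly, which is exactly the subtlety discussed below.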
For reference, the following benchmark script was used, with arbitrarily chosen graphs of medium sizes:
```python
import time

import rustworkx
import networkx

benchmarks = [
    ("Ring of Cliques - 10 cliques", networkx.ring_of_cliques(10, 100)),
    ("Ring of Cliques - 100 cliques", networkx.ring_of_cliques(100, 10)),
    ("Mycielski Graph", networkx.mycielski_graph(10)),
    ("Margulis-Gabber-Galil graph", networkx.margulis_gabber_galil_graph(50)),
    ("Hexagonal graph", rustworkx.generators.hexagonal_lattice_graph(40, 40)),
    (
        "G_nm graph - 1_000 nodes",
        rustworkx.undirected_gnm_random_graph(1_000, 5_000, seed=2022),
    ),
    (
        "G_nm graph - 5_000 nodes",
        rustworkx.undirected_gnm_random_graph(5_000, 25_000, seed=2022),
    ),
]

for graph_name, graph in benchmarks:
    if isinstance(graph, networkx.Graph):
        graph = rustworkx.networkx_converter(graph)
    start = time.time()
    res = rustworkx.stoer_wagner_min_cut(graph)
    stop = time.time()
    run_time = stop - start
    print(f"{graph_name}: \n {run_time:.3} sec")
```
resulting in:

| Graphs | c83ad1f | b81ffa0 |
|---|---|---|
| Ring of Cliques - 10 cliques | 1.33 sec | 2.52 sec |
| Ring of Cliques - 100 cliques | 0.22 sec | 0.15 sec |
| Mycielski graph | 0.96 sec | 6.59 sec |
| Margulis-Gabber-Galil graph | 2.59 sec | 2.36 sec |
| Hexagonal graph | 2.12 sec | 1.39 sec |
| G_nm graph - 1_000 nodes | 0.44 sec | 0.42 sec |
| G_nm graph - 5_000 nodes | 15.6 sec | 17.8 sec |
Thank you for the quick benchmarking, @georgios-ts. I ran the above code in my local environment and got similar results. I was curious about the surge in the Mycielski Graph case of b81ffa0 (networkx-like). I investigated it a bit and found out that it is caused by parallel edges being added during the contraction of triangle subgraph structures. I examined the following change to b81ffa0:
```diff
@@ -165,7 +165,13 @@ where
             .map(|edge| (s, edge.target(), *edge.weight()))
             .collect::<Vec<_>>();
         for (source, target, cost) in edges {
-            graph_with_super_nodes.add_edge(source, target, cost);
+            if let Some(edge_index) = graph_with_super_nodes.find_edge(source, target) {
+                if let Some(edge_weight) = graph_with_super_nodes.edge_weight_mut(edge_index) {
+                    *edge_weight += cost;
+                }
+            } else {
+                graph_with_super_nodes.add_edge(source, target, cost);
+            }
```
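The idea in the diff, stated independently of the graph library's API: when contracting node `t` into node `s`, each of `t`'s incident edges should be folded into an existing `(s, v)` edge by summing weights, rather than adding a parallel edge. A stdlib-only sketch over a hypothetical adjacency-map representation (not the PR's data structures):

```rust
use std::collections::HashMap;

/// Hypothetical weighted-graph representation: nested adjacency maps keyed
/// by neighbor id, so there is at most one edge per node pair.
type Adj = HashMap<usize, HashMap<usize, f64>>;

/// Contract `t` into `s`: every edge (t, v) becomes (s, v), and if an
/// (s, v) edge already exists its weight is increased instead of creating
/// a parallel edge -- the same merging the diff above performs.
fn contract(adj: &mut Adj, s: usize, t: usize) {
    if let Some(t_edges) = adj.remove(&t) {
        for (v, cost) in t_edges {
            if v == s {
                continue; // drop the self-loop created by merging s and t
            }
            // Merge into s's side, summing if the edge already exists.
            *adj.entry(s).or_default().entry(v).or_insert(0.0) += cost;
            // Mirror the update on v's side and forget the old edge to t.
            let v_edges = adj.entry(v).or_default();
            v_edges.remove(&t);
            *v_edges.entry(s).or_insert(0.0) += cost;
        }
        adj.entry(s).or_default().remove(&t);
    }
}
```

Keeping the contracted graph simple this way bounds the number of edges (and hence priority-queue updates in the maximum-adjacency search) by the number of node pairs.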
and observed a performance improvement, as shown below.

NetworkX-like implementation (b81ffa0 + the change above):
Ring of Cliques - 10 cliques: 0.734 sec
Ring of Cliques - 100 cliques: 0.128 sec
Mycielski Graph: 0.512 sec
Margulis-Gabber-Galil graph: 1.49 sec
Hexagonal graph: 1.06 sec
G_nm graph - 1_000 nodes: 0.261 sec
G_nm graph - 5_000 nodes: 8.87 sec
DisjointSet implementation (c83ad1f):
Ring of Cliques - 10 cliques: 1.22 sec
Ring of Cliques - 100 cliques: 0.195 sec
Mycielski Graph: 0.814 sec
Margulis-Gabber-Galil graph: 2.27 sec
Hexagonal graph: 1.8 sec
G_nm graph - 1_000 nodes: 0.398 sec
G_nm graph - 5_000 nodes: 13.2 sec
I think now we can pick the networkx-like implementation with some confidence. What do you think?
Thanks for going deep into this. Good point! Parallel edges increase the number of updates in the priority queue, hence the slowdown. I applied your suggestion in 98c5f84.