Generate all connected subgraphs of size k
This PR addresses #1068
Here, a method for generating all connected subgraphs of a given size k is implemented, based on "Enumerating Connected Induced Subgraphs: Improved Delay and Experimental Comparison" by Christian Komusiewicz and Frank Sommer. Compared to the brute-force approach of drawing all combinations of k = 3..6 nodes and then checking whether each combination forms a connected subgraph, the method achieves a runtime speedup of two orders of magnitude, as seen in the following figure. The evaluated graphs are heavy-hex graphs of distance d = 1, 3, 5, ..., 99.
Raw data: raw_data.txt
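For context, the brute-force baseline from the comparison can be sketched roughly like this (a Python sketch over an adjacency-dict graph; the function and variable names are illustrative, not the actual benchmark code):

```python
from itertools import combinations

def brute_force_connected_subgraphs(adj, k):
    """Try every k-node combination and keep those that induce a
    connected subgraph. `adj` maps each vertex to its neighbor set."""
    results = []
    for combo in combinations(sorted(adj), k):
        chosen = set(combo)
        # BFS restricted to the chosen vertices to test connectivity
        seen = {combo[0]}
        frontier = [combo[0]]
        while frontier:
            v = frontier.pop()
            for w in adj[v] & chosen:
                if w not in seen:
                    seen.add(w)
                    frontier.append(w)
        if seen == chosen:
            results.append(chosen)
    return results
```

The cost is dominated by the number of combinations tried, which is why the enumeration-based method pulls ahead so quickly.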
Pull Request Test Coverage Report for Build 8126957629
Details
- 63 of 66 (95.45%) changed or added relevant lines in 3 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage increased (+0.03%) to 96.513%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/connectivity/mod.rs | 7 | 10 | 70.0% |

| Totals | |
|---|---|
| Change from base Build 8117144686: | 0.03% |
| Covered Lines: | 16883 |
| Relevant Lines: | 17493 |
💛 - Coveralls
Awesome! Looking forward to reviewing this
Thanks for reviewing this - this is my first rustworkx (and also Rust) contribution, so let me know if you have general remarks! :-)
@sbrandhsn I have thought more about the iterative version and I think it is achievable. Here is the pseudocode for it in Python:
```python
def simple_iterative(p, x):
    ans = []
    # Third value indicates is_leaf:
    #   None  -> not processed yet
    #   False -> child returned false
    #   True  -> child returned true
    stack = [[p, x, None]]  # lists, not tuples, so is_leaf can be updated
    while len(stack) > 0:
        p, xnext, is_leaf = stack[-1]
        # Handle the "P has size k" case
        if len(p) == k:
            if len(stack) > 1:
                stack[-2][2] = True  # update is_leaf of the parent
            ans.append(p)
            stack.pop()
            continue
        if is_leaf is not None and not is_leaf:
            if len(stack) > 1:
                stack[-2][2] = is_leaf or stack[-2][2]
            stack.pop()
            continue
        # Handle the "X is empty" case
        if len(xnext) == 0:
            if len(stack) > 1:
                stack[-2][2] = is_leaf or stack[-2][2]
            stack.pop()
            continue
        u = any_vertex(xnext)
        xnext.remove(u)
        xprime = union(xnext, difference(N(u), N(p)))
        stack.append([union(p, [u]), xprime, None])
    return ans
```
I hope that helps! I know the original paper only gave a recursive version
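For comparison, a self-contained, runnable sketch of the recursive variant is below. This is my own illustration on an adjacency-dict graph, not the PR's code; the names (`connected_subgraphs`, `extend`) are made up. Duplicates are avoided by only ever enumerating the subgraphs whose smallest vertex (in a fixed order) is the current start vertex:

```python
def connected_subgraphs(adj, k):
    """Enumerate the vertex sets of all connected induced subgraphs of
    size k. `adj` maps each vertex to the set of its neighbors."""
    results = []
    nodes = sorted(adj)

    def neighborhood(p, radj):
        # closed neighborhood of P: P together with all its neighbors
        out = set(p)
        for v in p:
            out |= radj[v]
        return out

    def extend(p, x, radj):
        if len(p) == k:
            results.append(frozenset(p))
            return
        x = set(x)
        while x:
            u = x.pop()
            # new candidates: neighbors of u not already reachable from P;
            # popping u also excludes it from all later branches, since it
            # stays inside N[P] and can never re-enter the candidate set
            xprime = x | (radj[u] - neighborhood(p, radj))
            extend(p | {u}, xprime, radj)

    for i, v in enumerate(nodes):
        # restrict the graph to v and later vertices, so each subgraph is
        # generated exactly once, from its smallest vertex
        allowed = set(nodes[i:])
        radj = {w: adj[w] & allowed for w in allowed}
        extend({v}, set(radj[v]), radj)
    return results
```

The recursion depth here is bounded by k, matching the discussion below about stack usage.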
Thanks for your comments! :-) I wasn't aware of `while let` and will change the code accordingly. Regarding recursive vs. iterative functions: I can see your point in general. However, in this case the stack depth is bounded by k, and the hash set x_next is expected to be allocated on the heap, right?
For the five 64-bit arguments and a stack size of 1 MB, the stack depth (and k) can be up to 25000 before we run into a stack overflow... For such large k, only trivial subgraphs can be expected to be returned in a reasonable time. Do you think it makes sense to emit a warning in cases where connected_subgraphs would have an excessive runtime, i.e. for large k or graphs with a high degree?
Lemma 4 in https://www.uni-marburg.de/de/fb12/arbeitsgruppen/algorith/paper/enucon.pdf actually introduces an iterative version. I will look into that version to see if I can implement it in reasonable time (as it improves the runtime asymptotically) and otherwise use your pseudo code. :-)
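As a back-of-envelope check of the depth estimate above (counting only the five 64-bit arguments per frame; real frames carry extra overhead such as return addresses and saved registers, which is why ~25000 is a reasonable round figure):

```python
stack_size = 1 * 1024 * 1024   # 1 MB stack, in bytes
args_per_frame = 5 * 8         # five 64-bit arguments, in bytes
max_frames = stack_size // args_per_frame
print(max_frames)  # 26214 frames before overflow, ignoring frame overhead
```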
> For the five 64-bit arguments and a stack size of 1 MB, the stack depth (and k) can be up to 25000 before we run into a stack overflow... For such large k only trivial subgraphs can be expected to be returned in a reasonable time. Do you think it makes sense to emit a warning in cases where `connected_subgraphs` would have an excessive runtime, i.e. for large k or graphs with a high degree?
I don't think we need to emit a warning. At first, before the rewrite, it wasn't obvious that the recursion depth is bounded by the number of nodes of the graph. If the bound were the number of subgraphs, though, reaching 25000 would be trivial, so I had to ask.
I think for the first version the recursive version is OK. The iterative version would be slightly faster and more consistent with most of the code we have. But the final version will hopefully be something that we can parallelize (and that does not need to come in this version!)