networkx icon indicating copy to clipboard operation
networkx copied to clipboard

simple_cycles infinite loop

Open steven-nicolaou opened this issue 3 years ago • 14 comments

Current Behavior

simple_cycles appears to enter an infinite loop. My graph has under 4000 nodes and edges. If I subtract a handful of them (less than 100), it runs successfully in just about a minute. I could not reduce it to a minimal example, so I have hashed the nodes and attached the smallest example that I could reproduce it with. It does not seem to be just a matter of a large number of cycles.

Expected Behavior

simple_cycles returns a list of cycles

Steps to Reproduce

g1: Graph that triggers infinite loop g2: Graph with some edges removed that runs successfully

# Load Graph 1
g1 = nx.adjacency_graph(json.load(open('g1.json')))
cycles = nx.simple_cycles(g1) # ← Infinite loop

# Load Graph 2
g2 = nx.adjacency_graph(json.load(open('g2.json')))
cycles = nx.simple_cycles(g2) # ← Runs in about 1 minute

Environment

Python version: 3.9.12 NetworkX version: 2.8.7

Additional context

The 2 graphs in JSON format graphs.zip

steven-nicolaou avatar Oct 13 '22 04:10 steven-nicolaou

Hi, I could replicate your problem. How did you select which nodes to subtract?

paulitapb avatar Oct 15 '22 21:10 paulitapb

Just trial and error, but it seems to happen with arbitrary combinations of nodes, so I don't think the problem is with individual nodes per se. This is just as minimal as I could get it.

stevennic avatar Oct 15 '22 21:10 stevennic

ok! I was looking at the algorithm cover test and there's one if-condition branch that was not tested. I will see if that's where the problem occurs.

paulitapb avatar Oct 15 '22 21:10 paulitapb

Hi all,

I am interested in this issue and have a couple of remarks/questions.

It does not seem to be just a matter of a large number of cycles.

How many simple cycles do you expect g1.json graph to have?

The algorithm seems to progress on g1.json albeit slowly. When I tested it on g1.json, it stayed in this while loop for the first time for a few hours but eventually it did leave it and startnode was changed. I didn't run it until the end, since it looked like a job for more powerful computer(s).

The implementation of the function simple_cycles seems to follow to a great extent the pseudocode from the original paper except that it is not recursive. I find it correct provided that all the dependencies (like the computation of strongly connected components) are implemented correctly and the algorithm proposed in the paper is correct which I think we have no reason to doubt.

Finally, the number of simple cycles in directed graphs can grow quite rapidly even when adding a single edge. For example, there exists a directed graph with $4000$ vertices with no simple cycle, but adding a single edge introduces at least $2^{1000}$ simple cycles.

cc: @steven-nicolaou @paulitapb

rfulekjames avatar Oct 19 '22 14:10 rfulekjames

good point! I think that may be the problem. Because for g2 the number of simple cycles is 89113 which is not much compared with the worst case for a graph of 3587 nodes. And as @steven-nicolaou chose the number of nodes and edges randomly it's very likely that g1 is a graph with too many simple cycles.

paulitapb avatar Oct 19 '22 15:10 paulitapb

Interesting point and I was afraid about the possibility of 1 edge adding an exponential number of cycles. But it also crashes due to out of memory error if I let it run long enough. I thought it might be a memory leak, but if it's something that simply needs a long time to run, it should do so without crashing.

steven-nicolaou avatar Oct 19 '22 16:10 steven-nicolaou

I haven't observed any memory leak while enumerating cycles of g1 for several hours. Since the function is a generator, storing simple cycles in memory throughout its execution is not necessary. Therefore I am not sure where the problem could be.

rfulekjames avatar Oct 20 '22 01:10 rfulekjames

If you try to store all the simple cycles in a e.g. list, you will run out of memory. Also, the code you posted above can't be the code you actually use because it only creates the generator -- and doesn't iterate over it -- so no work is actually done. Are you using it with code that will store all the paths?

dschult avatar Oct 20 '22 12:10 dschult

The function is still not working

antoscha avatar Nov 25 '23 23:11 antoscha

Do you mean this function is still working? :) The issue is that the function never stopped. So I guessit is still working. :)

@antoscha, If you want us to work on this, then you will need to provide more information. The current OP code runs instantly because it only creates the iterator. It doesn't iterate over the iterator. And the comments suggest that

  1. the number of cycles can grow very rapidly as some edges are added (and with other edges they grow very slowly).
  2. this explosion in number of cycles could fill memory if the iterator output is stored -- that could explain using up memory.

What do you have that is new information about this fuction?

dschult avatar Nov 26 '23 04:11 dschult

The information I have is the function apparently generates one concrete cycle infinitely. A simple workaround I used for a directed graph:

def find_cycles(graph: DiGraph) -> list[tuple[str, ...]]:
    cycles: set[tuple[str, ...]] = set()

    def loop(node: str, path: tuple[str, ...]) -> None:
        visited.add(node)
        for successor in graph.successors(node):
            if successor not in visited:
                loop(successor, (*path, successor))
            elif successor in path:
                cycles.add(path[path.index(successor):])

    for node in graph.nodes:
        visited: set[str] = set()
        loop(node, tuple(node))

    return list(cycles)

antoscha avatar Nov 26 '23 14:11 antoscha

Would you please provide a graph that exhibits that behavior?

boothby avatar Nov 26 '23 19:11 boothby

Sorry, it's a proprietary thing.

antoscha avatar Nov 26 '23 20:11 antoscha

I have managed to extract minimal working examples from proprietary data before. Does the algorithm exhibit the same behavior on the induced graph of the single cycle you're seeing repeated? That is, if C is the first cycle and you run simple_cycles(G.subgraph(C)) does it still repeat it? If that's the case, can you remove the labels from that graph with nx.relabel.convert_node_labels_to_integers? That is typically enough to strip any proprietary information.

Your implementation is incorrect, unfortunately. In addition to reporting the same cycle multiple times (from different starting points) it fails to list some cycles due to how you're using visited.

D = nx.DiGraph([
  (0, 1), (0, 2), (0, 3),
  (1, 2), (1, 3), (2, 3),
  (1, 4), (2, 4), (3, 4),
  (4, 0)
])

When I run your code on the graph above, it does not find the cycle (0, 2, 4).

Note that your use of visited only allows each node to be visited a single time during the exploration rooted at each given node. Therefore, only a quadratic number of cycles can be found. Where this is guaranteed to be fast, it must be incorrect as there may be an exponential number of simple cycles in a graph, for example using the graph D below with even n, the number of cycles grows like the fibonacci numbers (see https://oeis.org/A005248 for the counting sequence)

D = nx.cycle_graph(range(n), create_using=nx.DiGraph)
nx.add_cycle(D, range(0, n, 2))
nx.add_cycle(D, range(1, n, 2))

boothby avatar Nov 26 '23 21:11 boothby