igraph icon indicating copy to clipboard operation
igraph copied to clipboard

Fast label propagation is extremely slow on some inputs (possible infinite loop)

Open szhorvat opened this issue 1 year ago • 23 comments

OSS-Fuzz found a testcase on which fast label propagation does not seem to finish (didn't finish after 5 min).

  • https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68970
  • https://oss-fuzz.com/testcase-detail/6390855211483136

The affected fuzz target is community, which currently uses an unweighted directed graph.

Profiling result after one minute:

image

Minimized testcase is attached: slow_good.zip

This is a multigraph with a triangle, a self-loop, and lots of isolated vertices (not shown below). Notice that only one side of the triangle is unidirectional, both others being bidirectional. So this might not be a trivial case of labels running around a circle.

image

Edge list with 0-based indexing:

49 50
49 55
50 49
50 49
50 50
55 49
55 50
55 50
55 50

The issue affects only IGRAPH_LPA_FAST, not IGRAPH_LPA_RETENTION or IGRAPH_LPA_DOMINANCE.

szhorvat avatar May 13 '24 09:05 szhorvat

I changed IGRAPH_LPA_FAST to ignore edge directions in the fuzzer until we debug this. This is also useful to discover if the undirected version is affected by these issues.

szhorvat avatar May 15 '24 18:05 szhorvat

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 27 '25 01:04 stale[bot]

@vtraag, any suggestions on how to proceed with this?

szhorvat avatar Apr 27 '25 11:04 szhorvat

Edge list remapped to get rid of isolates (49 becomes 0, 50 becomes 1, 55 becomes 2):

0 1
0 2
1 0
1 0
1 1
2 0
2 1
2 1
2 1

In this graph, for as long as there are exactly two colors in the loop, the algorithm will get stuck in an infinite loop as the membership vectors will evolve like this:

  • [2, 2, 1] will lead to [2, 1, 1] (node 1 has three edges from node 2, colored to 1, and only two edges from other nodes colored to 2).
  • [2, 1, 1] will lead to [2, 1, 2] (node 2 has only one incoming edge, from node 0, so it will always follow the color of node 0 with a single-step delay)
  • [2, 1, 2] will lead to [1, 1, 2] with a probability of 50% (node 0 has one edge from node 2 and one edge from node 1 so if they are not the same label, node 0 will flip colors with probability 50%)
  • [1, 1, 2] will lead to [1, 2, 2] (same as the first item but with flipped colors).
  • [1, 2, 2] will lead to [1, 2, 1] (same as the second item but with flipped colors).
  • [1, 2, 1] will lead to [2, 2, 1] (same as the third item but with flipped colors).

ntamas avatar May 12 '25 17:05 ntamas

Added a test case; not included in the test suite yet until we find a fix for it.

ntamas avatar May 12 '25 18:05 ntamas

@vtraag, any input on this? This is the new algorithm you contributed.

Notes:

  • Not yet released.
  • Interruptible.
  • This problem affects directed, unweighted.
  • #2561 is similar but affects the original LPA in the weighted case.

szhorvat avatar May 13 '25 07:05 szhorvat

The critical problem here seems to be that node 50 / 1 has a self-loop. This leads to it not being able to move to the "majority". Presumably it would be best to simply ignore self-loops in counting the labels. What do you think @lovre?

vtraag avatar May 13 '25 21:05 vtraag

The critical problem here seems to be that node 50 / 1 has a self-loop. This leads to it not being able to move to the "majority". Presumably it would be best to simply ignore self-loops in counting the labels. What do you think @lovre?

Do you want me to run the fuzzer on loopless graphs a bit to see if it still finds a similar problem? I can do this on the week after next.

szhorvat avatar May 13 '25 21:05 szhorvat

You know what, instead of running the fuzzer on my machine, I'll just re-enable this test on OSS-fuzz but with loops removed. That's the easiest way to figure out of this is truly loop-related.

szhorvat avatar May 13 '25 21:05 szhorvat

The critical problem here seems to be that node 50 / 1 has a self-loop. This leads to it not being able to move to the "majority". Presumably it would be best to simply ignore self-loops in counting the labels. What do you think @lovre?

I agree. I would always remove loops before running any label propagation algorithm, since loops have a similar effect as the retention strategy (which does not produce desirable results).

lovre avatar May 14 '25 11:05 lovre

The fuzzer has found loopless timeout cases, but I won't have the time to extract it in the next 10 days. @vtraag, if you can do it, here's the bug: https://issues.oss-fuzz.com/issues/417518227

szhorvat avatar May 14 '25 20:05 szhorvat

@lovre @vtraag Minimal example without loops:

#include <igraph.h>

int main(void) {
    igraph_vector_int_t membership;
    igraph_t graph;

    igraph_small(&graph, 0, IGRAPH_DIRECTED,
        2, 6, 3, 2, 0, 6, 3, 2, 0, 2, 0, 6, 6, 3, 2, 0, 3, 6, 6, 0, 2, 0,
        -1);

    igraph_vector_int_init(&membership, 0);

    igraph_community_label_propagation(&graph, &membership, IGRAPH_OUT, NULL, NULL, NULL, IGRAPH_LPA_FAST);

    igraph_vector_int_destroy(&membership);
    igraph_destroy(&graph);

    return 0;
}
Image

Fuzzer test case:

minimized-from-c91867a1f3e3f53757210fa660cdfbac348f507a.gz

szhorvat avatar May 15 '25 16:05 szhorvat

This suggests it has something to do with the directionality. If memory serves me well, this is something that was introduced in igraph. Now looking at it, I think this might cause problems because the dominance relationships are asymmetric in a sense, and so perhaps there does not exist a proper solution for all labels to be maximal then. @lovre do you know if this has been studied?

vtraag avatar May 15 '25 23:05 vtraag

Directedness are certainly problematic, with very few papers on it, but something more may be going wrong here:

It seems that both the presence of the isolated vertices, as well as the vertex ordering is relevant for triggering the issue. The vertex order should not matter, should it @vtraag ?

This is the same graph, just with reordered vertices. Notice the vertex count being set to 7, as vertices are reindexed so that the isolated ones would come at the end.

    igraph_small(&graph, 7, IGRAPH_DIRECTED,
        0, 1, 2, 0, 3, 1, 2, 0, 3, 0, 3, 1, 1, 2, 0, 3, 2, 1, 1, 3, 0, 3,
        -1);

szhorvat avatar May 16 '25 05:05 szhorvat

Vertex order, or better yet, edge order, may matter. We randomise only once in the beginning, and then simply add neighbours to the queue. This might end in a particular cycle of non stricly-dominant labels, I think. We can introduce additional stochasticity to break that, but I'm not sure whether it would solve all problems. If there exist graphs that have no directed dominant label set somehow, then issues would remain.

Given that the directedness might not make much sense here, could perhaps support for direction be removed? Or are there good reasons for keeping it?

vtraag avatar May 16 '25 06:05 vtraag

We randomise only once in the beginning

Are you sure this is done for the fast label propagation? I cannot see where it happens for that version. A node order vector is set up but not shuffled?

While it won't solve this specific problem, can you take a look?

szhorvat avatar May 16 '25 06:05 szhorvat

Given that the directedness might not make much sense here, could perhaps support for direction be removed? Or are there good reasons for keeping it?

It is one of the few community detection methods that work on directed graphs. People who work with directed graphs and want to do community detection will have one less alternative if we remove support for directionality.

ntamas avatar May 16 '25 08:05 ntamas

The problem is that it hasn't been studied seriously for directed graphs. So we don't know what to expect.

That said, I am definitely in favour of keeping it, even if we can't fix this issue. We can always make a note in the documentation about this potential issues, and the implementation is interruptible, so people can put a time limit on it.

If we can sort out all issues, maybe there's an opportunity for a paper :-)

szhorvat avatar May 16 '25 08:05 szhorvat

This is from @lovre's paper:

In contrast, there seems to be no straightforward extension to networks with directed arcs. The reason for this is that propagating the labels exclusively in the direction of arcs enables exchange of labels only between mutually reachable nodes forming a strongly connected component. Since any directed network is a directed acyclic graph on its strongly connected components, the labels can propagate between the nodes of different strongly connected components merely in one direction. Therefore, one usually disregards the directions of arcs when applying the label propagation method to directed networks except in the case when most arcs are reciprocal.

That said, there are a handful of papers that discuss directed label propagation. I collected a few a while ago but had no time to read them yet. Also, there are issues even within strongly connected components: if we have a cycle graph, the labels will tend to run around in circles and it will take much longer to reach a stable state.

szhorvat avatar May 16 '25 08:05 szhorvat

I agree. I do not believe that label propagation is "valid" for directed graphs, or at least I am not aware of a paper showing that it converges (to something meaningful).

My implementation of FLPA in NetworkX ignores edge directions when applied to directed graphs fast_label_propagation_communities.

lovre avatar May 16 '25 08:05 lovre

The critical problem here seems to be that node 50 / 1 has a self-loop. This leads to it not being able to move to the "majority". Presumably it would be best to simply ignore self-loops in counting the labels. What do you think @lovre?

I agree. I would always remove loops before running any label propagation algorithm, since loops have a similar effect as the retention strategy (which does not produce desirable results).

@vtraag @lovre Coming back to the topic of loops, what is your recommendation now that we figured out that it's not self-loops that caused the infinite loop but directedness?

With other community detection methods, such as modularity based ones, self-loops are in fact very relevant. Think about contracting a group of vertices into a single vertex: the total weight of group-internal connections needs to be preserved, and this is done through self-loops.

szhorvat avatar May 18 '25 11:05 szhorvat