networkanalysis icon indicating copy to clipboard operation
networkanalysis copied to clipboard

java.lang.ArrayIndexOutOfBoundsException when using -q Modularity

Open ghost opened this issue 4 years ago • 6 comments

The following command java -cp ~/Bioinformatics/networkanalysis/build/libs/networkanalysis-1.1.0-5-ga3f342d.jar nl.cwts.networkanalysis.run.RunNetworkClustering -q CPM -m 1 -w -o /tmp/edges_clustering.txt edges.txt runs just fine. However, if the -q option is set to "Modularity", I get the following crash:

RunNetworkClustering version 1.1.0 By Vincent Traag, Ludo Waltman, and Nees Jan van Eck Centre for Science and Technology Studies (CWTS), Leiden University Reading edge list from 'edges.txt'. Reading edge list took 0s. Network consists of 18386 nodes and 423926 edges with a total edge weight of 24546.50103795767. Using singleton initial clustering. Running Leiden algorithm. Quality function: Modularity Resolution parameter: 1.0 Minimum cluster size: 1 Number of random starts: 1 Number of iterations: 10 Randomness parameter: 0.01 Random number generator seed: random Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 5 out of bounds for length 5 at nl.cwts.networkanalysis.LocalMergingAlgorithm.findClustering(LocalMergingAlgorithm.java:190) at nl.cwts.networkanalysis.LeidenAlgorithm.improveClusteringOneIteration(LeidenAlgorithm.java:228) at nl.cwts.networkanalysis.LeidenAlgorithm.improveClusteringOneIteration(LeidenAlgorithm.java:276) at nl.cwts.networkanalysis.IterativeCPMClusteringAlgorithm.improveClustering(IterativeCPMClusteringAlgorithm.java:91) at nl.cwts.networkanalysis.run.RunNetworkClustering.main(RunNetworkClustering.java:413)

I've attached the input file here edges.txt I'm not sure what property of this input file would violate the preconditions of the software. The nodes are 0-indexed numbers, there are no duplicate edges, the lefthand node ID is always less than the righthand node ID, etc.

ghost avatar Feb 11 '21 00:02 ghost

The problem seems to be caused by the edges with zero weights. For the time being you can circumvent the problem by simply removing those edges. This can be done without any problem since edges with zero weights have no effect anyway.

Nonetheless, this is a bug that should be corrected. We will provide a fix at some later time.

vtraag avatar Feb 11 '21 11:02 vtraag

In particular, the root cause of the problem is that we check whether a neighboring cluster is already added in these conditionals:

https://github.com/CWTSLeiden/networkanalysis/blob/a3f342d2c9db52d35c2af2287ff2433e5cb7b093/src/main/java/nl/cwts/networkanalysis/LocalMergingAlgorithm.java#L188

https://github.com/CWTSLeiden/networkanalysis/blob/a3f342d2c9db52d35c2af2287ff2433e5cb7b093/src/main/java/nl/cwts/networkanalysis/FastLocalMovingAlgorithm.java#L162

https://github.com/CWTSLeiden/networkanalysis/blob/a3f342d2c9db52d35c2af2287ff2433e5cb7b093/src/main/java/nl/cwts/networkanalysis/StandardLocalMovingAlgorithm.java#L153

We should probably use a boolean array isClusterAdded instead of relying on the edge weight. This is the most robust way to address this issue I think. @neesjanvaneck, what do you think?

vtraag avatar Feb 11 '21 11:02 vtraag

Ah, I see. Makes perfect sense. I didn't intend to have any edges with zero weight. I'm happy to check for that condition and remove those edges, which are meaningless. Would you like me to close this issue (as far as I'm concerned, it's already resolved)?

ghost avatar Feb 11 '21 14:02 ghost

No problem, I imagine that you did not check this condition prior to running the algorithm. No, let's leave the issue open, as the program shouldn't crash on this input.

vtraag avatar Feb 11 '21 17:02 vtraag

I suppose a smaller change would simply be to filter out edges with zero weight in the code that reads the edges from the file. This way, you can enforce your own precondition, and leave the downstream code unchanged. The advantage is that it would avoid adding another variable for book-keeping.

ghost avatar Feb 11 '21 20:02 ghost

Yes, we would probably do that as well. However, the program still should not crash, even if an edge of zero weight would somehow be included.

vtraag avatar Feb 12 '21 18:02 vtraag