python-igraph
python-igraph copied to clipboard
VertexDendrogram.as_clustering() sometimes fails?
From @nadesai on August 27, 2014 20:5
Most of the time VertexDendrogram.as_clustering() has worked fine for me, but I recently encountered a VertexDendrogram that persistently fails when I try to call as_clustering() on it. The traceback is
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
<ipython-input-14-f8101bd3b52b> in <module>()
----> 1 dendrogram.as_clustering()
/usr/local/lib/python3.4/site-packages/igraph/clustering.py in as_clustering(self, n)
957 idgen = UniqueIdGenerator()
958 membership = community_to_membership(self._merges, num_elts, \
--> 959 num_elts - n)
960 membership = [idgen[m] for m in membership]
961 return VertexClustering(self._graph, membership,
InternalError: Error at community.c:769: `steps' to big or `merges' matrix too short, Invalid value
which comes from src/community.c#L769.
The VertexDendrogram in question is the result of calling community_edge_betweenness() on the network science collaboration graph, available from the "Coauthorships in network science" dataset available at http://www-personal.umich.edu/~mejn/netdata/.
Is this failure behavior expected? Is is impossible to convert some VertexDendrograms into clusterings?
Copied from original issue: igraph/igraph#675
From @ntamas on August 27, 2014 20:25
This is a known bug with incomplete dendrograms; basically community_edge_betweenness produces an incomplete dendrogram when the input graph is not fully connected. You can either extract the largest connected component of the input graph first, or try to fix the dendrogram using the function submitted here:
https://lists.nongnu.org/archive/html/igraph-help/2014-02/msg00067.html
This should be fixed in the next version.
From @nadesai on August 27, 2014 21:14
Confirming that this problem was only cropping up with disconnected graphs for me. Thanks a lot!
@gaborcsardi Hi, The python file attached on the mailing list seems to be a binary file. How do I use it? I thought that it contained a function so that I could use it for fixing the above-mentioned issue.
Just rename the file to fix_dendrogram.py after downloading it; the mailing list server incorrectly assumes that it is a binary file but it isn't.
i see the function fix_dendrogram(graph, cl) what is cl?
cl is an incomplete (partial) dendrogram returned by some of the low-level C functions in igraph. The Dendrogram class requires a full dendrogram; fix_dendrogram() merges the multiple roots of the partial dendrogram in an arbitrary order.
Hey! Why was this closed?
We have experienced this issue and we had to include the fix_dendrogram() function to fix it, but we are wondering why it isn't fixed inside the python-igraph library directly.
The issue should not have been closed in the first place, thanks for letting me know.
The fix_dendrogram() function is a crude hack only; basically, when you give it an incomplete dendrogram, it simply merges the roots of the dendrogram in an arbitrary order. This happens to work for the incomplete dendrogram returned by igraph_community_edge_betweenness() from the C layer because we know that the function stops in a situation when the remaining communities are disconnected (so they could be merged in any arbitrary order), but this is not necessarily true for any partial dendrogram. Doing this "properly" would require more time than what I can dedicate to python-igraph. At least the fix_dendrogram() funciton should check whether there are any remaining edges in the graph between the communities, and refuse to "fix" the dendrogram if this is the case.
Well FYI, the problem is not concerning only the igraph_community_edge_betweenness() function. We are using, for instance community_walktrap(), among others, and it does have the same problem.
Well, it could be - the fix_dendrogram() function I posted to the mailing list was a temporary solution only for the question of that particular user, and I did not have time to check what other parts of the library might be affected by the same bug. That's why it's not in python-igraph in the first place - I wanted to come up with a proper solution and not a workaround, but never found the time to do it.
Anyway, first we would have to decide whether the fact that the C core returns a partial dendrogram is a bug or a feature. If it is a bug, it has to be fixed in the C core and then there's no need for tidying up the dendrogram in the Python interface. If it is a feature, we would still need to implement some checks to see if it is valid to just merge the roots of the partial dendrogram in an arbitrary order, and/or fix the VertexDendrogram class to be able to handle dendrograms with multiple roots (which would be really so much nicer than just merging the roots, but takes even more time to do properly).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.