
`bipartite.projection()` creates an object many times larger than the original graph

stnorton opened this issue 6 years ago

When creating a bipartite projection with `bipartite.projection()`, the projection is considerably larger than the original graph object, and the blow-up seems to scale non-linearly with the number of vertices. I have uploaded a reproducible example here which results in a list ~7 times bigger than the original network, but using the same data with 1000 observations results in an object ~112 times larger. Using the full network (~3 million vertices) requires more than 300 GB to allocate the list.

This may not be a fixable issue (i.e. it may be inherent to storing the projections as a list), but it creates problems when working with large bipartite networks. In my analysis I can work around it by restricting the function call to return only the mode I currently need, but it is certainly unexpected memory usage behavior.

stnorton avatar Jul 10 '19 13:07 stnorton

Just to be clear what the problem is: the number of edges in the bipartite projection is only ~7 times larger, while the necessary memory is ~112 times larger?

Just to be sure: the bipartite projection typically has many more edges, and the original bipartite graph may be a more efficient representation. For example, if node A is connected to nodes 1-10, this takes only 10 edges to store in the bipartite graph, but if you project onto nodes 1-10, it requires 10*9/2 = 45 edges.
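This combinatorial blow-up can be checked with a few lines of plain Python (a sketch using only the standard library; the helper names are mine, not igraph's): a group node with k members contributes k edges to the bipartite graph but up to k*(k-1)/2 edges to the projection.

```python
from math import comb

def bipartite_edge_count(group_sizes):
    """The bipartite graph stores exactly one edge per membership."""
    return sum(group_sizes)

def projection_edge_bound(group_sizes):
    """Upper bound on projected edges: each group of size k yields at
    most C(k, 2) pairwise edges among its members (edges shared across
    groups make the true count lower)."""
    return sum(comb(k, 2) for k in group_sizes)

# One group with 10 members: 10 bipartite edges vs 45 projected edges.
print(bipartite_edge_count([10]))   # 10
print(projection_edge_bound([10]))  # 45
```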

Only if the necessary memory is unnecessarily large could we try to find a solution; if the amount of memory simply scales with the number of edges, there is not much that can be done.

vtraag avatar Jul 10 '19 14:07 vtraag

Yes, I seem to be getting a larger scaling factor in size than in edges with the bipartite projection. On a 1,000-observation bipartite network (i.e. 1000 individuals, 12 groups) the number of edges increases by a factor of 76, while the memory needed to store the object increases by a factor of 86.

This is considerably larger for the full network (3.2 million individuals). The initial igraph object is 0.64 GB, but the bipartite projection required 527 GB of memory. Any reduction in the scale of memory usage could make analysis easier and cheaper.
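The quadratic growth per group accounts for most of this gap. As a rough illustration (a sketch with a hypothetical even split of individuals across groups, not the actual data):

```python
from math import comb

def edge_scaling_factor(n_individuals, n_groups):
    """Ratio of (upper-bound) projection edges to bipartite edges,
    assuming individuals are spread evenly across groups — an
    illustrative assumption, not the real membership structure."""
    k = n_individuals // n_groups          # hypothetical members per group
    bipartite_edges = n_groups * k
    projected_edges = n_groups * comb(k, 2)
    return projected_edges / bipartite_edges

# 1000 individuals evenly split over 12 groups: each member pairs with
# (k - 1)/2 ~ 41 others on average, so edges grow by a factor of ~41.
print(round(edge_scaling_factor(1000, 12)))  # 41
```

The observed factor of 76 is higher than this even-split estimate, which is consistent with some groups being much larger than average (the C(k, 2) term grows quadratically in k).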

stnorton avatar Jul 10 '19 17:07 stnorton

I'm afraid there's not much that can be done about this, as this is simply what the internal representation of an igraph graph looks like. igraph stores not only the edge list but also various indices that make certain lookup operations faster; these indices may scale as O(m) (with the number of edges) or O(n) (with the number of vertices). So, as long as we stick to the premise that igraph graphs are solely in-memory objects, the only thing that could be done here is to switch completely to an alternative (less memory-intensive) representation, which may come with its own performance penalties.
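A back-of-envelope model of that O(n) + O(m) scaling (the constants below are my own assumptions for illustration, not igraph's actual memory layout):

```python
def estimate_graph_memory_bytes(n_vertices, n_edges,
                                arrays_per_edge=4, arrays_per_vertex=2,
                                bytes_per_entry=8):
    """Rough linear model: a few numeric arrays of length m (edge
    endpoints plus sorted lookup indices) and of length n (per-vertex
    offsets), each entry stored as an 8-byte number. The array counts
    are assumptions, not igraph's real bookkeeping."""
    entries = arrays_per_edge * n_edges + arrays_per_vertex * n_vertices
    return entries * bytes_per_entry

# With 3.2M vertices, even a hypothetical 10 billion projected edges
# already needs hundreds of gigabytes under this model:
gb = estimate_graph_memory_bytes(3_200_000, 10_000_000_000) / 1e9
print(f"{gb:.0f} GB")  # 320 GB
```

The point of the model is that memory is linear in m, so once the projection multiplies the edge count by a large factor, the memory follows suit; no representation that keeps per-edge indices can avoid that.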

To add insult to injury, igraph's in-memory representation in R is closely aligned with the in-memory representation in C (so that we can simply create vector views into the existing R vectors when handling the same graph from C), so any change to the graph representation at the R level must be accompanied by the same change in the C layer, which could then have an impact on the other higher-level interfaces as well.

ntamas avatar Nov 08 '22 15:11 ntamas