node2vec

Consuming too much memory

Open Soumyajit opened this issue 8 years ago • 7 comments

I have a small graph of about 100 MB (edge-list file size), with 65k nodes and 3.5M edges. node2vec just does not run on this graph. I have tracked the problem down to the preprocess_transition_probs() function. This function gradually eats my whole system memory within 30 minutes, and then everything hangs. I never even reach the word2vec part that follows the random walks.

I am running the experiments on an i7 laptop with 16 GB of RAM. Deepwalk and LINE are able to process graphs of up to 1 GB. Deepwalk on this 100 MB file runs in about 5 minutes (including the gensim w2v procedure).

Soumyajit avatar Feb 07 '17 10:02 Soumyajit

Hi Soumyajit! I am using node2vec for my research activities and encountered a similar problem... I now use a less memory-intensive implementation, https://github.com/MultimediaSemantics/entity2vec, which saves the walks to a zip file and then reads it through an iterator one line at a time for the word2vec learning part. You still need to get past the preprocess_transition_probs() part, though. Hope this helps!
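The streaming idea described above can be sketched roughly as follows. This is a minimal illustration, not the entity2vec code: the names `write_walks` and `WalkCorpus` are hypothetical, and gzip is assumed as the compression format purely for convenience.

```python
import gzip


def write_walks(path, walks):
    """Write walks to a gzip-compressed text file, one walk per line.

    `walks` can be a generator, so the full set of walks never has to
    be held in memory at once.
    """
    with gzip.open(path, "wt") as fh:
        for walk in walks:
            fh.write(" ".join(str(node) for node in walk) + "\n")


class WalkCorpus:
    """Re-iterable corpus that yields one walk (list of str tokens) at a time."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # The file is reopened on every pass, so the corpus can be
        # iterated several times (vocabulary pass plus training epochs).
        with gzip.open(self.path, "rt") as fh:
            for line in fh:
                tokens = line.split()
                if tokens:
                    yield tokens
```

An object like `WalkCorpus("walks.txt.gz")` can be passed as the `sentences` argument of gensim's `Word2Vec`, which accepts any restartable iterable of token lists. It is written as a class rather than a generator function because a plain generator would be exhausted after the first pass.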

enricopal avatar Feb 07 '17 13:02 enricopal

Hi,

Have you tried our high-performance, multithreaded C++ implementation? https://github.com/snap-stanford/snap/tree/master/examples/node2vec

roks avatar Feb 08 '17 00:02 roks

Is the C++ implementation of node2vec multithreaded?

zhushun0008 avatar May 06 '18 13:05 zhushun0008

@enricopal I tried entity2vec to generate walks, but it was too slow. Does a parallel version of the walk generation exist?

zhushun0008 avatar Jun 08 '18 06:06 zhushun0008

@zhushun0008 I tried https://github.com/snap-stanford/snap/tree/master/examples/node2vec and it runs very quickly. My input has 2,833,276 edges.

stray-leone avatar Nov 07 '18 08:11 stray-leone

@roks The C++ implementation suffers from the same problem (preprocess_transition_probs eats a lot of memory, as described in an issue here). The program got killed by the system on a small graph with 20M edges :(

I think the problem is that some of us are trying to run it on projected/folded graphs. These graphs contain a lot of cliques, and precomputing the transition probabilities for them may push the cost close to O(E^2).
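To make this concrete, here is a rough sketch for estimating how many entries the precomputed edge tables need before running node2vec. It assumes the behaviour of the reference Python implementation, where each directed edge (t, v) gets its own table over the neighbours of v; the function name `edge_table_entries` is hypothetical, and the input is assumed to be an undirected edge list without self-loops or duplicate edges.

```python
from collections import defaultdict


def edge_table_entries(edges):
    """Count the entries needed by the per-edge transition tables.

    `edges` is an iterable of undirected (u, v) pairs. Every directed
    edge (t, v) stores one table with deg(v) entries, and v has deg(v)
    incoming directed edges, so node v contributes deg(v) ** 2 entries.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(d * d for d in degree.values())


# A triangle: three nodes of degree 2, so 3 * 2**2 entries.
triangle = [(0, 1), (1, 2), (0, 2)]
print(edge_table_entries(triangle))  # 12
```

For a clique on k nodes this count is k(k-1)^2, so the memory grows roughly with the cube of the clique size rather than with the number of edges. That is consistent with folded graphs being much more expensive than their edge count suggests.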

VVCepheiA avatar Nov 08 '18 03:11 VVCepheiA

node2vec requires a significant amount of memory for a graph of your size. Check out http://snap.stanford.edu/graphsage/ for a less memory-demanding solution.

roks avatar Nov 08 '18 18:11 roks