node2vec Spark - memory issue

Open enricopal opened this issue 8 years ago • 3 comments

Hi, I'm trying to run node2vec using the Spark implementation on a large graph (~2.8M nodes, ~41M edges, 4.1GB file). This is the command I'm running:

./spark-submit --class com.navercorp.Main node2vec/node2vec_spark/target/node2vec-0.0.1-SNAPSHOT.jar --cmd node2vec --p 1 --q 1 --walkLength 40 --numWalks 5 --input yago_types.edgelist --output output/yago_types_p1_q1_l40_num5.emb --weighted False --directed False --indexed False

I get this error: "2017-05-09T16:45:13.259237677Z 17/05/09 16:45:13 ERROR scheduler.TaskSchedulerImpl: Lost executor 1 on spark-worker1-97711-prod: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages"

Everything was working fine with a smaller sample, so it looks like a memory problem to me. Have you ever experienced anything similar? Any idea what memory allocation would be appropriate for a graph of this size? At the moment, I have a master node with 2GB and six workers with 42GB.
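For a rough sense of scale, here is a back-of-the-envelope sketch (not part of the original post). It assumes the generic node2vec preprocessing step, which keeps one alias table per directed edge with one entry per neighbour of the destination node. The Spark implementation may store less than this, for example if it caps the number of neighbours per node. The 12 bytes per entry and the "each undirected edge is listed once" reading of the edge count are assumptions as well.

```python
# Rough lower bound on per-edge alias-table size for second-order node2vec walks.
# Graph size is taken from the post; everything else is an assumption.

num_nodes = 2_800_000
num_edges = 41_000_000

# Assumption: each undirected edge appears once in the file,
# so it contributes two endpoints.
endpoints = 2 * num_edges
avg_degree = endpoints / num_nodes

# One alias table per directed edge (u, v), with one entry per neighbour of v,
# gives sum(deg(v)^2) entries in total. By the Cauchy-Schwarz inequality this
# is at least endpoints^2 / num_nodes, with equality only if all degrees match.
min_entries = endpoints ** 2 / num_nodes

# Assumption: 12 bytes per entry (8-byte probability + 4-byte alias index),
# ignoring JVM object overhead and Spark's own bookkeeping.
BYTES_PER_ENTRY = 12
min_gib = min_entries * BYTES_PER_ENTRY / 2 ** 30

print(f"average degree: {avg_degree:.1f}")
print(f"alias entries (lower bound): {min_entries:.2e}")
print(f"raw size (lower bound): {min_gib:.1f} GiB")
```

Under these assumptions the raw lower bound comes out at roughly 27 GiB before any JVM overhead, and a skewed degree distribution would push the real figure higher. That is at least consistent with executors being lost on the full graph while a smaller sample runs fine.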

Thank you a lot! Enrico

enricopal avatar May 10 '17 08:05 enricopal

Thank you for your post!

I solved additional problems. I will send a pull request soon.

Thank you! Ha-neul

august-yeom avatar May 20 '17 06:05 august-yeom

I have the same problem. Has it been solved?

aijianiula0601 avatar Nov 27 '17 07:11 aijianiula0601

I have the same problem. I'm facing continuous OOM errors with node2vec on a directed graph. What is the recommended way to address this?

anbhat87 avatar Feb 19 '20 06:02 anbhat87