node2vec-c

Data set is too big to be held in one machine's memory; should I break it into small daily sets?

Open jackyhawk opened this issue 3 years ago • 2 comments

Thanks for the excellent code.

I have one question. My data set is too big to be held in one machine's memory, so I need to break it into small daily sets. That means I should first generate each day's walk results (sequences) and then train on them with other code (such as Gensim), as with word2vec.

All I want is the random walk output.

To get the walk output, should I just return before the part shown in the screenshot below, and then save dw_rw to disk for later training? [screenshot: 1652349681(1)]

jackyhawk avatar May 12 '22 10:05 jackyhawk

You will need to handle multiprocessing somewhat better than I do in the training loop. One option would be to just run the random walk generation in a single thread and write the walks to a file from there. As for the place to return, it is correct.
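The single-threaded option can be sketched in Python as follows. This is a minimal sketch and not code from this repository. It assumes an undirected edge list with one whitespace-separated `u v` pair per line, and it draws uniform (DeepWalk-style) walks rather than node2vec's p/q-biased transitions. The names `load_adjacency`, `write_walks`, `DailyWalks`, and all parameter defaults are hypothetical.

```python
import random
from collections import defaultdict


def load_adjacency(edge_path):
    """Read an undirected edge list ("u v" per line) into adjacency lists."""
    adj = defaultdict(list)
    with open(edge_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            u, v = parts[0], parts[1]
            adj[u].append(v)
            adj[v].append(u)
    return adj


def write_walks(adj, out_path, walks_per_node=10, walk_length=80, seed=0):
    """Stream uniform random walks to out_path, one walk per line.

    Each walk is written as soon as it is finished, so the full set of
    walks is never held in memory. Returns the number of walks written.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    n_written = 0
    with open(out_path, "w") as out:
        for _ in range(walks_per_node):
            rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
            for start in nodes:
                walk = [start]
                while len(walk) < walk_length:
                    neighbors = adj[walk[-1]]
                    if not neighbors:
                        break  # dead end: stop this walk early
                    walk.append(rng.choice(neighbors))
                out.write(" ".join(walk) + "\n")
                n_written += 1
    return n_written


class DailyWalks:
    """Re-iterable corpus over several walk files (e.g. one per day).

    Yields one list of node tokens per walk, reading line by line.
    """

    def __init__(self, paths):
        self.paths = list(paths)

    def __iter__(self):
        for path in self.paths:
            with open(path) as f:
                for line in f:
                    yield line.split()
```

Running `write_walks(load_adjacency(path), out_path)` once per daily edge file produces one walk file per day. A `DailyWalks` object over those files is a class rather than a generator so that it can be iterated more than once; it should be usable as the `sentences` argument of Gensim's `Word2Vec`, which makes several passes over the corpus. Note that Python adjacency lists are themselves memory-hungry, so this only removes the cost of holding the walks, not the cost of holding one day's graph.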

xgfs avatar May 12 '22 13:05 xgfs

Thanks very much. Is there any other repo available for generating random walk sequences on big data sets? I found that when I use a data set with more than 10 million edges, the memory required exceeds my memory capacity (200 GB).

jackyhawk avatar May 12 '22 15:05 jackyhawk