NetMF
Readme.md and data processing
Hi, could you please provide some information on your NetMF code and data preparation? Thank you very much.
Hi, to run this NetMF code, you need a Python 2.7 environment with numpy, scipy, theano, and scikit-learn. One easy way to install all of these packages is to use the Anaconda Python distribution rather than the official Python distribution.
The dataset should be a .mat file that contains a variable named 'network'. The 'network' variable should be stored in Compressed Sparse Column format, which is the scipy sparse matrix type called 'csc_matrix'. You can download an example dataset called "blogcatalog" from this repository, https://github.com/phanein/deepwalk, where it is in a folder called example_graphs. That dataset is also used in the NetMF paper.
Once all of the above is prepared, run 'python netmf.py -h' to see the instructions for adjusting the NetMF parameters.
The author proposed two ways to implement NetMF. One is for a small window size T and does not need eigenvectors to approximate the original matrix. The other is for a large window size T and requires the parameter h, which is the number of eigenvectors used to approximate the original matrix. Choose the variant that fits your application with the --small or --large parameter. Here are two examples of running netmf.py:
- python netmf.py --input blogcatalog.mat --dim 128 --window 1 --small --output test
- python netmf.py --input blogcatalog.mat --dim 128 --window 10 --rank 1024 --large --output test
In the examples above, I use '--output test' to name my output file 'test', but when the computation finishes, the program generates a file called 'test.npy'. You need to load that file with numpy, which can be written in Python as: output = numpy.load('test.npy'). You will then get the output matrix.
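
The input and output formats described above can be checked with a short script. This is only a minimal sketch under stated assumptions: the 4-node cycle graph, the file name toy_graph.mat, and the random array standing in for an embedding are all made up for illustration, and nothing here calls netmf.py itself. It writes a 'network' variable as a scipy csc_matrix into a .mat file, reads it back, and then shows how a 'test.npy' file is loaded.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse as sp

# Toy undirected graph (illustrative only): a cycle over 4 nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rows = [u for u, v in edges] + [v for u, v in edges]
cols = [v for u, v in edges] + [u for u, v in edges]
data = np.ones(len(rows), dtype=np.float64)

# Symmetric adjacency matrix in Compressed Sparse Column format.
network = sp.csc_matrix((data, (rows, cols)), shape=(4, 4))

# Save it under the variable name 'network', as the thread describes.
work_dir = tempfile.mkdtemp()
mat_path = os.path.join(work_dir, "toy_graph.mat")  # hypothetical file name
scipy.io.savemat(mat_path, {"network": network})

# Read it back and confirm the variable is a sparse CSC matrix.
loaded = scipy.io.loadmat(mat_path)["network"]
print(loaded.format)
print(loaded.shape)

# Stand-in for an embedding: numpy.save appends '.npy' when the given
# name has no such extension, so saving to 'test' yields 'test.npy'.
fake_embedding = np.random.rand(4, 128)
np.save(os.path.join(work_dir, "test"), fake_embedding)
output = np.load(os.path.join(work_dir, "test.npy"))
print(output.shape)
```

If the loaded variable is not sparse or is missing, the .mat file probably needs to be regenerated from the edge list before passing it to --input.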
@Davidham3 Thank you so much for your information in detail!
@Davidham3 We reimplemented NetMF following your experiments on BlogCatalog, and the results seem a little different. Could you please send me your full configuration (dim, negative sampling rate, T, etc.) or your embedding vectors, so I can rule out mistakes in my reproduction?
Another issue is that the embedding procedure on the Flickr dataset with 80K nodes is very memory-intensive (over 16 GB) during eigendecomposition or the sparse matrix todense() operation. I would appreciate your suggestions.
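
For reference, a rough estimate of the memory that a single dense n-by-n matrix needs at this scale. This is only a back-of-the-envelope sketch: it assumes exactly 80,000 nodes, counts one matrix with no temporary copies, and the helper name dense_matrix_bytes is made up for illustration.

```python
import numpy as np


def dense_matrix_bytes(num_nodes, dtype=np.float64):
    """Bytes needed to hold one dense num_nodes x num_nodes matrix."""
    return num_nodes * num_nodes * np.dtype(dtype).itemsize


n = 80000  # assumed node count, taken from the "80K nodes" figure
for dtype in (np.float32, np.float64):
    gigabytes = dense_matrix_bytes(n, dtype) / 1e9
    print(np.dtype(dtype).name, gigabytes)
```

This gives 25.6 GB for float32 and 51.2 GB for float64, so a single todense() call on a matrix of this size would already exceed 16 GB.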
My email is [email protected]. Thank you very much for your response and connection.
Getting an error here too... https://colab.research.google.com/drive/1k5NLfvLniM4A_v0VWckUm8S1Nmdhzcn_