GraphSAGE
Question about example data
Hi! I'm reading this paper and trying to apply GraphSAGE to another application. I have two questions about the input data format.
- Feature data: In the example data, both toy-ppi-G.json and toy-ppi-feats.npy contain features for each node. Is the feature attribute in toy-ppi-G.json needed?
- test/val attribute for <train_prefix>-G.json: Is the test/val attribute mandatory for each node? If so, how can I add it with the networkx library? Something like nx.set_node_attributes(G, 'test', true)?
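A minimal sketch of one way to add the flags, assuming the loader wants a boolean val and a boolean test key on every node of the node-link JSON (my reading of the question, not confirmed here). It edits the node-link dict directly with the standard library, which sidesteps the fact that the argument order of nx.set_node_attributes differs between networkx 1.x and 2.x. The function name add_split_flags, the split fractions, and the toy graph are all hypothetical.

```python
import json
import random


def add_split_flags(graph_data, val_frac=0.1, test_frac=0.2, seed=0):
    """Add boolean 'val' and 'test' keys to every node of a node-link dict."""
    rng = random.Random(seed)
    nodes = graph_data["nodes"]
    order = list(range(len(nodes)))
    rng.shuffle(order)

    n_val = int(len(nodes) * val_frac)
    n_test = int(len(nodes) * test_frac)
    val_idx = set(order[:n_val])
    test_idx = set(order[n_val:n_val + n_test])

    # Every node gets both keys; training nodes are False for both.
    for i, node in enumerate(nodes):
        node["val"] = i in val_idx
        node["test"] = i in test_idx
    return graph_data


# Hypothetical 10-node ring graph in node-link form.
toy = {
    "directed": False,
    "multigraph": False,
    "graph": {},
    "nodes": [{"id": i} for i in range(10)],
    "links": [{"source": i, "target": (i + 1) % 10} for i in range(10)],
}
add_split_flags(toy)
print(json.dumps(toy["nodes"][0]))
```

A random split is only one choice; if your dataset ships with a fixed train/val/test split, set the two keys from that split instead.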
The WWW 2018 tutorial (p. 34) explains the preprocessing a little, but I can't follow it because I'm new to graph analysis. http://snap.stanford.edu/proj/embeddings-www/files/nrltutorial-part3-applications.pdf
I found a similar question about the feature data. https://github.com/williamleif/GraphSAGE/issues/61
What about the test/val attribute?
I wrote preprocessing code for the Cora dataset, which is used in the PyTorch implementation. https://gist.github.com/k1ochiai/d9c66fc50bf3f7181f9337753c68b80a#file-preprocessing_for_graphsage-ipynb
@k1ochiai did it work this way? I'm using the same preprocessing and getting strange results in the end.
I'm getting the following error when running GraphSAGE on the data generated by @k1ochiai's notebook above.
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1741, in <module>
    main()
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/Users/preetham/Documents/GraphSAGE/graphsage/utils.py", line 99, in <module>
    G = json_graph.node_link_graph(G_data)
  File "/Users/preetham/Documents/GraphSAGE/venv/lib/python2.7/site-packages/networkx/readwrite/json_graph/node_link.py", line 165, in node_link_graph
    graph.add_edge(mapping[src], mapping[tgt], **edgedata)
IndexError: list index out of range
@sqrhussain I also got strange results, so I switched to the PyTorch implementation. https://github.com/williamleif/graphsage-simple
@preetham-salehundam It looks like you used Python 2.7; I have only run it on Python 3.6.
@k1ochiai I used the same preprocessing and got the following results for the Cora dataset:
python -m graphsage.supervised_train --train_prefix ./example_data/data/data --model gcn --sigmoid
Since most of the epochs were empty (there are only 884 training nodes), I reduced batch_size to 32.
Then I got the following results. However, the PyTorch implementation gives "Validation F1: 0.859999". Any idea how to fix this?
python -m graphsage.supervised_train --train_prefix ./example_data/data/data --model gcn --batch_size 32 --sigmoid
@k1ochiai I found an error in the preprocessing code above: the following line does not map the correct one-hot encoded label to each node.
class_map = {k: list(labels_one_hot[i]) for i, k in enumerate(node_map.keys())}
It should be corrected as follows:
class_map = {i: list(labels_one_hot[k]) for i, k in nodes.items()}
Then I could observe proper outputs.
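To make the difference between the two lines above concrete, here is a small sketch on hypothetical data. It assumes that nodes maps a graph node id to the row of that node in labels_one_hot; I have not checked this against the notebook, and for brevity the sketch uses the single dict nodes in both versions, whereas the original line iterates over node_map.

```python
# Hypothetical toy data: three nodes, two classes.
labels_one_hot = [[1, 0], [0, 1], [1, 0]]  # one row per node, in raw file order
nodes = {0: 2, 1: 0, 2: 1}                 # assumed: graph node id -> row index

# Position-based pairing: the i-th key gets the i-th row, whatever the
# key's real row index is.
by_position = {k: list(labels_one_hot[i]) for i, k in enumerate(nodes.keys())}

# Lookup-based pairing, in the shape of the corrected line: each node id
# gets the row that the mapping points to.
by_lookup = {i: list(labels_one_hot[k]) for i, k in nodes.items()}

print(by_position)
print(by_lookup)
```

The two dicts agree only when the keys happen to be iterated in the same order as the label rows, which would explain training that runs but produces poor scores.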
I ran it on Python 3.6 and got the same IndexError.
Did you solve this problem? I compared the JSON format of core-data-G.json with toy-ppi-G.json and found that the Cora data is missing the feature and label entries. I am not sure whether that is the cause of this error.
Hi, I ran into the same problem. I had saved the JSON files with a different version of networkx than the one used to load them. Once I used the same version for both, the problem went away. networkx has evidently changed some of the code around this. I hope this helps.
Yes, networkx has made several updates. I'd be happy to merge a fix that works with the new networkx version and updates the requirements.
Hi, can anyone share their final results for Cora with GraphSAGE (TF version), and the parameters used to get them? I tried to reproduce the results with the PyTorch version of GraphSAGE, but its training logic seems to be different. The split of the Cora dataset also differs from the one in the original GCN version.
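For anyone who cannot pin the networkx version, here is a hedged sketch of a workaround. The traceback above fails on mapping[src] with a list index error, which suggests that the reading side treats each link's source and target as positions in the node list, while the file was written with node ids in those fields. That is my reading of the traceback, not something confirmed in this thread. The helper names and the toy graph are hypothetical, and only the standard library is used.

```python
def links_to_positions(graph_data):
    """Return a copy of a node-link dict whose links use list positions.

    Assumes every link endpoint in the input is a node id that appears
    in graph_data["nodes"].
    """
    position = {node["id"]: i for i, node in enumerate(graph_data["nodes"])}
    converted = dict(graph_data)
    converted["links"] = [
        dict(link, source=position[link["source"]], target=position[link["target"]])
        for link in graph_data["links"]
    ]
    return converted


def out_of_range_links(graph_data):
    """List links whose endpoints cannot index into the node list."""
    n = len(graph_data["nodes"])
    return [
        link for link in graph_data["links"]
        if not (isinstance(link["source"], int)
                and isinstance(link["target"], int)
                and 0 <= link["source"] < n
                and 0 <= link["target"] < n)
    ]


# Hypothetical graph whose node ids are not 0..n-1.
by_id = {
    "directed": False,
    "multigraph": False,
    "graph": {},
    "nodes": [{"id": 31336}, {"id": 1061127}, {"id": 1106406}],
    "links": [
        {"source": 31336, "target": 1061127},
        {"source": 1061127, "target": 1106406},
    ],
}
by_position = links_to_positions(by_id)
print(len(out_of_range_links(by_id)), len(out_of_range_links(by_position)))
```

Running out_of_range_links on your own -G.json first is a cheap way to check whether this mismatch is actually what you are hitting before converting anything.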
How do I preprocess a dataset to get the -xxx.json files? Can someone help me? Thanks a lot.
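A rough sketch of writing the JSON inputs for a tiny graph, under assumptions: besides <train_prefix>-G.json, I am assuming the loader also reads <train_prefix>-id_map.json (node id to feature row) and <train_prefix>-class_map.json (node id to label), which matches the class_map discussed above but should be checked against the repository README. The function write_graphsage_json, the toy prefix, and the graph itself are hypothetical, and the features file (-feats.npy) is left out because it needs NumPy.

```python
import json
import os
import tempfile


def write_graphsage_json(prefix, graph_data, id_map, class_map):
    """Write the three JSON inputs side by side under a common prefix."""
    payloads = {
        "-G.json": graph_data,
        "-id_map.json": id_map,
        "-class_map.json": class_map,
    }
    for suffix, payload in payloads.items():
        with open(prefix + suffix, "w") as handle:
            json.dump(payload, handle)


# Hypothetical three-node path graph with val/test flags on every node.
graph_data = {
    "directed": False,
    "multigraph": False,
    "graph": {},
    "nodes": [
        {"id": 0, "val": False, "test": False},
        {"id": 1, "val": True, "test": False},
        {"id": 2, "val": False, "test": True},
    ],
    "links": [{"source": 0, "target": 1}, {"source": 1, "target": 2}],
}
id_map = {"0": 0, "1": 1, "2": 2}                       # node id -> feature row
class_map = {"0": [1, 0], "1": [0, 1], "2": [1, 0]}     # node id -> one-hot label

out_dir = tempfile.mkdtemp()
prefix = os.path.join(out_dir, "toy")
write_graphsage_json(prefix, graph_data, id_map, class_map)
written = sorted(os.listdir(out_dir))
print(written)
```

JSON object keys are always strings, so the loader has to convert them back if your node ids are integers.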