GraphSAGE

Question about example data

Status: Open. k1ochiai opened this issue 6 years ago • 13 comments

Hi! I'm reading this paper and trying to apply GraphSAGE to another application. I have two questions about the input data format.

  1. Feature data: In the example data, both toy-ppi-G.json and toy-ppi-feats.npy contain features for each node. Is the feature attribute in toy-ppi-G.json required?

  2. test/val attributes in <train_prefix>-G.json: Is the test/val attribute mandatory for each node? If so, how can I add it with the networkx library, e.g. nx.set_node_attributes(G, 'test', True)?
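
For question 2, a minimal sketch of tagging every node with boolean `val` and `test` attributes via networkx. The helper `assign_splits`, its split fractions, and the random split are hypothetical choices for illustration, and `karate_club_graph()` only stands in for real data. The calls use keyword arguments because the positional order of `nx.set_node_attributes` changed from `(G, name, values)` in networkx 1.x to `(G, values, name)` in 2.x, so the positional call in the question only matches the 1.x order.

```python
import random

import networkx as nx


def assign_splits(G, val_frac=0.1, test_frac=0.2, seed=0):
    """Hypothetical helper: mark each node as train, val, or test.

    Every node receives both a boolean 'val' and a boolean 'test'
    attribute; training nodes have both set to False.
    """
    nodes = sorted(G.nodes())
    random.Random(seed).shuffle(nodes)
    n_val = int(len(nodes) * val_frac)
    n_test = int(len(nodes) * test_frac)
    val_nodes = set(nodes[:n_val])
    test_nodes = set(nodes[n_val:n_val + n_test])

    # Keyword arguments work with both the networkx 1.x (G, name, values)
    # and the 2.x+ (G, values, name) argument orders.
    nx.set_node_attributes(G, name="val",
                           values={n: n in val_nodes for n in G.nodes()})
    nx.set_node_attributes(G, name="test",
                           values={n: n in test_nodes for n in G.nodes()})
    return G


G = assign_splits(nx.karate_club_graph())
flags = nx.get_node_attributes(G, "val")
print(sum(flags.values()))  # prints 3 (34 nodes * 0.1, truncated)
```

Training nodes end up with both flags False. How the tagged graph is then serialized to JSON depends on the networkx version, which is what the later IndexError comments in this thread run into.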

The WWW 2018 tutorial (p. 34) explains the preprocessing briefly, but I can't follow it because I'm new to graph analysis: http://snap.stanford.edu/proj/embeddings-www/files/nrltutorial-part3-applications.pdf

k1ochiai, Feb 08 '19

I found a similar question about the feature data: https://github.com/williamleif/GraphSAGE/issues/61

What about the test/val attributes?

k1ochiai, Feb 11 '19

I wrote preprocessing code for the Cora dataset, which is used in the PyTorch implementation: https://gist.github.com/k1ochiai/d9c66fc50bf3f7181f9337753c68b80a#file-preprocessing_for_graphsage-ipynb
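
Regarding question 1: the GraphSAGE README describes `<train_prefix>-feats.npy` as a NumPy array of node features whose row order is given by `<train_prefix>-id_map.json`, which maps node ids to consecutive integers. A stdlib-only sketch of building both for Cora-style data, assuming the features arrive as a dict keyed by node id; the helper name and the three node ids are made up, and saving the array with NumPy is only indicated in a comment.

```python
import json


def build_id_map_and_features(node_ids, features_by_node):
    """Hypothetical helper: build an id_map and feature rows ordered by it.

    node_ids: iterable of raw node ids.
    features_by_node: dict mapping raw node id -> list of floats.
    Returns (id_map, rows) where rows[id_map[str(n)]] holds n's features.
    """
    ordered = sorted(node_ids)
    # JSON object keys are strings, so store the node ids as strings.
    id_map = {str(n): i for i, n in enumerate(ordered)}
    rows = [features_by_node[n] for n in ordered]
    return id_map, rows


id_map, rows = build_id_map_and_features(
    [307, 101, 205],
    {101: [0.0, 1.0], 205: [1.0, 0.0], 307: [1.0, 1.0]},
)
print(json.dumps(id_map))  # {"101": 0, "205": 1, "307": 2}
# The rows would then be saved with NumPy, e.g.
# numpy.save(prefix + "-feats.npy", numpy.array(rows)).
```

Sorting the ids is an arbitrary choice; any order works as long as each row index matches the node's id_map value.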

k1ochiai, Feb 11 '19

@k1ochiai, did it work for you this way? I'm using the same preprocessing and getting strange results in the end.

sqrhussain, Mar 12 '19

@k1ochiai, I'm getting the following error when using the data generated by the notebook above in GraphSAGE:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1741, in <module>
    main()
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Users/preetham/Documents/GraphSAGE/graphsage/utils.py", line 99, in <module>
    G = json_graph.node_link_graph(G_data)
  File "/Users/preetham/Documents/GraphSAGE/venv/lib/python2.7/site-packages/networkx/readwrite/json_graph/node_link.py", line 165, in node_link_graph
    graph.add_edge(mapping[src], mapping[tgt], **edgedata)
IndexError: list index out of range

preetham-salehundam, Apr 17 '19

@sqrhussain I also got strange results, so I switched to the PyTorch implementation: https://github.com/williamleif/graphsage-simple

@preetham-salehundam It seems you used Python 2.7; I have only run it on Python 3.6.

k1ochiai, Apr 29 '19

@k1ochiai I used the same preprocessing and got the following results for the Cora dataset:

python -m graphsage.supervised_train --train_prefix ./example_data/data/data --model gcn --sigmoid

(screenshot: cora_normal_form)

Since most of the epochs were empty (there are only 884 training nodes), I reduced batch_size to 32 and got the results below. However, the PyTorch implementation gives "Validation F1: 0.859999". Any idea how to fix this?

python -m graphsage.supervised_train --train_prefix ./example_data/data/data --model gcn --batch_size 32 --sigmoid

(screenshot: batch_size_32_cora)

AnuradhaSK, May 01 '19

@k1ochiai I found an error in the preprocessing code above: the following line doesn't map the correct one-hot encoded label to each node:

class_map = {k: list(labels_one_hot[i]) for i, k in enumerate(node_map.keys())}

It should be corrected as follows:

class_map = {i: list(labels_one_hot[k]) for i, k in nodes.items()}

Then I could observe proper outputs.
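
A small illustration of why the original line mislabels nodes. It assumes, as the corrected line implies, that the node map takes each node id to that node's row in `labels_one_hot`; the function names and example data are hypothetical. Enumerating the keys pairs the i-th key with row i, which is only right when key order happens to match row order.

```python
def buggy_class_map(node_map, labels_one_hot):
    # Pairs the i-th key with the i-th label row, which is only correct
    # if node_map happens to iterate in row order.
    return {k: list(labels_one_hot[i]) for i, k in enumerate(node_map.keys())}


def fixed_class_map(node_map, labels_one_hot):
    # Looks up each node's own row index instead of relying on key order.
    return {k: list(labels_one_hot[row]) for k, row in node_map.items()}


# Hypothetical data: node "b" owns row 0 and node "a" owns row 1.
node_map = {"a": 1, "b": 0}
labels_one_hot = [[1, 0], [0, 1]]  # row 0 -> class 0, row 1 -> class 1

print(buggy_class_map(node_map, labels_one_hot))  # {'a': [1, 0], 'b': [0, 1]} (swapped)
print(fixed_class_map(node_map, labels_one_hot))  # {'a': [0, 1], 'b': [1, 0]}
```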

AnuradhaSK, May 04 '19

@k1ochiai I ran it on Python 3.6 and also got the same error.

kennethliukai, May 23 '19

@preetham-salehundam Did you solve the IndexError above? I compared the JSON format of core-data-G.json and toy-ppi-G.json and found that the Cora data is missing the feature and label items. I'm not sure whether that is the cause of this error.
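
A small sketch for comparing a generated G.json against the toy-ppi example: it counts how many nodes lack each required key. The default keys `val` and `test` come from the GraphSAGE README, which describes them as node attributes of `<train_prefix>-G.json`; the key names `feature` and `label` in the second call are an assumption based on the comparison above. The helper name and the three-node graph are hypothetical, and this check does not establish what causes the IndexError.

```python
from collections import Counter


def missing_node_keys(g_data, required=("val", "test")):
    """Hypothetical check: count nodes in a node-link dict lacking keys.

    g_data is the dict obtained from json.load() on a <prefix>-G.json file.
    Returns a Counter mapping each required key to the number of nodes
    that do not have it.
    """
    missing = Counter()
    for node in g_data["nodes"]:
        for key in required:
            if key not in node:
                missing[key] += 1
    return missing


# Hypothetical three-node graph where one node lacks a 'test' flag.
g_data = {
    "directed": False, "multigraph": False, "graph": {},
    "nodes": [
        {"id": 0, "val": False, "test": False},
        {"id": 1, "val": True, "test": False},
        {"id": 2, "val": False},
    ],
    "links": [{"source": 0, "target": 1}, {"source": 1, "target": 2}],
}
print(missing_node_keys(g_data))  # Counter({'test': 1})
print(missing_node_keys(g_data, required=("feature", "label")))
```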

kennethliukai, May 23 '19

Hi, I ran into the same problem: I had saved the JSON files with a different version of networkx than the one GraphSAGE used to load them. Once I used the same version for both, the problem was solved. Apparently networkx has changed some of the code involved. I hope this helps.
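
A likely mechanism, offered as an assumption rather than a confirmed diagnosis: the traceback shows the networkx 1.x loader doing `mapping[src]` on a list, i.e. it treats each link's `source`/`target` as a position in the `nodes` list, while networkx 2.x's `node_link_data` writes the node ids themselves. With large ids (such as Cora paper ids), the 1.x loader indexes past the end of the list. A stdlib-only sketch that rewrites ids into positions, assuming the links live under the `"links"` key; the function name and example ids are hypothetical.

```python
import copy


def links_to_positions(g_data):
    """Hypothetical converter: node-id links -> node-position links.

    Returns a copy of a node-link dict in which every link's 'source' and
    'target' are replaced by the index of that node in g_data['nodes'],
    the form the networkx 1.x loader in the traceback expects.
    """
    converted = copy.deepcopy(g_data)
    position = {node["id"]: i for i, node in enumerate(converted["nodes"])}
    for link in converted["links"]:
        link["source"] = position[link["source"]]
        link["target"] = position[link["target"]]
    return converted


# Hypothetical graph written by a newer networkx: links reference node ids.
g_data = {
    "directed": False, "multigraph": False, "graph": {},
    "nodes": [{"id": 501}, {"id": 702}, {"id": 903}],
    "links": [{"source": 501, "target": 702},
              {"source": 702, "target": 903}],
}
print(links_to_positions(g_data)["links"])
# [{'source': 0, 'target': 1}, {'source': 1, 'target': 2}]
```

The converted file is meant for the 1.x reader only; a 2.x reader would interpret the positions as node ids. The simpler fix reported above is to write and read the files with the same networkx version.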

aisha-deeqa, Aug 09 '19

Yes, networkx has made several updates. I'll be happy to merge a fix that works with the new networkx version and update the requirements.

RexYing, Oct 17 '19

Hi, can anyone share their final results for Cora with GraphSAGE (TensorFlow version), and the parameters used to get them? I'm trying to reproduce the results with the PyTorch version of GraphSAGE, but its training logic seems to be different. Also, its split of the Cora dataset differs from the one in the original GCN version.

NIRVANALAN, Dec 16 '19

How do I preprocess a dataset to get the -xxx.json files? Can someone help me? Thanks a lot.
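
A sketch of the last step, writing the JSON inputs under one prefix. The file names follow the GraphSAGE README (`<train_prefix>-G.json`, `<train_prefix>-id_map.json`, `<train_prefix>-class_map.json`); `<train_prefix>-feats.npy` is optional there, and identity features are used when it is omitted. The helper name, the prefix `mydata`, and the two-node graph are hypothetical, and the files go to a temporary directory.

```python
import json
import os
import tempfile


def write_graphsage_jsons(prefix, g_data, id_map, class_map):
    """Hypothetical helper: write the three JSON inputs named in the README.

    prefix is the same string later passed as --train_prefix.
    Returns the written paths in sorted order.
    """
    outputs = {
        prefix + "-G.json": g_data,             # node-link graph with val/test flags
        prefix + "-id_map.json": id_map,        # node id -> consecutive integer
        prefix + "-class_map.json": class_map,  # node id -> class (or one-hot list)
    }
    for path, payload in outputs.items():
        with open(path, "w") as f:
            json.dump(payload, f)
    return sorted(outputs)


out_dir = tempfile.mkdtemp()
prefix = os.path.join(out_dir, "mydata")
g_data = {
    "directed": False, "multigraph": False, "graph": {},
    "nodes": [{"id": 0, "val": False, "test": False},
              {"id": 1, "val": False, "test": True}],
    "links": [{"source": 0, "target": 1}],
}
paths = write_graphsage_jsons(prefix, g_data,
                              id_map={"0": 0, "1": 1},
                              class_map={"0": [1, 0], "1": [0, 1]})
for p in paths:
    print(os.path.basename(p))
```

The example uses node ids 0..n-1 listed in that order, so each link's node id and its position in the `nodes` list coincide; presumably that sidesteps the networkx 1.x vs. 2.x link-format difference discussed in this thread, though this is an untested assumption.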

anny0316, Feb 11 '20