pygcn
Citeseer data set accuracy
Dear professor, Hello! I am very interested in your recent GCN work. Thanks for sharing the code. I used the GCN network to run the Citeseer dataset, but the accuracy could not reach 70.3. How did you set the parameters to reach such a high accuracy? Thanks a lot for sharing the code, anyway.
Many thanks for your help.
Note that this repository uses different dataset splits and a slightly different model architecture than in our original paper. For an exact replication of the experiments in our paper, please have a look at this repository: https://github.com/tkipf/gcn
Thank you for your prompt reply! I used the TensorFlow version of GCN to save the features, adjacency matrix, and labels, and loaded them into PyTorch, but I could not reproduce the TensorFlow version's accuracy on the Citeseer dataset.
Yes, the model implementation is slightly different for the PyTorch version. In this version, the adjacency matrix is normalized by left-multiplication with the inverse of the degree matrix (instead of the symmetric normalization used in the paper). Additionally, the first layer doesn't use dropout (if I remember correctly), since I didn't have time to replicate the sparse dropout from the TensorFlow GCN implementation in PyTorch. The performance should be rather similar, though. If you want to get better performance, you can also play around with different activation functions and regularizers (like LayerNorm or BatchNorm).
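For reference, the two normalization schemes mentioned above can be sketched as follows (a minimal sketch using scipy; the function names are my own, not from either repository):

```python
import numpy as np
import scipy.sparse as sp

def normalize_left(adj):
    """Left (row) normalization D^-1 A, as used in the PyTorch version."""
    deg = np.asarray(adj.sum(axis=1)).flatten()
    d_inv = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    return sp.diags(d_inv) @ adj

def normalize_sym(adj):
    """Symmetric normalization D^-1/2 A D^-1/2, as in the paper."""
    deg = np.asarray(adj.sum(axis=1)).flatten()
    d_inv_sqrt = np.divide(1.0, np.sqrt(deg), out=np.zeros_like(deg), where=deg > 0)
    d_mat = sp.diags(d_inv_sqrt)
    return d_mat @ adj @ d_mat
```

With left normalization each row of the result sums to one (a row-stochastic propagation matrix), while the symmetric version preserves the symmetry of the input adjacency.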
Thank you again for your reply! It is of great help to me.
Dear professor,
Hello!
It makes sense that you load the Cora dataset this way and construct the adjacency matrix:

```python
idx_features_labels = np.genfromtxt("{}{}.content".format(path, dataset),
                                    dtype=np.dtype(str))
print(idx_features_labels.shape)
features = sp.csr_matrix(idx_features_labels[:, 1:-1], dtype=np.float32)
labels = encode_onehot(idx_features_labels[:, -1])

# build graph
idx = np.array(idx_features_labels[:, 0], dtype=np.int32)
idx_map = {j: i for i, j in enumerate(idx)}
edges_unordered = np.genfromtxt("{}{}.cites".format(path, dataset), dtype=np.int32)
edges = np.array(list(map(idx_map.get, edges_unordered.flatten())),
                 dtype=np.int32).reshape(edges_unordered.shape)
adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])),
                    shape=(labels.shape[0], labels.shape[0]), dtype=np.float32)
```

How do you load data for the Cornell dataset in WebKB? The dataset can be seen in the attachment. I hope to get your help. Thank you very much!
Hello. I also ran the PyTorch version of GCN on the Citeseer dataset and the accuracy was 69.65%. Furthermore, the accuracy differed every time on the Cora dataset provided by this repository. However, the accuracy is invariant on the datasets provided by the original GCN repository https://github.com/tkipf/gcn. I can't understand why. I really want to know whether anyone else got the same results. Thank you.
The dataset splits used in the PyGCN repository are different from the ones used in our original paper. If you would just like to reproduce the results of the original GCN paper, then please use the TensorFlow repository :)
There are also some small implementation changes (due to the nature of PyTorch) that affect results.
@tkipf
Dear Thomas, if you have time, could you elaborate a bit on the "implementation changes" in the PyTorch version you mentioned above? I'm not necessarily interested in the Cora data or those results, but more in training on other graphs/datasets. Maybe even a parallel version at some point down the road using Horovod or Keras.
Thanks
I think it’s best to compare the two implementations side by side if you’re interested in the precise differences (both implementations are fairly simple).
For a more modern and flexible implementation in PyTorch, I recommend having a look at https://github.com/dmlc/dgl
@tkipf
Thanks Again!!
One of the most important reasons, I think, is that there is no API in PyTorch with which dropout on sparse input can be implemented. In the TensorFlow version of GCN, Dr. Kipf implements sparse dropout via tf.sparse_retain, but this API has no counterpart in torch. Because dropout is (empirically) an important hyperparameter for GCN, we may not be able to recover the accuracy without solving this problem.
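For what it's worth, something analogous to tf.sparse_retain can be sketched in PyTorch by masking the non-zero entries of a coalesced sparse COO tensor directly (a minimal sketch; `sparse_dropout` is my own name, not an official torch API, and I have not checked it against the TensorFlow behavior in detail):

```python
import torch

def sparse_dropout(x, p, training=True):
    """Drop non-zero entries of a sparse COO tensor with probability p,
    rescaling the survivors by 1/(1-p), like ordinary inverted dropout."""
    if not training or p == 0.0:
        return x
    x = x.coalesce()
    keep = torch.rand(x.values().shape[0]) >= p   # Bernoulli keep-mask over nnz entries
    indices = x.indices()[:, keep]
    values = x.values()[keep] / (1.0 - p)
    return torch.sparse_coo_tensor(indices, values, x.shape)
```

This only touches the stored non-zeros, so the zeros of the sparse input never materialize, which is the same idea tf.sparse_retain exploits.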
The dataset splits in this repository are different from the ones used in the paper / in the TensorFlow gcn repository. This repository is not intended as an exact replication of the setting in our paper, so slight differences in model performance are to be expected (even beyond the dataset-split detail).
Hello, I want to know where I can download the Citeseer dataset in a form similar to the Cora dataset in this implementation (citeseer.cites, citeseer.content). Thank you a lot!
Hello! Have you found the reason why the accuracy differed every time on the Cora dataset provided by this repository? If you know the reason, please tell me. Thank you.
Dear Dr Kipf,
I am a big fan of your work and really interested in this code you shared. Regarding the Citeseer dataset, I have downloaded it from https://github.com/kimiyoung/planetoid, which hopefully is the same data you have used. My problem is reading these files and figuring out which file defines the graph and which one defines the edges. Can you please elaborate on this? Thank you in advance.
Cheers,