
Citeseer data set accuracy

Open xuhaiyun42 opened this issue 6 years ago • 18 comments

Dear professor, Hello! I am very interested in your recent GCN work. Thanks for sharing the code. I used the GCN network on the Citeseer dataset, but the accuracy could not reach 70.3. How did you set the parameters to get it that high? Thanks a lot for sharing the code, anyway.

Many thanks for your help.

xuhaiyun42 avatar Nov 30 '18 02:11 xuhaiyun42

Note that this repository uses different dataset splits and a slightly different model architecture than in our original paper. For an exact replication of the experiments in our paper, please have a look at this repository: https://github.com/tkipf/gcn

tkipf avatar Nov 30 '18 06:11 tkipf

Thank you for your prompt reply! I saved the features, adjacency matrix, and labels from the TensorFlow version of GCN and loaded them into PyTorch, but I could not reproduce the TensorFlow version's accuracy on the Citeseer dataset.

xuhaiyun42 avatar Nov 30 '18 12:11 xuhaiyun42

Yes, the model implementation is slightly different for the PyTorch version. In this version, the adjacency matrix is normalized by left-multiplication with the inverse of the degree matrix (instead of the symmetric normalization used in the paper). Additionally, the first layer doesn’t use Dropout (if I remember correctly) since I didn’t have time to replicate the sparse dropout implementation from the TensorFlow-GCN implementation in PyTorch. The performance should be rather similar though. If you want to get better performance, you can also play around with different activation functions and regularizers (like LayerNorm or BatchNorm).
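To make the difference concrete, the two normalizations can be sketched in SciPy (a minimal illustration, not code taken verbatim from either repository):

```python
import numpy as np
import scipy.sparse as sp

def row_normalize(adj):
    # This PyTorch version: left-multiply by the inverse degree matrix, D^-1 (A + I).
    adj = adj + sp.eye(adj.shape[0])
    deg_inv = np.power(np.asarray(adj.sum(1)).flatten(), -1.0)
    deg_inv[np.isinf(deg_inv)] = 0.0
    return sp.diags(deg_inv) @ adj

def sym_normalize(adj):
    # Paper / TensorFlow version: symmetric normalization, D^-1/2 (A + I) D^-1/2.
    adj = adj + sp.eye(adj.shape[0])
    deg_inv_sqrt = np.power(np.asarray(adj.sum(1)).flatten(), -0.5)
    deg_inv_sqrt[np.isinf(deg_inv_sqrt)] = 0.0
    d = sp.diags(deg_inv_sqrt)
    return d @ adj @ d

# Tiny 3-node path graph: the two results differ off the diagonal.
adj = sp.coo_matrix(np.array([[0., 1., 0.],
                              [1., 0., 1.],
                              [0., 1., 0.]]))
print(np.round(row_normalize(adj).toarray(), 3))
print(np.round(sym_normalize(adj).toarray(), 3))
```

Note that the row-normalized matrix has rows summing to one but is not symmetric, while the symmetric version is; both propagate the same neighborhood information, just with different edge weights.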

tkipf avatar Nov 30 '18 14:11 tkipf

Thank you again for your reply, which is of great help to me.

xuhaiyun42 avatar Nov 30 '18 14:11 xuhaiyun42

Dear professor,

Hello! I see how you load the Cora dataset and construct the adjacency matrix:

```python
idx_features_labels = np.genfromtxt("{}{}.content".format(path, dataset),
                                    dtype=np.dtype(str))
print(idx_features_labels.shape)
features = sp.csr_matrix(idx_features_labels[:, 1:-1], dtype=np.float32)
labels = encode_onehot(idx_features_labels[:, -1])

# build graph
idx = np.array(idx_features_labels[:, 0], dtype=np.int32)
idx_map = {j: i for i, j in enumerate(idx)}
edges_unordered = np.genfromtxt("{}{}.cites".format(path, dataset), dtype=np.int32)
edges = np.array(list(map(idx_map.get, edges_unordered.flatten())),
                 dtype=np.int32).reshape(edges_unordered.shape)
adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])),
                    shape=(labels.shape[0], labels.shape[0]), dtype=np.float32)
```

How do you load data for the Cornell dataset in WebKB? The dataset can be seen in the attachment. I hope to get your help. Thank you very much!
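For what it's worth, one complication with WebKB-style files (assuming they follow the same `.content`/`.cites` layout as Cora) is that node ids are page URLs, not integers, so the loader above fails at `dtype=np.int32`. A sketch of a string-id variant (the function name and file-layout assumptions are mine, not from the repository):

```python
import numpy as np
import scipy.sparse as sp

def load_webkb(path, dataset):
    # Assumed layout: <node_id> <feat_1> ... <feat_k> <class_label> per line.
    content = np.genfromtxt("{}{}.content".format(path, dataset), dtype=str)
    features = sp.csr_matrix(content[:, 1:-1].astype(np.float32))
    labels = content[:, -1]  # raw class strings; one-hot encode as needed

    # Map string node ids (URLs) to contiguous integer indices.
    idx_map = {node_id: i for i, node_id in enumerate(content[:, 0])}
    edges_raw = np.genfromtxt("{}{}.cites".format(path, dataset),
                              dtype=str).reshape(-1, 2)
    edges = np.array([(idx_map[a], idx_map[b]) for a, b in edges_raw
                      if a in idx_map and b in idx_map], dtype=np.int32)
    n = labels.shape[0]
    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])),
                        shape=(n, n), dtype=np.float32)
    return features, labels, adj
```

The `if a in idx_map` guard also drops edges that point to pages missing from the `.content` file, which the integer-based Cora loader would map to `None` and crash on.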

xuhaiyun42 avatar Dec 04 '18 12:12 xuhaiyun42

Hello. I also ran the PyTorch version of GCN on the Citeseer dataset, and the accuracy is 69.65%. Furthermore, the accuracy differs every time on the Cora dataset provided by this repository, yet it is invariable on the datasets provided by the original GCN repository https://github.com/tkipf/gcn. I can't understand why. I would really like to know whether someone else got the same results. Thank you.

swyu0711 avatar Dec 15 '18 09:12 swyu0711

The dataset splits used in the PyGCN repository are different from the ones used in our original paper. If you would just like to reproduce the results of the original GCN paper, then please use the TensorFlow repository :)

There are also some small implementation changes (due to the nature of PyTorch) that affect results.

tkipf avatar Dec 15 '18 12:12 tkipf

@tkipf

Dear Thomas, if you have time, could you elaborate a bit on the "implementation changes" in the PyTorch version you mentioned above? I'm not necessarily interested in the Cora data or those results, but more in training on other graphs/datasets. Maybe even a parallel version at some point down the road using Horovod or Keras.

Thanks

bapriddy avatar Jan 13 '19 13:01 bapriddy

I think it’s best to compare the two implementations side by side if you’re interested in the precise differences (both implementations are fairly simple).

For a more modern and flexible implementation in PyTorch, I recommend having a look at https://github.com/dmlc/dgl

tkipf avatar Jan 13 '19 14:01 tkipf

@tkipf

Thanks Again!!

bapriddy avatar Jan 16 '19 16:01 bapriddy

One of the most important reasons, I think, is that there is no API in PyTorch with which dropout on sparse input can be implemented. In the TensorFlow version of GCN, Dr. Kipf implements sparse dropout via tf.sparse_retain, but there is no equivalent API in torch. Because dropout is (empirically) an important hyperparameter for GCN, we may not be able to recover the accuracy without solving this problem.
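For reference, a possible workaround (my own sketch, not code from either repository): tf.sparse_retain keeps a random subset of the nonzero entries of a sparse tensor, and a similar effect can be obtained on a torch sparse COO tensor by masking its values directly:

```python
import torch

def sparse_dropout(x, p, training=True):
    """Dropout on a torch sparse COO tensor: zero out each nonzero entry
    with probability p and rescale survivors by 1/(1-p), mimicking the
    tf.sparse_retain-based sparse dropout."""
    if not training or p == 0.0:
        return x
    x = x.coalesce()
    values = x.values()
    mask = torch.rand(values.shape[0], device=values.device) >= p
    kept = values * mask.float() / (1.0 - p)
    return torch.sparse_coo_tensor(x.indices(), kept, x.shape)

# Usage sketch on a tiny sparse input:
i = torch.tensor([[0, 1, 1], [2, 0, 2]])
v = torch.tensor([3.0, 4.0, 5.0])
x = torch.sparse_coo_tensor(i, v, (2, 3))
y = sparse_dropout(x, p=0.5)
```

Note this keeps zeroed entries as explicit zeros in the values tensor rather than dropping their indices; for GCN-sized feature matrices that overhead is minor.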

scottjiao avatar Apr 18 '19 06:04 scottjiao

The dataset splits in this repository are different from the ones used in the paper / in the TensorFlow gcn repository. This repository is not intended as an exact replication of the setting in our paper, so slight differences in model performance are to be expected (even beyond the dataset-split detail).

tkipf avatar Apr 18 '19 06:04 tkipf

Hello, I would like to know where I can download the Citeseer dataset in a form similar to the Cora dataset in this implementation (citeseer.cites, citeseer.content). Thank you a lot!

Yfhu1103 avatar Dec 14 '19 07:12 Yfhu1103

Hello! Have you found the reason why the accuracy differs every time on the Cora dataset provided by this repository? If you know the reason, please tell me, thank you.
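Not an authoritative answer, but the run-to-run variation usually comes from random weight initialization and dropout masks rather than from the data itself. If one setup fixes its RNG seeds and the other does not, results will differ on every run. Fixing all seeds before training (a generic sketch; the seed value is arbitrary) makes repeated runs identical:

```python
import random
import numpy as np
import torch

def set_seed(seed):
    """Fix every relevant RNG so that repeated runs produce identical results."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)
print(torch.rand(3))  # identical across runs with the same seed
```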

Riting-Xia avatar Mar 31 '20 03:03 Riting-Xia

Dear Dr Kipf,

I am a big fan of your work and really interested in this code you shared. Regarding the Citeseer dataset, I have downloaded it from https://github.com/kimiyoung/planetoid, which hopefully is the same data you have used. My problem is reading these files and determining which file defines the graph and which one the edges. Can you please elaborate on it? Thank you in advance.

Cheers,

fansariadeh avatar Jun 30 '20 08:06 fansariadeh
