gae Meaning of 'features' object

Dear @tkipf first of all, thank you for the excellent work! Your paper and the provided code are helpful to get started with GCN.

I'm currently trying to apply your algorithm to my data. After looking at the load_data() function, I was able to create an adjacency matrix in the same format as your Cora example.

However, I am struggling with the node feature object, because I don't understand the contents of cora.tx and cora.allx. The shape is (2709x1433) (#nodes x #features), so apparently there are 1433 node features. Printing it gives:

  (0, 19) 1.0
  (0, 81) 1.0
  (0, 146) 1.0
  (0, 315) 1.0
  (0, 774) 1.0
  (0, 877) 1.0
  (0, 1194) 1.0
  (0, 1247) 1.0
  (0, 1274) 1.0
  (1, 19) 1.0
  (1, 88) 1.0
  (1, 149) 1.0
  (1, 212) 1.0
  (1, 233) 1.0

How can we interpret the rows? I can't make sense of it. This format is usually an edge list format, but as we have node features, how do the edges come into play?
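For readers with the same question: the output above looks like the standard way scipy prints a sparse matrix, where each entry is `(row, col) value`. Under that reading, the row is the node index and the column is the feature index, so no edges are involved. In Cora, as far as I know, the columns are binary word indicators. Here is a minimal sketch with a made-up 3-node, 5-feature matrix (the values are illustrative only, not taken from Cora):

```python
import numpy as np
import scipy.sparse as sp

# Toy dense feature matrix: 3 nodes x 5 binary features (made-up values).
dense = np.array([
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
], dtype=np.float32)

features = sp.csr_matrix(dense)

# Printing a sparse matrix lists only the nonzero entries as "(row, col) value".
print(features)

# The same information as explicit (node index, feature index, value) triples.
coo = features.tocoo()
triples = [(int(r), int(c), float(v)) for r, c, v in zip(coo.row, coo.col, coo.data)]

# Feature indices that are switched on for node 0.
node0_features = features[0].nonzero()[1].tolist()
print(node0_features)
```

So a line such as `(0, 19) 1.0` would mean that node 0 has feature 19 set to 1.0.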

I looked into your other repositories and issues around this topic and couldn't find anything which helps me understand the structure of the .tx and .allx files.

https://github.com/tkipf/gcn/issues/36
https://github.com/tkipf/gcn/issues/125
https://github.com/tkipf/gcn/issues/114
https://github.com/tkipf/gcn/issues/22
https://github.com/tkipf/gae/issues/35

I'm planning to use node degree as a feature, as recommended in https://github.com/tkipf/gcn/issues/22 (and to add more features later on). My current attempt is:

node_deg = dict(G.degree()).values()
features = sparse.csr_matrix(node_deg).T

As I don't understand the Cora output, I can't really assess if that is correct or not.
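One way to sanity-check a degree feature is to compute it directly from the adjacency matrix, so the row order of the features is guaranteed to match the row order of the adjacency. The sketch below assumes an undirected, unweighted graph and uses a made-up 4-node example. Note that in Python 3 `dict(G.degree()).values()` is a view rather than a list, so it may need to be wrapped in `list(...)` before being passed to `csr_matrix`.

```python
import numpy as np
import scipy.sparse as sp

# Toy undirected graph with 4 nodes and edges 0-1, 0-2, 0-3, 1-2 (made-up).
rows = np.array([0, 0, 0, 1])
cols = np.array([1, 2, 3, 2])
adj = sp.coo_matrix((np.ones(4), (rows, cols)), shape=(4, 4))
adj = (adj + adj.T).tocsr()  # make the adjacency symmetric

# Node degree is the row sum of the adjacency matrix.
deg = np.asarray(adj.sum(axis=1)).reshape(-1, 1)

# Feature matrix with one row per node and a single column (n x 1).
features = sp.csr_matrix(deg)
print(features.shape)
print(features.toarray().ravel())

# Rough networkx equivalent (assumption: G.nodes() order matches adjacency rows):
#   deg = np.array([d for _, d in G.degree()]).reshape(-1, 1)
```

The result has one row per node and one column, which is the same (#nodes x #features) layout as the Cora feature matrix, just with a single feature.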

Could you provide more guidance and explanation for that?

That would be great :) Thank you in advance, Best, Minh

Jul 22 '19 14:07 MinhAnhL

I recommend using the data loader from https://github.com/tkipf/keras-gcn/blob/master/kegra/utils.py

Then you don’t have to deal with this strange .allx etc. format (which is only supplied in this repo because it was used in a benchmark from an earlier paper by another lab, on which we base our evaluation) :)
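For anyone who wants to skip file formats entirely, the inputs can also be built in memory. The following is only a sketch and does not reproduce the linked loader: `build_graph_inputs` is a hypothetical helper name, and it assumes you have a list of node ids, one feature row per node, and an undirected edge list.

```python
import numpy as np
import scipy.sparse as sp


def build_graph_inputs(node_ids, node_features, edges):
    """Hypothetical helper: build a sparse feature matrix and adjacency matrix.

    node_ids:      arbitrary hashable ids, one per node
    node_features: one row of numeric features per node, same order as node_ids
    edges:         iterable of (source_id, target_id) pairs
    """
    # Map arbitrary node ids to consecutive row indices 0..n-1.
    idx_map = {node_id: i for i, node_id in enumerate(node_ids)}
    n = len(node_ids)

    # Rows are nodes, columns are features.
    features = sp.csr_matrix(np.asarray(node_features, dtype=np.float32))

    src = np.array([idx_map[u] for u, _ in edges])
    dst = np.array([idx_map[v] for _, v in edges])
    adj = sp.coo_matrix(
        (np.ones(len(src), dtype=np.float32), (src, dst)), shape=(n, n)
    ).tocsr()

    # Symmetrize: an edge in either direction counts as an undirected edge.
    adj = adj.maximum(adj.T)
    return features, adj


# Made-up example: 3 nodes with 2 features each, connected in a chain.
features, adj = build_graph_inputs(
    node_ids=[10, 20, 30],
    node_features=[[1, 0], [0, 1], [1, 1]],
    edges=[(10, 20), (20, 30)],
)
print(features.shape, adj.shape)
```

The key point is that row i of the feature matrix and row i of the adjacency matrix must refer to the same node.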

Jul 23 '19 07:07 tkipf

I have the same question about how to apply it to my own dataset. If you have solved this problem, could you give me some guidance, please? Thanks a lot!

Oct 26 '20 12:10 ZJJTSL

Hi, have you ever tried the degree matrix as the feature matrix? If so, the dimension of the feature matrix is n*1. Does it work in your case?

Nov 30 '20 07:11 ZJJTSL