
Using GeneralGNN with custom data

Open ash22194 opened this issue 3 years ago • 3 comments

I am trying to implement a neural network model that predicts the number of nodes in a graph satisfying some property indicated by the node features and the features of the node's neighbours. The network has to predict only a single scalar value per graph, and all the graphs in my dataset have the same number of nodes. The graphs differ only in their adjacency matrices and node features (which are scalars).

I am using a BatchLoader to load my dataset, as it seemed the most appropriate given the type of data I have, and I am trying to train the GeneralGNN model on it. As per the documentation, GeneralGNN requires a sparse adjacency matrix, so I used scipy.sparse.csr_matrix to store the adjacency matrices of the Graph objects in the dataset. The BatchLoader, however, converts these sparse adjacency matrices to dense ones while loading a batch and thus crashes when running model.fit(loader.load()).

I also tried to train the model without a loader, i.e. by calling model.fit(x=train_examples, y=train_labels), where train_examples is a tuple of numpy arrays: train_examples[0], the stacked node features, has shape (num_examples, num_nodes, num_features), while train_examples[1] has shape (num_examples, 1) and every entry in it is a scipy.sparse.csr_matrix. This still crashes with the error "Dimensions are incompatible". Any help in figuring out how to train a GeneralGNN model would be greatly appreciated.
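For concreteness, a minimal sketch of a dataset with this structure (the class name `MyDataset`, the sizes, and the random data are placeholders, not the actual data):

```python
import numpy as np
import scipy.sparse as sp
from spektral.data import Dataset, Graph


class MyDataset(Dataset):
    """Placeholder dataset: fixed-size graphs, scalar node features, one scalar label per graph."""

    def __init__(self, n_graphs=100, n_nodes=10, **kwargs):
        self.n_graphs = n_graphs
        self.n_nodes = n_nodes
        super().__init__(**kwargs)  # Dataset.__init__ calls self.read()

    def read(self):
        graphs = []
        for _ in range(self.n_graphs):
            # Random symmetric 0/1 adjacency stored as a sparse matrix
            a = np.random.randint(0, 2, (self.n_nodes, self.n_nodes))
            a = sp.csr_matrix(np.maximum(a, a.T))
            x = np.random.rand(self.n_nodes, 1)   # scalar node features
            y = np.array([float(x.sum())])        # dummy scalar target per graph
            graphs.append(Graph(x=x, a=a, y=y))   # e is left as None (no edge features)
        return graphs
```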

ash22194 avatar Feb 04 '21 19:02 ash22194

Hi,

you are correct in your analysis of how to structure the data. There is a missing piece of information in the documentation that I should have included: GeneralGNN uses GeneralConv, which only supports single, mixed, and disjoint mode.

In your case, it should be sufficient to use a DisjointLoader instead of a BatchLoader.
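A minimal sketch of that setup, assuming the placeholder `MyDataset` from above and a scalar regression target (the loss, batch size, and epoch count are arbitrary; depending on the Spektral/TensorFlow version, a custom training loop like the one in the example may be needed instead of `model.fit`):

```python
from spektral.data import DisjointLoader
from spektral.models import GeneralGNN

dataset = MyDataset()  # the placeholder dataset sketched above
loader = DisjointLoader(dataset, batch_size=32)

# One output unit and no activation, since the target is a single unbounded scalar
model = GeneralGNN(dataset.n_labels, activation=None)
model.compile(optimizer="adam", loss="mse")

# In disjoint mode, the loader yields ((x, a, i), y) batches when e is None
model.fit(loader.load(), steps_per_epoch=loader.steps_per_epoch, epochs=10)
```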

I will update the docs with this info.

Cheers

danielegrattarola avatar Feb 05 '21 08:02 danielegrattarola

I am still not sure how to pass the data using the DisjointLoader. The load function for this loader generates four outputs, including the collated node feature matrix, the disjoint adjacency matrix, and an edge weight matrix. The graphs in my dataset do not have any edge weights, and as such the e attribute of my Graph objects is None. Yet, when loading data with the DisjointLoader, it still generates an edge weight matrix, and thus the training loop described in the general_gnn.py example crashes for my dataset. If you wanted to pass a batch of data points to the fit function without using a loader, how would you do it? Would you have to construct one big graph from the batch of data points, similar to what the DisjointLoader does, or is there a way to pass a list/array of graphs?

ash22194 avatar Feb 07 '21 23:02 ash22194

If the graphs in your dataset do not have edge attributes, then the loader should not return the edge attribute matrix. If you specified e=None when creating your graphs, then there might be a problem with the loader.

Can you post a minimal example to reproduce the crash?

Can you inspect the batch variable at this line and make sure that batch[0] is a list of 4 matrices and not just 3?
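A quick way to check this outside the training loop might look like the following (a sketch, assuming the loader is built directly from the dataset; `batch_size` is arbitrary):

```python
from spektral.data import DisjointLoader

loader = DisjointLoader(dataset, batch_size=32)  # `dataset` as in the sketch above
inputs, target = next(iter(loader))

# 3 inputs corresponds to (x, a, i), 4 to (x, a, e, i)
print(len(inputs))
for item in inputs:
    print(type(item), getattr(item, "shape", None))
```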

To answer your other question: if you want to create your own batches without using a loader, you can use spektral.data.utils.to_disjoint, which takes lists of adjacency matrices, node features, etc., and returns the collated matrices.
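A rough sketch of that approach (assuming `to_disjoint` takes the node-feature and adjacency lists positionally in that order; check the API reference for the exact signature and return values):

```python
import numpy as np
import scipy.sparse as sp
from spektral.data.utils import to_disjoint

# Per-graph matrices (placeholders: 32 graphs with 10 nodes and 1 node feature each)
x_list = [np.random.rand(10, 1) for _ in range(32)]
a_list = [sp.csr_matrix(np.random.randint(0, 2, (10, 10))) for _ in range(32)]

# Collate the lists into one big disjoint-union graph
out = to_disjoint(x_list, a_list)
for m in out:
    print(type(m), m.shape)
```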

Cheers

danielegrattarola avatar Feb 08 '21 08:02 danielegrattarola