
Retrieve datafiles from singleloader function

StefanBloemheuvel opened this issue 3 years ago · 4 comments

Hi,

Thanks for creating this package, it is wonderful!

One question I have is that I prefer to work with local files that represent x_train, y_train, etc. However, a lot of your examples use these SingleLoaders, which are great for running the code but get in the way when I try to change things around in my experiments. Therefore, I would like to pass Keras the actual data that these SingleLoaders feed to the model.fit() function myself. How could I achieve this?

The code that I use right now (using the citation example dataset):

import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Input, Dropout
from tensorflow.keras.regularizers import l2

from spektral.data.loaders import SingleLoader
from spektral.datasets.citation import Citation
from spektral.layers import GCNConv
from spektral.models.gcn import GCN
from spektral.transforms import LayerPreprocess

learning_rate = 1e-2
seed = 0
epochs = 200
patience = 10
data = "cora"
dropout = 0.5
l2_reg = 2.5e-4  # L2 regularization rate used by the GCNConv kernel regularizer

tf.random.set_seed(seed=seed)  # make weight initialization reproducible

dataset = Citation(data, normalize_x=True, transforms=[LayerPreprocess(GCNConv)])

F = dataset.n_node_features
N = dataset.n_nodes

def mask_to_weights(mask):
    # Turn a boolean node mask into per-node sample weights that sum to 1
    return mask.astype(np.float32) / np.count_nonzero(mask)

weights_tr, weights_va, weights_te = (
    mask_to_weights(mask)
    for mask in (dataset.mask_tr, dataset.mask_va, dataset.mask_te)
)

x_in = Input(shape=(F,))
a_in = Input((N,), sparse=True)  # sparse N x N adjacency; in single mode the node dimension acts as the batch dimension

do_1 = Dropout(dropout)(x_in)
gc_1 = GCNConv(16,
               activation='relu',
               kernel_regularizer=l2(l2_reg),
               use_bias=False)([do_1, a_in])
do_2 = Dropout(dropout)(gc_1)
gc_2 = GCNConv(7,
               activation='softmax',
               use_bias=False)([do_2, a_in])

# Build model
model = Model(inputs=[x_in, a_in], outputs=gc_2)
optimizer = Adam(learning_rate=learning_rate)

model.compile(
    optimizer=optimizer,
    loss=CategoricalCrossentropy(reduction="sum"),
    weighted_metrics=["acc"],
)
model.summary()

# Train model
loader_tr = SingleLoader(dataset, sample_weights=weights_tr)
loader_va = SingleLoader(dataset, sample_weights=weights_va)
model.fit(
    x=loader_tr.load(),
    steps_per_epoch=loader_tr.steps_per_epoch,
    validation_data=loader_va.load(),
    validation_steps=loader_va.steps_per_epoch,
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)

# Evaluate model
print("Evaluating model.")
loader_te = SingleLoader(dataset, sample_weights=weights_te)
eval_results = model.evaluate(loader_te.load(), steps=loader_te.steps_per_epoch)
print("Done.\n" "Test loss: {}\n" "Test accuracy: {}".format(*eval_results))

Thanks in advance if you are able to help out, and again, I appreciate all the effort you have put into this package!

Kind regards,

Stefan

StefanBloemheuvel avatar Feb 16 '22 14:02 StefanBloemheuvel

If you're in the single graph case, you can avoid using the loader and do:

model.fit(
    x=[features, adjacency],
    y=labels,
    batch_size=features.shape[-2],  # Number of nodes
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)

features can be a NumPy array; adjacency should be a SparseTensor.
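In case it helps, here is a quick (untested) sketch of building such a SparseTensor by hand from a SciPy sparse matrix; the small random matrix below is just a placeholder for your real adjacency:

import numpy as np
import scipy.sparse as sp
import tensorflow as tf

# Placeholder adjacency: a small random scipy.sparse matrix standing in for the real one
adjacency_sp = sp.random(5, 5, density=0.4, format="coo", dtype=np.float32)

coo = adjacency_sp.tocoo()
adjacency = tf.SparseTensor(
    indices=np.stack([coo.row, coo.col], axis=1).astype(np.int64),
    values=coo.data.astype(np.float32),
    dense_shape=coo.shape,
)
adjacency = tf.sparse.reorder(adjacency)  # SparseTensor indices must be in row-major order

# For a dense NumPy adjacency matrix you could instead use:
# adjacency = tf.sparse.from_dense(tf.constant(adjacency_dense, dtype=tf.float32))

Spektral also ships a helper for this conversion (sp_matrix_to_sp_tensor in spektral.utils), so you don't have to do it by hand.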

Cheers

danielegrattarola avatar Feb 16 '22 16:02 danielegrattarola

Hi,

Thanks for your reply, but what I actually mean is: how do I retrieve those features, adjacency, and label NumPy arrays from:

dataset = Citation(data, normalize_x=True, transforms=[LayerPreprocess(GCNConv)])

Because if I do the following:

model.fit(
    x=[dataset[0].x, dataset[0].a],
    steps_per_epoch=loader_tr.steps_per_epoch,
    validation_data=dataset[0].y,
    validation_steps=loader_va.steps_per_epoch,
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)

I get the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Users\20191577\My Drive\Publicaties\LayerExtraction\newestcitation.py in <module>
     74 loader_va = SingleLoader(dataset, sample_weights=weights_va)
     75 # model.fit(
     76 #     x=loader_tr.load(),
     77 #     steps_per_epoch=loader_tr.steps_per_epoch,
   (...)
     81 #     callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
     82 # )
---> 84 model.fit(
     85     x=[dataset[0].x, dataset[0].a],
     86     steps_per_epoch=loader_tr.steps_per_epoch,
     87     validation_data=dataset[0].y,
     88     validation_steps=loader_va.steps_per_epoch,
     89     epochs=epochs,
     90     callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
     91 )
     93 # Evaluate model
     94 print("Evaluating model.")

File ~\Miniconda3\envs\deeplearning\lib\site-packages\tensorflow\python\keras\engine\training.py:1117, in Model.fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1110 if validation_split:
   1111   # Create the validation data using the training data. Only supported for
   1112   # `Tensor` and `NumPy` input.
   1113   (x, y, sample_weight), validation_data = (
   1114       data_adapter.train_validation_split(
   1115           (x, y, sample_weight), validation_split=validation_split))
-> 1117 if validation_data:
   1118   val_x, val_y, val_sample_weight = (
   1119       data_adapter.unpack_x_y_sample_weight(validation_data))
   1121 if self.distribute_strategy._should_use_with_coordinator:  # pylint: disable=protected-access

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I see some info about dataset.mask_tr, dataset.mask_va, and dataset.mask_te, but those are only boolean arrays. I think something needs to be done with that information, right?

StefanBloemheuvel avatar Feb 16 '22 16:02 StefanBloemheuvel

This is because you're not calling model.fit correctly: validation_data has to be an (x, y) or (x, y, sample_weight) tuple rather than just the label array, which is why Keras trips over the if validation_data: check in your traceback. It should be:

features = dataset[0].x
adjacency = dataset[0].a
adjacency = spektral.utils.sp_matrix_to_sp_tensor(adjacency)
labels = dataset[0].y
n_nodes = features.shape[-2]

model.fit(
    x=[features, adjacency],
    y=labels,
    batch_size=n_nodes,  # number of nodes
    sample_weights=weights_tr,
    validation_data=([features, adjacency], labels, weights_va),
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)

I've written this from memory, so there might be something to fix, but you get the general idea.

danielegrattarola avatar Feb 16 '22 17:02 danielegrattarola

I will try to get it working; it seems I have to play around with where to put weights_tr and weights_va, since TensorFlow gives the error that sample_weights=.. is not a valid keyword for the fit function.
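Presumably it ends up looking something along these lines (untested on my end yet, and the exact import path of the sparse conversion helper might differ between Spektral versions):

from spektral.utils.sparse import sp_matrix_to_sp_tensor  # import path may vary per Spektral version

graph = dataset[0]
features = graph.x                           # (N, F) NumPy array of node features
adjacency = sp_matrix_to_sp_tensor(graph.a)  # scipy.sparse matrix -> tf.SparseTensor
labels = graph.y                             # (N, n_classes) one-hot labels

model.fit(
    x=[features, adjacency],
    y=labels,
    sample_weight=weights_tr,                # singular keyword in Keras
    batch_size=features.shape[-2],           # all N nodes in one batch
    shuffle=False,                           # keep node order aligned with the adjacency
    validation_data=([features, adjacency], labels, weights_va),
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)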

StefanBloemheuvel avatar Feb 17 '22 08:02 StefanBloemheuvel