Retrieve data files from the SingleLoader function
Hi,
Thanks for creating this package, it is wonderful!
One question I have: I prefer to work with just local files that represent x_train, y_train, etc. However, a lot of your examples use these SingleLoaders, which are great for running the code but hinder my experiments when I try to change some things around. Therefore, I would like to provide Keras myself with the actual data that these SingleLoaders feed to the model.fit() function. How could I achieve this?
The code that I use right now (using the citation example dataset):
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Input, Dropout
from tensorflow.keras.regularizers import l2
from spektral.data.loaders import SingleLoader
from spektral.datasets.citation import Citation
from spektral.layers import GCNConv
from spektral.models.gcn import GCN
from spektral.transforms import LayerPreprocess
learning_rate = 1e-2
seed = 0
epochs = 200
patience = 10
data = "cora"
dropout = 0.5
l2_reg = 5e-4  # L2 regularization rate
tf.random.set_seed(seed=seed) # make weight initialization reproducible
dataset = Citation(data, normalize_x=True, transforms=[LayerPreprocess(GCNConv)])
F = dataset.n_node_features
N = dataset.n_nodes
def mask_to_weights(mask):
    return mask.astype(np.float32) / np.count_nonzero(mask)

weights_tr, weights_va, weights_te = (
    mask_to_weights(mask)
    for mask in (dataset.mask_tr, dataset.mask_va, dataset.mask_te)
)
x_in = Input(shape=(F,))
a_in = Input((N,), sparse=True)

do_1 = Dropout(dropout)(x_in)
gc_1 = GCNConv(
    16,
    activation="relu",
    kernel_regularizer=l2(l2_reg),
    use_bias=False,
)([do_1, a_in])
do_2 = Dropout(dropout)(gc_1)
gc_2 = GCNConv(
    7,
    activation="softmax",
    use_bias=False,
)([do_2, a_in])
# Build model
model = Model(inputs=[x_in, a_in], outputs=gc_2)
optimizer = Adam(learning_rate=learning_rate)
model.compile(
    optimizer=optimizer,
    loss=CategoricalCrossentropy(reduction="sum"),
    weighted_metrics=["acc"],
)
model.summary()
# Train model
loader_tr = SingleLoader(dataset, sample_weights=weights_tr)
loader_va = SingleLoader(dataset, sample_weights=weights_va)
model.fit(
    x=loader_tr.load(),
    steps_per_epoch=loader_tr.steps_per_epoch,
    validation_data=loader_va.load(),
    validation_steps=loader_va.steps_per_epoch,
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)
# Evaluate model
print("Evaluating model.")
loader_te = SingleLoader(dataset, sample_weights=weights_te)
eval_results = model.evaluate(loader_te.load(), steps=loader_te.steps_per_epoch)
print("Done.\n" "Test loss: {}\n" "Test accuracy: {}".format(*eval_results))
Thanks in advance if you are able to help out, and again, I appreciate all the effort you have put into this package!
Kind regards,
Stefan
If you're in the single graph case, you can avoid using the loader and do:
model.fit(
    x=[features, adjacency],
    y=labels,
    batch_size=features.shape[-2],  # Number of nodes
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)
features can be a Numpy array, adjacency should be a SparseTensor.
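For example, something along these lines (a quick sketch, assuming the Cora setup from your snippet, where dataset[0].a is a scipy sparse matrix):
import numpy as np
import tensorflow as tf

graph = dataset[0]
features = graph.x  # dense node features, shape (N, F)
labels = graph.y    # one-hot labels, shape (N, n_classes)

# The adjacency comes out of Citation as a scipy sparse matrix;
# turn it into a tf.SparseTensor before passing it to fit().
a = graph.a.tocoo()
adjacency = tf.sparse.reorder(
    tf.SparseTensor(
        indices=np.stack([a.row, a.col], axis=1).astype(np.int64),
        values=a.data.astype(np.float32),
        dense_shape=a.shape,
    )
)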
Cheers
Hi,
Thanks for your reply, but what I actually mean is how to retrieve these features, adjacency and label numpy arrays from:
dataset = Citation(data, normalize_x=True, transforms=[LayerPreprocess(GCNConv)])
Because if I do the following:
model.fit(
    x=[dataset[0].x, dataset[0].a],
    steps_per_epoch=loader_tr.steps_per_epoch,
    validation_data=dataset[0].y,
    validation_steps=loader_va.steps_per_epoch,
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)
I get the error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Users\20191577\My Drive\Publicaties\LayerExtraction\newestcitation.py in <module>
     74 loader_va = SingleLoader(dataset, sample_weights=weights_va)
     75 # model.fit(
     76 #     x=loader_tr.load(),
     77 #     steps_per_epoch=loader_tr.steps_per_epoch,
    (...)
     81 #     callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
     82 # )
---> 84 model.fit(
     85     x=[dataset[0].x, dataset[0].a],
     86     steps_per_epoch=loader_tr.steps_per_epoch,
     87     validation_data=dataset[0].y,
     88     validation_steps=loader_va.steps_per_epoch,
     89     epochs=epochs,
     90     callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
     91 )
     93 # Evaluate model
     94 print("Evaluating model.")

File ~\Miniconda3\envs\deeplearning\lib\site-packages\tensorflow\python\keras\engine\training.py:1117, in Model.fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1110 if validation_split:
   1111     # Create the validation data using the training data. Only supported for
   1112     # `Tensor` and `NumPy` input.
   1113     (x, y, sample_weight), validation_data = (
   1114         data_adapter.train_validation_split(
   1115             (x, y, sample_weight), validation_split=validation_split))
-> 1117 if validation_data:
   1118     val_x, val_y, val_sample_weight = (
   1119         data_adapter.unpack_x_y_sample_weight(validation_data))
   1121 if self.distribute_strategy._should_use_with_coordinator:  # pylint: disable=protected-access

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I see some info about dataset.mask_tr, dataset.mask_va and dataset.mask_te, but those are only boolean arrays. I think something should be done with that info, right?
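For context, my assumption is that those masks just flag which nodes belong to each split and are meant to become the per-node sample weights (which is what the mask_to_weights() helper above already computes), e.g.:
# Boolean masks over all N nodes, one per split (train / validation / test).
print(dataset.mask_tr.shape, dataset.mask_tr.dtype)  # (N,), bool
print(np.count_nonzero(dataset.mask_tr))             # number of training nodes

# Assumed usage: zero weight outside the split, constant weight inside,
# normalized so the weights of each split sum to 1.
weights_tr = dataset.mask_tr.astype(np.float32) / np.count_nonzero(dataset.mask_tr)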
This is because you're not calling model.fit correctly.
It should be:
features = dataset[0].x
adjacency = dataset[0].a
adjacency = spektral.utils.sp_matrix_to_sp_tensor(adjacency)
labels = dataset[0].y
n_nodes = features.shape[-2]

model.fit(
    x=[features, adjacency],
    y=labels,
    batch_size=n_nodes,
    sample_weights=weights_tr,
    validation_data=([features, adjacency], labels, weights_va),
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)
I've written this from memory, so there might be something to fix, but you get the general idea.
I will try to get it to work. It seems I have to play around with where to put weights_tr and weights_va, since TensorFlow is giving the error that sample_weights=.. is not a valid keyword for the fit function.
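For reference, my current understanding is that Keras expects the keyword sample_weight (singular), so the corrected call would presumably look roughly like this (untested sketch, reusing the variables from the snippets above):
model.fit(
    x=[features, adjacency],
    y=labels,
    sample_weight=weights_tr,    # Keras keyword is sample_weight, not sample_weights
    batch_size=n_nodes,          # the whole graph is a single batch
    shuffle=False,               # keep node order fixed so rows stay aligned with the adjacency
    validation_data=([features, adjacency], labels, weights_va),
    epochs=epochs,
    callbacks=[EarlyStopping(patience=patience, restore_best_weights=True)],
)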