Cannot save model as specified in the documentation
The documentation proposes saving models either with tf.saved_model or with tf.keras. The first approach works, but (to my understanding) it saves a model that cannot be trained further. The second approach crashes with the following error:
Traceback (most recent call last):
File "/home/raphael/.../example.py", line 86, in <module>
model.save("model0.keras")
File "/home/raphael/.../python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/raphael/.../python3.10/site-packages/keras/src/saving/serialization_lib.py", line 395, in _get_class_or_fn_config
raise TypeError(
TypeError: Cannot serialize object ImmutableDict({}) of type <class 'tensorflow.python.framework.immutable_dict.ImmutableDict'>. To be serializable, a class must implement the `get_config()` method.
Here is an example based on the GAT tutorial that reproduces the error above:
from molgraph.chemistry import datasets
from molgraph.chemistry import features
from molgraph.chemistry import Featurizer
from molgraph.chemistry import MolecularGraphEncoder
from tensorflow import keras
import tensorflow as tf

# Atom and bond featurizers
atom_encoder = Featurizer([
    features.Symbol(),
    features.Hybridization(),
])
bond_encoder = Featurizer([
    features.BondType(),
    features.Conjugated(),
])

encoder = MolecularGraphEncoder(
    atom_encoder,
    bond_encoder,
    positional_encoding_dim=16,
    self_loops=False
)

# Encode the ESOL dataset into GraphTensors
esol = datasets.get('esol')
x_train = encoder(esol['train']['x'])
y_train = esol['train']['y']
x_val = encoder(esol['validation']['x'])
y_val = esol['validation']['y']
x_test = encoder(esol['test']['x'])
y_test = esol['test']['y']

type_spec = x_train.spec

from molgraph.layers import GATConv
from molgraph.layers import LaplacianPositionalEncoding
from molgraph.layers import Readout
from molgraph.layers import MinMaxScaling

node_preprocessing = MinMaxScaling(
    feature='node_feature', feature_range=(0, 1), threshold=True)
edge_preprocessing = MinMaxScaling(
    feature='edge_feature', feature_range=(0, 1), threshold=True)

train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(1024)
    .batch(32)
    .prefetch(-1)
)
val_ds = (
    tf.data.Dataset.from_tensor_slices((x_val, y_val))
    .batch(32)
    .prefetch(-1)
)
test_ds = (
    tf.data.Dataset.from_tensor_slices((x_test, y_test))
    .batch(32)
    .prefetch(-1)
)

# Adapt the scaling layers on the training features only
node_preprocessing.adapt(train_ds.map(lambda x, *args: x))
edge_preprocessing.adapt(train_ds.map(lambda x, *args: x))

model = keras.Sequential([
    keras.layers.Input(type_spec=type_spec),
    node_preprocessing,
    edge_preprocessing,
    LaplacianPositionalEncoding(),
    GATConv(normalization='batch_norm'),
    GATConv(normalization='batch_norm'),
    GATConv(normalization='batch_norm'),
    Readout(),
    keras.layers.Dense(1024, 'relu'),
    keras.layers.Dense(1024, 'relu'),
    keras.layers.Dense(y_train.shape[-1])
])

model.compile(optimizer='adam', loss='mae')
model.predict(x_test)
model.save("model0.keras")  # raises the TypeError above
Tensorflow version: 2.15.1
Keras version: 2.15.0
Python version: 3.10.12
Molgraph version: 0.6.6 (10143c6)
Thanks for the observation @RaphaelRobidas, I will check it out.
It seems to be an issue related to the ExtensionType API of TF. The 'auxiliary' field of the GraphTensor is a Mapping[str, tf.Tensor] type, which internally creates an ImmutableMapping that cannot be serialized (using the .keras format).
So the current solution is to switch from the .keras format to the SavedModel format by omitting the .keras extension.
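As a minimal sketch of that workaround, reusing the model and data from the reproduction script above (with TF 2.15, a path without a .keras or .h5 suffix falls back to the TF SavedModel format):

# Omitting the ".keras" suffix makes tf.keras (2.15) write a SavedModel
# directory instead of a .keras archive, sidestepping the serialization error.
model.save("model0")
restored = keras.models.load_model("model0")
restored.predict(x_test)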
@akensert Thanks for looking into this.
Your solution does solve the issue in the example perfectly, thanks a lot! It seems to be problematic for some kinds of layers, though. With a model using GATv2Conv layers, I get the following error:
Traceback (most recent call last):
File "/home/raphael/<path>/gat.py", line 415, in <module>
gnn_model2 = keras.models.load_model("model0")
File "/home/<path>/python3.10/site-packages/keras/src/saving/saving_api.py", line 262, in load_model
return legacy_sm_saving_lib.load_model(
File "/home/<path>/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/tmp/__autograph_generated_filecz6jiapd.py", line 37, in tf__call
ag__.if_stmt(ag__.converted_call(ag__.ld(graph_tensor).is_ragged, (), None, fscope), if_body, else_body, get_state, set_state, ('graph_tensor',), 1)
AttributeError: Exception encountered when calling layer 'gat_conv_1' (type GATv2Conv).
in user code:
File "/home/raphael/<path>/molgraph/molgraph/layers/gnn_layer.py", line 205, in call *
if graph_tensor.is_ragged():
AttributeError: 'SymbolicTensor' object has no attribute 'is_ragged'
Call arguments received by layer 'gat_conv_1' (type GATv2Conv):
• graph_tensor=tf.Tensor(shape=(None, None, 114), dtype=float32)
Actually, that's not quite true. The problem occurs when the input specification is not defined explicitly in the model via keras.layers.Input(type_spec=x_train.spec). Adding this seems to solve the issue as far as I can tell!
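A minimal sketch of that fix, reusing x_train, y_train, x_test, keras and Readout from the reproduction script above (and assuming GATv2Conv is exposed under molgraph.layers like GATConv):

from molgraph.layers import GATv2Conv

model2 = keras.Sequential([
    keras.layers.Input(type_spec=x_train.spec),  # explicit GraphTensor spec
    GATv2Conv(),
    GATv2Conv(),
    Readout(),
    keras.layers.Dense(y_train.shape[-1]),
])
model2.predict(x_test)
model2.save("model1")  # SavedModel format (no ".keras" suffix)
restored2 = keras.models.load_model("model1")
restored2.predict(x_test)

With the spec declared up front, the restored layers receive a GraphTensor rather than a plain SymbolicTensor, so the is_ragged() call no longer fails.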
@RaphaelRobidas Okay interesting, good to know. And good to know you could save your models eventually, although not in .keras format.
Btw, I would like to migrate to Keras 3 (and TF>=2.16); however, Keras 3 does not yet support extension types.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
@RaphaelRobidas: Quite late for a fix, but I've now implemented a (temporary) fix for this (see version 0.6.8). You should now be able to save a model using .keras format:
from molgraph import GraphTensor
from molgraph import layers
from tensorflow import keras

# A tiny GraphTensor with two nodes and a single edge
g = GraphTensor(node_feature=[[4.], [2.]], edge_src=[0], edge_dst=[1])

model = keras.Sequential([
    layers.GNNInput(type_spec=g.spec),  # or layers.GNNInputLayer(type_spec)
    layers.GINConv(units=32),
    layers.GINConv(units=32),
    layers.Readout(),
    keras.layers.Dense(units=1),
])

pred = model(g)

# Save and reload in .keras format, then check that the predictions match
model.save('/tmp/tmp_model.keras')
loaded_model = keras.models.load_model('/tmp/tmp_model.keras')
assert pred == loaded_model(g)
loaded_model.summary()