Model config serialization fails with Keras Lambda layer on TF 2.5
After upgrading to TF 2.5, training a Keras model with ClearML tracking (even with the latest 1.3.1) started raising a warning:
clearml - WARNING - ('Could not serialize configuration dictionary:\n' ....
In particular, the config contains a lambda function (K.mean), and ClearML can serialize it neither to HOCON nor to JSON. E.g., part of the model's config:
{'class_name': 'Lambda',
 'config': {'name': 'row_projection',
            'trainable': True,
            'dtype': 'float32',
            'function': ('4wAAAAAAAAAAAwAAAAwAAAAfAAAAc0QAAAB5CogAfAB8AY4BUwAEAHQAdAFmAmsKcj4BAAEAAQB0\nAogBfAB8AYMDfQJ8AnQDagRrCXI4fAJTAIIAWQBuAlgAZAFTACkCekJDYWxsIHRhcmdldCwgYW5k\nIGZhbGwgYmFjayBvbiBkaXNwYXRjaGVycyBpZiB0aGVyZSBpcyBhIFR5cGVFcnJvci5OKQXaCVR5\ncGVFcnJvctoKVmFsdWVFcnJvctoIZGlzcGF0Y2jaDE9wRGlzcGF0Y2hlctoNTk9UX1NVUFBPUlRF\nRCkD2gRhcmdz2gZrd2FyZ3PaBnJlc3VsdCkC2gZ0YXJnZXTaB3dyYXBwZXKpAPpJL3Vzci9sb2Nh\nbC9saWIvcHl0aG9uMy42L2Rpc3QtcGFja2FnZXMvdGVuc29yZmxvdy9weXRob24vdXRpbC9kaXNw\nYXRjaC5weXIKAAAAywAAAHMOAAAAAAICAQoBEgMMAQoBBAI=\n',
                         None,
                         (<function tensorflow.python.keras.backend.mean(x, axis=None, keepdims=False)>,
                          <function tensorflow.python.keras.backend.mean(x, axis=None, keepdims=False)>)),
            'function_type': 'lambda',
            'module': 'tensorflow.python.keras.backend',
            'output_shape': None,
            'output_shape_type': 'raw',
            'output_shape_module': None,
            'arguments': {'axis': -2}},
 'name': 'row_projection',
 'inbound_nodes': [[['conv2d_7', 0, 0, {}]]]},
Prior to TF 2.5, the config held just the function's name, "mean", instead of the object <function tensorflow.python.keras.backend.mean(x, axis=None, keepdims=False)>.
The model contains this Lambda layer:
Lambda(K.mean, arguments={"axis": -2}, name="row_projection")
I found that the model config comes from Model._updated_config(), which wraps Model.get_config() and adds the TF/Keras version. It may contain various non-serializable objects.
Keras also provides Model.to_json(), which passes an explicit serializer for those objects to json.dumps(), but ClearML does not use it:
# tensorflow/python/keras/engine/training.py
class Model:
    def to_json(self, **kwargs):
        model_config = self._updated_config()
        return json.dumps(
            model_config, default=serialization.get_json_type, **kwargs)
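To illustrate what the `default=` hook does here, the following is a minimal stdlib-only sketch. `fallback_json_type` and the local `mean` function are hypothetical stand-ins (for `get_json_type` and `K.mean` respectively); the real TensorFlow helper handles more types, so treat this as an approximation of the idea rather than its implementation.

```python
import json


def fallback_json_type(obj):
    """Hypothetical `default=` hook: represent callables by their name."""
    if callable(obj):
        return getattr(obj, "__name__", repr(obj))
    raise TypeError(f"Not JSON serializable: {obj!r}")


def mean(x, axis=None, keepdims=False):
    """Stand-in for K.mean; only its identity matters in this sketch."""
    return sum(x) / len(x)


# A config shaped like the Lambda layer entry above, with function objects inside.
unsafe_config = {
    "class_name": "Lambda",
    "config": {
        "name": "row_projection",
        "function": ("<bytecode>", None, (mean, mean)),
        "arguments": {"axis": -2},
    },
}

# Without a hook, json.dumps rejects the function objects.
try:
    json.dumps(unsafe_config)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

# With the hook, a dumps/loads round trip yields a plain, JSON-safe structure.
safe_config = json.loads(json.dumps(unsafe_config, default=fallback_json_type))
```

After the round trip the function objects are replaced by the string "mean" and tuples become lists, which resembles the pre-2.5 config where only the function's name was stored.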
The relevant code in ClearML where this sanitization should be done seems to be https://github.com/allegroai/clearml/blob/774957797e9f6c842b59ad4c6fb1cb91f9c55a06/clearml/binding/frameworks/tensorflow_bind.py#L1745.
The sanitization seems to need fixing in two places:
# called when training starts
# class PatchKerasModelIO
import json
from tensorflow.python.util.serialization import get_json_type
    @staticmethod
    def _updated_config(original_fn, self):
        config = original_fn(self)
        # check if we have main task
        if PatchKerasModelIO.__main_task is None:
            return config
        try:
            safe_config = json.loads(json.dumps(config, default=get_json_type))
            # pass safe_config instead of config around, return config
# called when training finishes
# class PatchKerasModelIO
import json
from tensorflow.python.util.serialization import get_json_type
        try:
            unsafe_config = self._updated_config()
            config = json.loads(json.dumps(unsafe_config, default=get_json_type))  # <--- added code
        except Exception:
            # we failed to convert the network to json, for some reason (most likely internal keras error)
            config = {}
With these changes it no longer crashed, but the model config still didn't appear in the ClearML UI.
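The pattern both patches aim for (return the original config to Keras untouched, hand only a JSON-safe copy to the tracker) can be sketched without TensorFlow or ClearML. Everything here is hypothetical: `with_safe_copy`, `FakeModel`, and the `on_config` callback are illustrative names, not part of either library, and the hook that maps a function to its name stands in for `get_json_type`.

```python
import json


def with_safe_copy(original_fn, on_config, default):
    """Wrap a config getter: report a JSON-safe copy, return the original."""
    def wrapper(self):
        config = original_fn(self)
        try:
            safe_config = json.loads(json.dumps(config, default=default))
        except Exception:
            # Same fallback as in the patch: an empty config on failure.
            safe_config = {}
        on_config(safe_config)  # e.g. hand the safe copy to the tracker
        return config           # the caller still receives the original object
    return wrapper


class FakeModel:
    """Hypothetical stand-in for a Keras model with a non-serializable config."""
    def _updated_config(self):
        return {"function": (len, len), "arguments": {"axis": -2}}


captured = []
FakeModel._updated_config = with_safe_copy(
    FakeModel._updated_config,
    on_config=captured.append,
    default=lambda fn: fn.__name__,  # stand-in for get_json_type
)

original = FakeModel()._updated_config()
```

In this sketch `original` still holds the function objects, while `captured[0]` holds only strings and lists. It does not explain why the config is missing from the UI; it only isolates the sanitization step so that it can be checked independently.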