Model config serialization fails with Keras Lambda layer on TF 2.5

Open bzamecnik opened this issue 2 years ago • 0 comments

After upgrading to TF 2.5, tracking the training of a Keras model with ClearML (even with the latest 1.3.1) started raising a warning:

clearml - WARNING - ('Could not serialize configuration dictionary:\n' ....

In particular, the config contains a lambda function (K.mean), and ClearML can serialize it neither to HOCON nor to JSON. E.g., part of the model's config:

{'class_name': 'Lambda',
   'config': {'name': 'row_projection',
    'trainable': True,
    'dtype': 'float32',
    'function': ('4wAAAAAAAAAAAwAAAAwAAAAfAAAAc0QAAAB5CogAfAB8AY4BUwAEAHQAdAFmAmsKcj4BAAEAAQB0\nAogBfAB8AYMDfQJ8AnQDagRrCXI4fAJTAIIAWQBuAlgAZAFTACkCekJDYWxsIHRhcmdldCwgYW5k\nIGZhbGwgYmFjayBvbiBkaXNwYXRjaGVycyBpZiB0aGVyZSBpcyBhIFR5cGVFcnJvci5OKQXaCVR5\ncGVFcnJvctoKVmFsdWVFcnJvctoIZGlzcGF0Y2jaDE9wRGlzcGF0Y2hlctoNTk9UX1NVUFBPUlRF\nRCkD2gRhcmdz2gZrd2FyZ3PaBnJlc3VsdCkC2gZ0YXJnZXTaB3dyYXBwZXKpAPpJL3Vzci9sb2Nh\nbC9saWIvcHl0aG9uMy42L2Rpc3QtcGFja2FnZXMvdGVuc29yZmxvdy9weXRob24vdXRpbC9kaXNw\nYXRjaC5weXIKAAAAywAAAHMOAAAAAAICAQoBEgMMAQoBBAI=\n',
     None,
     (<function tensorflow.python.keras.backend.mean(x, axis=None, keepdims=False)>,
      <function tensorflow.python.keras.backend.mean(x, axis=None, keepdims=False)>)),
    'function_type': 'lambda',
    'module': 'tensorflow.python.keras.backend',
    'output_shape': None,
    'output_shape_type': 'raw',
    'output_shape_module': None,
    'arguments': {'axis': -2}},
   'name': 'row_projection',
   'inbound_nodes': [[['conv2d_7', 0, 0, {}]]]},

Prior to that, the config contained just the function's name "mean" instead of the function object: <function tensorflow.python.keras.backend.mean(x, axis=None, keepdims=False)>.

There's a Lambda function within the model:

Lambda(K.mean, arguments={"axis": -2}, name="row_projection")

I found that the model config comes from Model._updated_config(), which wraps Model.get_config() and adds the TF/Keras version. It may contain various non-serializable objects.
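To see which entries of such a config are the problem, one option is to walk the nested structure and report every value that plain json.dumps rejects. The following is only a minimal sketch: find_unserializable is a hypothetical helper (not part of ClearML or Keras), mean is a stand-in for tensorflow.python.keras.backend.mean, and layer_config is a shortened imitation of the config shown above.

```python
import json


def find_unserializable(obj, path="config"):
    """Yield (path, value) for every leaf value that json.dumps rejects.

    Only values are checked; unusual dict keys are not inspected.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_unserializable(value, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from find_unserializable(value, f"{path}[{index}]")
    else:
        try:
            json.dumps(obj)
        except TypeError:
            yield path, obj


def mean(x, axis=None, keepdims=False):
    """Hypothetical stand-in for tensorflow.python.keras.backend.mean."""
    raise NotImplementedError


# Shortened imitation of the Lambda layer config from the issue.
layer_config = {
    "class_name": "Lambda",
    "config": {
        "name": "row_projection",
        "function": ("<bytecode>", None, (mean, mean)),
        "function_type": "lambda",
        "arguments": {"axis": -2},
    },
}

problems = list(find_unserializable(layer_config))
for path, value in problems:
    print(path, "->", type(value).__name__)
```

With the imitation config, the two reported paths point at the function objects inside the 'function' tuple, which matches what the warning complains about.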

Besides that, the Keras Model has a method Model.to_json(), which uses an explicit serializer for those objects, but ClearML does not use it.

# tensorflow/python/keras/engine/training.py
class Model:  # excerpt; base classes omitted
  def to_json(self, **kwargs):
    model_config = self._updated_config()
    return json.dumps(
        model_config, default=serialization.get_json_type, **kwargs)
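The default= hook is what makes the difference here: json.dumps calls it for every object it cannot encode itself. The sketch below illustrates the mechanism with the standard library only. The fallback function is a simplified, hypothetical stand-in for get_json_type (it only handles callables, by returning their name; the real function covers more types), and mean is a stand-in for K.mean.

```python
import json


def mean(x, axis=None, keepdims=False):
    """Hypothetical stand-in for tensorflow.python.keras.backend.mean."""
    raise NotImplementedError


def fallback(obj):
    """Simplified stand-in for get_json_type: callables become their name."""
    if callable(obj):
        return obj.__name__
    raise TypeError(f"Not JSON serializable: {obj!r}")


config = {
    "function": ("<bytecode>", None, (mean, mean)),
    "arguments": {"axis": -2},
}

# Without a default hook, the function objects make json.dumps fail.
try:
    json.dumps(config)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

# With the hook, a dumps/loads round trip yields a plain, JSON-safe dict.
safe_config = json.loads(json.dumps(config, default=fallback))
print(plain_dump_failed)
print(safe_config)
```

Note that the round trip also turns tuples into lists, so the sanitized config is suitable for display or logging but is not identical to the original object.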

The relevant code in ClearML where this sanitization should be done seems to be https://github.com/allegroai/clearml/blob/774957797e9f6c842b59ad4c6fb1cb91f9c55a06/clearml/binding/frameworks/tensorflow_bind.py#L1745.

The sanitization seems to need fixing in two places:

# called when training starts
# class PatchKerasModelIO
import json
from tensorflow.python.util.serialization import get_json_type

    @staticmethod
    def _updated_config(original_fn, self):
        config = original_fn(self)
        # check if we have main task
        if PatchKerasModelIO.__main_task is None:
            return config

        try:
            safe_config = json.loads(json.dumps(config, default=get_json_type))
            # pass safe_config instead of config around, return config

# called when training finishes
# class PatchKerasModelIO
import json
from tensorflow.python.util.serialization import get_json_type

try:
    unsafe_config = self._updated_config()
    config = json.loads(json.dumps(unsafe_config, default=get_json_type)) # <--- added code
except Exception:
    # we failed to convert the network to json, for some reason (most likely internal keras error)
    config = {}

This way it didn't crash, but the model config still didn't appear in the ClearML UI.

bzamecnik, Mar 22 '22 09:03