Model Loading Fails

Open sunweilunformatech opened this issue 3 years ago • 10 comments

Loading a model that contains a custom layer fails when using hdf5_format.load_model_from_hdf5. The same file loads fine with keras.models.load_model, and the hdf5_format call works on TF 2.5.x, but it fails from TF 2.6 onward. See the log below for details.

Traceback (most recent call last):
  File "model_save.py", line 36, in <module>
    saved_model = hdf5_format.load_model_from_hdf5(
  File "/opt/conda/envs/iris/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 180, in load_model_from_hdf5
    model = model_config_lib.model_from_config(model_config,
  File "/opt/conda/envs/iris/lib/python3.8/site-packages/tensorflow/python/keras/saving/model_config.py", line 52, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/opt/conda/envs/iris/lib/python3.8/site-packages/tensorflow/python/keras/layers/serialization.py", line 163, in deserialize
    return generic_utils.deserialize_keras_object(
  File "/opt/conda/envs/iris/lib/python3.8/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 674, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/opt/conda/envs/iris/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 667, in from_config
    input_tensors, output_tensors, created_layers = reconstruct_from_config(
  File "/opt/conda/envs/iris/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 1310, in reconstruct_from_config
    layer_output_tensors = layer._inbound_nodes[node_index].output_tensors
IndexError: list index out of range

Example standalone code:

from tensorflow import keras
import h5py
from tensorflow.python.keras.saving import hdf5_format


class CustomLayer(keras.layers.Layer):
    """combine multiple activations weighted by learnable variables"""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def get_config(self):
        return {}

    def build(self, input_shape):
        return

    def call(self, inputs):
        return inputs
    
path = 'test.h5'
    
x = keras.Input((5,))  # shape as a 1-tuple; (5) is just the int 5
y = CustomLayer()(x)
model = keras.Model(x, y)
model.build(x)
model.save(path)

# this works ok
custom_objects = {'CustomLayer': CustomLayer}
model = keras.models.load_model(path, custom_objects=custom_objects)

# this fails
with h5py.File('test.h5', mode='r') as f:
    saved_model = hdf5_format.load_model_from_hdf5(
        f, custom_objects=custom_objects)

sunweilunformatech avatar Jun 15 '22 11:06 sunweilunformatech

@gowthamkpr, I was able to reproduce the issue on tensorflow v2.8, v2.9 and nightly. Kindly find the gist of it here.

tilakrayal avatar Jun 16 '22 09:06 tilakrayal

@gowthamkpr Any follow up with this?

sunweilunformatech avatar Jun 30 '22 11:06 sunweilunformatech

@sunweilunformatech Thanks for the issue! However, the function you are using is a private API, which is not supposed to be called directly by users, so we are unable to fix this.

haifeng-jin avatar Jul 07 '22 20:07 haifeng-jin

@haifeng-jin The main reason I'm calling this is because I need to store some additional metadata along with the model in the h5 file. I find this API pretty neat until it broke. What should we do to store metadata in this case? A commonly used metadata is # of epochs for example. It's possible to store them in separate files, but it makes model storage less compact.

sunweilunformatech avatar Jul 07 '22 20:07 sunweilunformatech
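
One possible way to keep metadata in the same h5 file without calling the private hdf5_format module is to write it with h5py after the model has been saved. The sketch below is only an illustration under assumptions: the group name user_metadata and the helpers write_metadata / read_metadata are hypothetical and not part of Keras, and it assumes (not verified against every TF version in this thread) that Keras ignores an extra top-level group when it loads the file. The demo uses an empty stand-in file instead of a real saved model so that it runs with h5py alone.

```python
import json
import os
import tempfile

import h5py

# Hypothetical group name chosen for this sketch; it is not a Keras convention.
METADATA_GROUP = "user_metadata"


def write_metadata(path, metadata):
    """Store a JSON-serializable dict as an attribute in an HDF5 file."""
    # Mode "a" opens the file read/write and creates it if it does not exist.
    with h5py.File(path, mode="a") as f:
        group = f.require_group(METADATA_GROUP)
        group.attrs["json"] = json.dumps(metadata)


def read_metadata(path):
    """Return the stored dict, or an empty dict if none was written."""
    with h5py.File(path, mode="r") as f:
        if METADATA_GROUP not in f:
            return {}
        return json.loads(f[METADATA_GROUP].attrs["json"])


# Demo on a stand-in file. With a real model, the assumed order would be:
#   model.save(path)  ->  write_metadata(path, {...})
demo_path = os.path.join(tempfile.mkdtemp(), "test.h5")
write_metadata(demo_path, {"epochs": 12})
print(read_metadata(demo_path))
```

The model itself would then still be loaded through the public keras.models.load_model(path, custom_objects=custom_objects), which is the call reported as working in the original post.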

@k-w-w Any thoughts on this?

haifeng-jin avatar Jul 07 '22 21:07 haifeng-jin

Tagging @rchao

k-w-w avatar Jul 07 '22 22:07 k-w-w

Hello @sunweilunformatech, are you able to build locally? One solution to identify where the regression happened is to check out TensorFlow at different commits, and see at which commit the workflow started breaking down.

rchao avatar Jul 11 '22 20:07 rchao

It's between 2.5.3 and 2.6.0. I haven't got time to narrow it down to a specific commit. But I guess we could do a binary search.

sunweilunformatech avatar Jul 13 '22 14:07 sunweilunformatech

@rchao So 2.5.3 works but not 2.6.0. Must be some commit in between.

sunweilunformatech avatar Jul 13 '22 14:07 sunweilunformatech

Thanks for the info. Yes, if you could perform a binary search on the commits it would greatly help us.

rchao avatar Jul 13 '22 23:07 rchao
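
For reference, the binary search discussed above can be sketched as follows. This is a generic illustration, not a script from the thread: first_bad and the is_bad callback are hypothetical names, and in practice is_bad would build TensorFlow at the given commit and run the reproduction script, which is not shown here. It assumes the first candidate is known good (2.5.3) and the last is known bad (2.6.0), and that once a commit is bad, all later ones are too. The git bisect command automates the same idea.

```python
def first_bad(candidates, is_bad):
    """Return the first candidate for which is_bad(candidate) is True.

    Assumes candidates are ordered oldest to newest, candidates[0] is
    known good, candidates[-1] is known bad, and the breakage persists
    once introduced.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return candidates[hi]


# Demo with placeholder commit labels; the regression is simulated at "c7".
commits = [f"c{i}" for i in range(10)]
simulated_bad = set(commits[7:])
print(first_bad(commits, lambda c: c in simulated_bad))
```

Each step halves the remaining range, so even a few thousand commits between two releases need only around a dozen builds.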