
tf.keras.callbacks.ModelCheckpoint TypeError: Unable to serialize 1.0000000656873453e-05 to JSON

Open ravinderkhatri opened this issue 4 years ago • 13 comments

I am building a tf.keras model with custom layers on top of a pre-trained MobileNet backbone. Training runs fine, but saving the best model raises an error. Below is a snippet of the code I used:

pretrained_model = tf.keras.applications.MobileNetV2(
                                                    weights='imagenet',
                                                    include_top=False,
                                                    input_shape=[*IMAGE_SIZE, IMG_CHANNELS])
pretrained_model.trainable = True #fine tuning
model = tf.keras.Sequential([
                            tf.keras.layers.Lambda(  # Convert images from int [0, 255] to the format expected by this model
                            lambda data:tf.keras.applications.mobilenet.preprocess_input(
                                tf.cast(data, tf.float32)), input_shape=[*IMAGE_SIZE, 3]),
                            pretrained_model,
                            tf.keras.layers.GlobalAveragePooling2D()])

model.add(tf.keras.layers.Dense(64, name='object_dense',kernel_regularizer=tf.keras.regularizers.l2(l2=0.001)))
model.add(tf.keras.layers.BatchNormalization(scale=False, center = False))
model.add(tf.keras.layers.Activation('relu', name='relu_dense_64'))
model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_64'))
model.add(tf.keras.layers.Dense(32, name='object_dense_2',kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
model.add(tf.keras.layers.BatchNormalization(scale=False, center = False))
model.add(tf.keras.layers.Activation('relu', name='relu_dense_32'))
model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_32'))
model.add(tf.keras.layers.Dense(16, name='object_dense_16', kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
model.add(tf.keras.layers.Dense(len(CLASS_NAMES), activation='softmax', name='object_prob'))
m1 = tf.keras.metrics.CategoricalAccuracy()
m2 = tf.keras.metrics.Recall()
m3 = tf.keras.metrics.Precision()



optimizers = [
    tfa.optimizers.AdamW(learning_rate=lr * .001 , weight_decay=wd),
    tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)
           ]

optimizers_and_layers = [(optimizers[0], model.layers[0]), (optimizers[1], model.layers[1:])]

optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)

model.compile(
    optimizer= optimizer,
    loss = 'categorical_crossentropy',
    metrics=[m1, m2, m3],
    )

checkpoint_path = os.path.join(os.getcwd(), 'keras_model')
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                   monitor='categorical_accuracy',
                                                   save_best_only=True,
                                                   save_weights_only=False)

history = model.fit(train_data, validation_data=test_data, epochs=N_EPOCHS, callbacks=[checkpoint_cb])

tf.keras.callbacks.ModelCheckpoint raises the following error:

TypeError: Unable to serialize 1.0000000656873453e-05 to JSON. Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>.

I am using TensorFlow 2.7.
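For reference, the failure is that JSON can only serialize native Python types, while the learning rate here ends up as an EagerTensor. A minimal stand-alone sketch of the same failure, using a hypothetical stand-in class rather than a real tensor:

```python
import json

class FakeTensor:
    """Stand-in for an EagerTensor: a numeric wrapper json doesn't recognize."""
    def __init__(self, value):
        self.value = value
    def __float__(self):
        return float(self.value)

lr = FakeTensor(1.0000000656873453e-05)

try:
    json.dumps({"learning_rate": lr})
    raised = False
except TypeError:  # json.dumps rejects unrecognized types, as in the report
    raised = True

# Converting to a plain Python float first serializes fine:
serialized = json.dumps({"learning_rate": float(lr)})
```

The same conversion idea underlies the usual fixes: the optimizer config must contain plain Python values (or Keras-serializable schedules), not evaluated tensors.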

ravinderkhatri avatar Nov 08 '21 10:11 ravinderkhatri

@ravinderkhatri, In order to reproduce the issue reported here, could you please provide the complete code? Thanks!

chunduriv avatar Nov 08 '21 13:11 chunduriv

Below is the code that I used. Please let me know in case you need further information.

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
    strategy = tf.distribute.TPUStrategy(tpu)
except ValueError:
    strategy = tf.distribute.MirroredStrategy()


print('REPLICAS', strategy.num_replicas_in_sync)


img_dataset_path = 'images'

IMG_HEIGHT = 224
IMG_WIDTH = 224
IMAGE_SIZE = [IMG_HEIGHT, IMG_WIDTH]
IMG_CHANNELS = 3
BATCH_SIZE = 16

label_map_pbtxt_path = 'label_map.pbtxt'
# get  total image class
CLASS_NAMES = get_label_from_label_map_pbtxt(label_map_pbtxt_path)
n_labels = len(CLASS_NAMES)
# read tf record dataset
train_dataset_path = 'train.record'
test_dataset_path = 'test.record'
# changes in tf record dataset
train_data = create_dataset(train_dataset_path, batch_size=16, IMG_WIDTH=224, IMG_HEIGHT=224, n_labels=n_labels)
test_data = create_dataset(test_dataset_path, batch_size=16, IMG_WIDTH=224, IMG_HEIGHT=224, n_labels=n_labels)

N_EPOCHS = 200

step = tf.Variable(0, trainable=False)

initial_learning_rate = 0.001
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=14,
    decay_rate=0.8,
    staircase=True)
lr = 1e-0 * schedule(step)
wd = lambda: 1e-2 * schedule(step)

with strategy.scope():
    pretrained_model = tf.keras.applications.MobileNetV2(
                                                    weights='imagenet',
                                                    include_top=False,
                                                    input_shape=[*IMAGE_SIZE, IMG_CHANNELS])
    pretrained_model.trainable = True #fine tuning
    model = tf.keras.Sequential([
                            tf.keras.layers.Lambda(  # Convert images from int [0, 255] to the format expected by this model
                            lambda data:tf.keras.applications.mobilenet.preprocess_input(
                                tf.cast(data, tf.float32)), input_shape=[*IMAGE_SIZE, 3]),
                            pretrained_model,
                            tf.keras.layers.GlobalAveragePooling2D()])
    model.add(tf.keras.layers.Dense(64, name='object_dense',kernel_regularizer=tf.keras.regularizers.l2(l2=0.001)))
    model.add(tf.keras.layers.BatchNormalization(scale=False, center = False))
    model.add(tf.keras.layers.Activation('relu', name='relu_dense_64'))
    model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_64'))
    model.add(tf.keras.layers.Dense(32, name='object_dense_2',kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
    model.add(tf.keras.layers.BatchNormalization(scale=False, center = False))
    model.add(tf.keras.layers.Activation('relu', name='relu_dense_32'))
    model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_32'))
    model.add(tf.keras.layers.Dense(16, name='object_dense_16', kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
    model.add(tf.keras.layers.Dense(len(CLASS_NAMES), activation='softmax', name='object_prob'))
    m1 = tf.keras.metrics.CategoricalAccuracy()
    m2 = tf.keras.metrics.Recall()
    m3 = tf.keras.metrics.Precision()



optimizers = [
    tfa.optimizers.AdamW(learning_rate=lr * .001 , weight_decay=wd),
    tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)
           ]


optimizers_and_layers = [(optimizers[0], model.layers[0]), (optimizers[1], model.layers[1:])]

optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)

model.compile(
    optimizer= optimizer,
    loss = 'categorical_crossentropy',
    metrics=[m1, m2, m3],
    )
checkpoint_path = os.path.join(os.getcwd(), 'keras_model')

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, 
                                                    monitor = 'categorical_accuracy',
                                                    save_best_only=True,
                                                    save_weights_only=False)

history = model.fit(train_data, validation_data=test_data, epochs=N_EPOCHS, callbacks=[checkpoint_cb])

ravinderkhatri avatar Nov 08 '21 13:11 ravinderkhatri

@ravinderkhatri, Can you please share label_map.pbtxt and any other supporting files needed to replicate the above issue? Thanks!

chunduriv avatar Nov 10 '21 04:11 chunduriv

@chunduriv Sharing all the required files to replicate the above issue.

Below is the link to label_map.pbtxt file https://drive.google.com/file/d/1Qqk3jtJLwA-h3G3sgjMMtvr5JWmPJGcC/view?usp=sharing

Link to tfrecord training file https://drive.google.com/file/d/1WSuFYHiYKN5AXlRmtMfNhQ_gmfsnK-BE/view?usp=sharing

Link to tfrecord testing file https://drive.google.com/file/d/1NWOMQ9QnVA5txXwtWVPgZ4oT0Ul7fLyv/view?usp=sharing

Function to get labels from the label_map.pbtxt file

def get_label_from_label_map_pbtxt(label_map_pbtxt_path):
    CLASS_NAMES = []
    with open(label_map_pbtxt_path, "r") as file:
        for line in file:
            if 'name:' in line:
                img_class_name = line.split('name:')[-1].replace("'", "").strip()
                CLASS_NAMES.append(img_class_name)
    return CLASS_NAMES
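To sanity-check the parser, it can be exercised on a small sample label map written to a temporary file (the two class names below are hypothetical, and the function is duplicated from above so the sketch is self-contained):

```python
import os
import tempfile

# Hypothetical two-class label map in the usual pbtxt layout
sample_pbtxt = """\
item {
  id: 1
  name: 'cat'
}
item {
  id: 2
  name: 'dog'
}
"""

def get_label_from_label_map_pbtxt(label_map_pbtxt_path):
    # Same parsing logic as the function above
    CLASS_NAMES = []
    with open(label_map_pbtxt_path, "r") as file:
        for line in file:
            if 'name:' in line:
                img_class_name = line.split('name:')[-1].replace("'", "").strip()
                CLASS_NAMES.append(img_class_name)
    return CLASS_NAMES

with tempfile.NamedTemporaryFile("w", suffix=".pbtxt", delete=False) as tmp:
    tmp.write(sample_pbtxt)

labels = get_label_from_label_map_pbtxt(tmp.name)  # ['cat', 'dog']
os.remove(tmp.name)
```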

Function to read tfrecord file

def read_tfrecord(example):
    read_features = {'image/encoded' : tf.io.FixedLenFeature([], dtype = tf.string),
                    'image/object/class/label': tf.io.FixedLenFeature([], dtype = tf.int64)}

    parsed_s_example = tf.io.parse_single_example(serialized=example,
                                                features=read_features)
    encoded_image = parsed_s_example['image/encoded']
    decoded_image = tf.io.decode_image(encoded_image)
    image_label = parsed_s_example['image/object/class/label']
    return decoded_image, image_label


# Assumed definitions (not shown in the original post): tune parallelism
# automatically and allow out-of-order reads for speed.
AUTO = tf.data.AUTOTUNE
option_no_order = tf.data.Options()
option_no_order.experimental_deterministic = False


def read_tfrecord_parallelize(tfrecord_data_name):
    """
    Read a TFRecordDataset in parallel and shuffle it.
    Each example is parsed with the read_tfrecord function above.
    Args:
        tfrecord_data_name (str): path of the TFRecord file to read
    Returns:
        tf.data.Dataset: a shuffled dataset read from tfrecord_data_name
    """
    dataset = tf.data.TFRecordDataset(tfrecord_data_name, num_parallel_reads=AUTO)
    dataset = dataset.with_options(option_no_order)
    dataset = dataset.map(read_tfrecord, num_parallel_calls=AUTO)
    dataset = dataset.shuffle(300)
    return dataset

Using the functions above, the datasets can be created as

train_data = read_tfrecord_parallelize(train_tfrecord_file_path)
test_data = read_tfrecord_parallelize(test_tfrecord_file_path)

Please let me know in case further information is required.

ravinderkhatri avatar Nov 11 '21 06:11 ravinderkhatri

@ravinderkhatri Linking this workaround for now. Can you please provide a Colab gist so we can look into this issue? Thanks!

gowthamkpr avatar Jun 05 '22 04:06 gowthamkpr

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Jun 12 '22 04:06 google-ml-butler[bot]

@gowthamkpr Thank you for suggesting the workaround. I actually created that SO question myself and have already used the workaround. Below is the link to a Colab notebook if you want to replicate the issue: https://colab.research.google.com/drive/1wQbUFfhtDaB5Xta574UkAXJtthui7Bt9?usp=sharing
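For anyone landing here, the gist of that workaround is to keep the learning rate a callable, mirroring how `wd` was already defined, so the optimizer stores a function instead of an evaluated tensor. A sketch using a plain-Python stand-in for the TF schedule (illustrative only, not the exact TF code):

```python
# Plain-Python stand-in for tf.keras.optimizers.schedules.ExponentialDecay
# with staircase=True: decay by a factor of 0.8 every 14 steps.
def schedule(step):
    return 0.001 * 0.8 ** (step // 14)

step = 0

# Problematic form: evaluated immediately, producing a concrete value
# (an EagerTensor in the real code), which the JSON config can't hold.
lr_eager = 1e-0 * schedule(step)

# Workaround: keep it lazy; the optimizer calls it when it needs the
# current value, and nothing tensor-valued lands in the saved config.
lr = lambda: 1e-0 * schedule(step)

current = lr()  # evaluated on demand
```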

ravinderkhatri avatar Jun 12 '22 08:06 ravinderkhatri

Hello, looking at the Colab, part of the stack trace is missing. To debug this further, can you try adding this call

tf.debugging.disable_traceback_filtering()

at the beginning of the program or unit test so we can find out more?

rchao avatar Jul 14 '22 07:07 rchao

@rchao Done. I have added tf.debugging.disable_traceback_filtering() at the beginning of the program.

ravinderkhatri avatar Oct 01 '22 11:10 ravinderkhatri

I encountered the same issue. Has it been resolved yet? I can also provide my code and dataset if that would help with debugging.

Environment:

  • OS: Ubuntu 20.04
  • package
    tensorboard                  2.10.0
    tensorboard-data-server      0.6.1
    tensorboard-plugin-wit       1.8.1
    tensorflow                   2.10.0
    tensorflow-estimator         2.10.0
    tensorflow-gpu               2.10.0
    tensorflow-io-gcs-filesystem 0.27.0
    

chilin0525 avatar Oct 13 '22 14:10 chilin0525

Thanks both @ravinderkhatri and @chilin0525 - can you provide the full stack trace?

rchao avatar Oct 17 '22 06:10 rchao

Sure, but I haven't had time to reproduce the error recently. To provide the full stack trace, I just need to put the following code at the beginning, right?

tf.debugging.disable_traceback_filtering()

chilin0525 avatar Oct 20 '22 04:10 chilin0525

Correct.

rchao avatar Oct 20 '22 07:10 rchao

Hello, Thank you for reporting an issue.

We're currently migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras. Consequently, this issue may not be relevant to the Keras 3 code base. After the migration is complete, feel free to reopen this issue at keras-team/keras if you believe it remains relevant to Keras 3. If instead this issue is a bug or security issue in legacy tf.keras, you can report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.

To know more about Keras 3, please take a look at https://keras.io/keras_core/announcement/. Thank you!

tilakrayal avatar Sep 22 '23 13:09 tilakrayal
