tf.keras.callbacks.ModelCheckpoint TypeError: Unable to serialize 1.0000000656873453e-05 to JSON
I am building a tf.keras model with custom layers on top of a pre-trained MobileNet backbone. Model training runs fine, but saving the best model from the checkpoint callback raises an error. Below is a snippet of the code I used:
pretrained_model = tf.keras.applications.MobileNetV2(
    weights='imagenet',
    include_top=False,
    input_shape=[*IMAGE_SIZE, IMG_CHANNELS])
pretrained_model.trainable = True  # fine-tuning

model = tf.keras.Sequential([
    # Convert image from int [0, 255] to the format expected by this model
    tf.keras.layers.Lambda(
        lambda data: tf.keras.applications.mobilenet.preprocess_input(
            tf.cast(data, tf.float32)),
        input_shape=[*IMAGE_SIZE, 3]),
    pretrained_model,
    tf.keras.layers.GlobalAveragePooling2D()])
model.add(tf.keras.layers.Dense(64, name='object_dense', kernel_regularizer=tf.keras.regularizers.l2(l2=0.001)))
model.add(tf.keras.layers.BatchNormalization(scale=False, center=False))
model.add(tf.keras.layers.Activation('relu', name='relu_dense_64'))
model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_64'))
model.add(tf.keras.layers.Dense(32, name='object_dense_2', kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
model.add(tf.keras.layers.BatchNormalization(scale=False, center=False))
model.add(tf.keras.layers.Activation('relu', name='relu_dense_32'))
model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_32'))
model.add(tf.keras.layers.Dense(16, name='object_dense_16', kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
model.add(tf.keras.layers.Dense(len(CLASS_NAMES), activation='softmax', name='object_prob'))

m1 = tf.keras.metrics.CategoricalAccuracy()
m2 = tf.keras.metrics.Recall()
m3 = tf.keras.metrics.Precision()

optimizers = [
    tfa.optimizers.AdamW(learning_rate=lr * .001, weight_decay=wd),
    tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)
]
optimizers_and_layers = [(optimizers[0], model.layers[0]), (optimizers[1], model.layers[1:])]
optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)

model.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    metrics=[m1, m2, m3],
)

checkpoint_path = os.getcwd() + os.sep + 'keras_model'
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                   monitor='categorical_accuracy',
                                                   save_best_only=True,
                                                   save_weights_only=False)
history = model.fit(train_data, validation_data=test_data, epochs=N_EPOCHS, callbacks=[checkpoint_cb])
tf.keras.callbacks.ModelCheckpoint is giving me this error:
TypeError: Unable to serialize 1.0000000656873453e-05 to JSON. Unrecognized type <class 'tensorflow.python.framework.ops.EagerTensor'>.
I am using TensorFlow 2.7.
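For context on the failure mode (my illustration, not something stated in the thread): when ModelCheckpoint saves the model, the optimizer config goes through `json.dumps`, which only accepts native Python types, so a tensor-valued hyperparameter trips exactly this `TypeError`, while a value cast to a plain `float` serializes fine. A minimal stand-in sketch, where `FakeTensor` is a hypothetical substitute for an EagerTensor:

```python
import json

class FakeTensor:
    """Hypothetical stand-in for an EagerTensor: a numeric wrapper json can't handle."""
    def __init__(self, value):
        self.value = value
    def __float__(self):
        return self.value

lr = FakeTensor(1.0000000656873453e-05)

try:
    json.dumps({'learning_rate': lr})  # fails: json doesn't know this type
    failed = False
except TypeError:
    failed = True

# Casting to a native float first serializes without trouble
serialized = json.dumps({'learning_rate': float(lr)})
```

This is only an analogy for where the error comes from; the real fix is to keep tensor values out of the optimizer's config in the first place.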
@ravinderkhatri, In order to reproduce the issue reported here, could you please provide the complete code? Thanks!
Below is the code that I used. Please let me know in case you need further information
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
    strategy = tf.distribute.TPUStrategy(tpu)
except ValueError:
    strategy = tf.distribute.MirroredStrategy()
print('REPLICAS', strategy.num_replicas_in_sync)
img_dataset_path = 'images'
IMG_HEIGHT = 224
IMG_WIDTH = 224
IMAGE_SIZE = [IMG_HEIGHT, IMG_WIDTH]
IMG_CHANNELS = 3
BATCH_SIZE = 16
label_map_pbtxt_path = 'label_map.pbtxt'
# get total image class
CLASS_NAMES = get_label_from_label_map_pbtxt(label_map_pbtxt_path)
n_labels = len(CLASS_NAMES)
# read tf record dataset
train_dataset_path = 'train.record'
test_dataset_path = 'test.record'
# changes in tf record dataset
train_data = create_dataset(train_dataset_path, batch_size=16, IMG_WIDTH=224, IMG_HEIGHT=224, n_labels=n_labels)
test_data = create_dataset(test_dataset_path, batch_size=16, IMG_WIDTH=224, IMG_HEIGHT=224, n_labels=n_labels)
N_EPOCHS = 200
step = tf.Variable(0, trainable=False)
initial_learning_rate = 0.001
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=14,
    decay_rate=0.8,
    staircase=True)
lr = 1e-0 * schedule(step)
wd = lambda: 1e-2 * schedule(step)
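A likely culprit (my reading, not confirmed in the thread): `lr = 1e-0 * schedule(step)` evaluates the schedule eagerly, so `lr` is an EagerTensor baked into the optimizer config, whereas `wd` stays a deferred callable. A pure-Python sketch of the difference, using a hypothetical stand-in for the decay schedule rather than TF's:

```python
# Stand-in for ExponentialDecay with staircase=True (hypothetical, for illustration only)
def schedule(step):
    return 0.001 * 0.8 ** (step // 14)

step = 0
lr_eager = 1e-0 * schedule(step)             # evaluated now: a fixed concrete value
lr_deferred = lambda: 1e-0 * schedule(step)  # evaluated each time it is called

step = 14                # advance the step counter
print(lr_eager)          # unchanged - frozen at build time
print(lr_deferred())     # smaller - reflects the new step
```

Under this reading, writing `lr` as a callable (like `wd`) would both keep the schedule live during training and keep the tensor out of the JSON-serialized config.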
with strategy.scope():
    pretrained_model = tf.keras.applications.MobileNetV2(
        weights='imagenet',
        include_top=False,
        input_shape=[*IMAGE_SIZE, IMG_CHANNELS])
    pretrained_model.trainable = True  # fine-tuning

    model = tf.keras.Sequential([
        # Convert image from int [0, 255] to the format expected by this model
        tf.keras.layers.Lambda(
            lambda data: tf.keras.applications.mobilenet.preprocess_input(
                tf.cast(data, tf.float32)),
            input_shape=[*IMAGE_SIZE, 3]),
        pretrained_model,
        tf.keras.layers.GlobalAveragePooling2D()])
    model.add(tf.keras.layers.Dense(64, name='object_dense', kernel_regularizer=tf.keras.regularizers.l2(l2=0.001)))
    model.add(tf.keras.layers.BatchNormalization(scale=False, center=False))
    model.add(tf.keras.layers.Activation('relu', name='relu_dense_64'))
    model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_64'))
    model.add(tf.keras.layers.Dense(32, name='object_dense_2', kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
    model.add(tf.keras.layers.BatchNormalization(scale=False, center=False))
    model.add(tf.keras.layers.Activation('relu', name='relu_dense_32'))
    model.add(tf.keras.layers.Dropout(rate=0.2, name='dropout_dense_32'))
    model.add(tf.keras.layers.Dense(16, name='object_dense_16', kernel_regularizer=tf.keras.regularizers.l2(l2=0.01)))
    model.add(tf.keras.layers.Dense(len(CLASS_NAMES), activation='softmax', name='object_prob'))

    m1 = tf.keras.metrics.CategoricalAccuracy()
    m2 = tf.keras.metrics.Recall()
    m3 = tf.keras.metrics.Precision()

    optimizers = [
        tfa.optimizers.AdamW(learning_rate=lr * .001, weight_decay=wd),
        tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)
    ]
    optimizers_and_layers = [(optimizers[0], model.layers[0]), (optimizers[1], model.layers[1:])]
    optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)

    model.compile(
        optimizer=optimizer,
        loss='categorical_crossentropy',
        metrics=[m1, m2, m3],
    )

checkpoint_path = os.getcwd() + os.sep + 'keras_model'
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                   monitor='categorical_accuracy',
                                                   save_best_only=True,
                                                   save_weights_only=False)
history = model.fit(train_data, validation_data=test_data, epochs=N_EPOCHS, callbacks=[checkpoint_cb])
@ravinderkhatri, Can you please share label_map.pbtxt and other supporting files if any to replicate above issue? Thanks!
@chunduriv Sharing all the required files to replicate the above issue.
Below is the link to label_map.pbtxt file https://drive.google.com/file/d/1Qqk3jtJLwA-h3G3sgjMMtvr5JWmPJGcC/view?usp=sharing
Link to tfrecord training file https://drive.google.com/file/d/1WSuFYHiYKN5AXlRmtMfNhQ_gmfsnK-BE/view?usp=sharing
Link to tfrecord testing file https://drive.google.com/file/d/1NWOMQ9QnVA5txXwtWVPgZ4oT0Ul7fLyv/view?usp=sharing
Function to get labels from label_pbtxt file
def get_label_from_label_map_pbtxt(label_map_pbtxt_path):
    CLASS_NAMES = []
    with open(label_map_pbtxt_path, "r") as file:
        for line in file:
            if 'name:' in line:
                img_class_name = line.split('name:')[-1].replace("'", "").strip()
                CLASS_NAMES.append(img_class_name)
    return CLASS_NAMES
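To sanity-check the parser, it can be run against a tiny inline pbtxt; the sample contents below are hypothetical, not the linked file:

```python
import os
import tempfile

def get_label_from_label_map_pbtxt(label_map_pbtxt_path):
    # Same parsing logic as the function above: collect every name: '...' entry
    CLASS_NAMES = []
    with open(label_map_pbtxt_path, "r") as file:
        for line in file:
            if 'name:' in line:
                img_class_name = line.split('name:')[-1].replace("'", "").strip()
                CLASS_NAMES.append(img_class_name)
    return CLASS_NAMES

# Hypothetical two-class label map for illustration
sample = "item {\n  id: 1\n  name: 'cat'\n}\nitem {\n  id: 2\n  name: 'dog'\n}\n"
with tempfile.NamedTemporaryFile('w', suffix='.pbtxt', delete=False) as f:
    f.write(sample)
    path = f.name

labels = get_label_from_label_map_pbtxt(path)
os.remove(path)
print(labels)  # ['cat', 'dog']
```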
Function to read tfrecord file
def read_tfrecord(example):
    read_features = {'image/encoded': tf.io.FixedLenFeature([], dtype=tf.string),
                     'image/object/class/label': tf.io.FixedLenFeature([], dtype=tf.int64)}
    parsed_s_example = tf.io.parse_single_example(serialized=example,
                                                  features=read_features)
    encoded_image = parsed_s_example['image/encoded']
    decoded_image = tf.io.decode_image(encoded_image)
    image_label = parsed_s_example['image/object/class/label']
    return decoded_image, image_label
def read_tfrecord_parallelize(tfrecord_data_name):
    """
    Perform parallel reading of data from a TFRecordDataset generated using example_proto.
    For more info see: to_gpt_tfrecord_example and get_data_and_write_gpt_tf_record_format.
    It uses the read_tfrecord function to read data.

    Args:
        tfrecord_data_name (str): name of the tfrecord dataset to read

    Returns:
        dataset (tf.data.Dataset): a shuffled dataset read from tfrecord_data_name
    """
    dataset = tf.data.TFRecordDataset(tfrecord_data_name, num_parallel_reads=AUTO)
    dataset = dataset.with_options(option_no_order)
    dataset = dataset.map(read_tfrecord, num_parallel_calls=AUTO)
    dataset = dataset.shuffle(300)
    return dataset
Using the above function, the datasets can be created as
train_data = read_tfrecord_parallelize(train_tfrecord_file_path)
test_data = read_tfrecord_parallelize(test_tfrecord_file_path)
Please let me know in case further information is required.
@ravinderkhatri Linking this workaround for now. Can you please provide a colab gist for us to look into this issue. Thanks!
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
@gowthamkpr Thank you for suggesting the workaround. Actually, I created that SO question and already used that workaround. Below is the link to the Colab notebook if you want to replicate the issue: https://colab.research.google.com/drive/1wQbUFfhtDaB5Xta574UkAXJtthui7Bt9?usp=sharing
Hello, looking at the Colab, part of the stack trace is missing. To debug this further, can you try adding this call
tf.debugging.disable_traceback_filtering()
at the beginning of the program or unit test so we can find out more?
@rchao Done. I have added tf.debugging.disable_traceback_filtering() at the beginning of the program.
I encountered the same issue; has it been resolved yet? I can also provide my code and dataset if that helps with debugging.
Environment:
- OS: Ubuntu 20.04
- Packages:
tensorboard 2.10.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.10.0
tensorflow-estimator 2.10.0
tensorflow-gpu 2.10.0
tensorflow-io-gcs-filesystem 0.27.0
Thanks both @ravinderkhatri and @chilin0525 - can you provide the full stack trace?
Sure, but I don't have time to reproduce the error right now. To provide the full stack trace, I just put the following code at the beginning, right?
tf.debugging.disable_traceback_filtering()
Correct.
Hello, Thank you for reporting an issue.
We're currently in the process of migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras.
Consequently, this issue may not be relevant to the Keras 3 code base. After the migration is successfully completed, feel free to reopen this issue at keras-team/keras if you believe it remains relevant to the Keras 3 code base.
If instead this issue is a bug or security issue in legacy tf.keras, you can instead report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.
To know more about Keras 3, please take a look at https://keras.io/keras_core/announcement/. Thank you!