TensorFlowASR
Testing fails for contextnet model
@usimarit Hello, I trained the ContextNet model for 1 epoch and saved the model. Then I ran the test.py script, and it threw the following error:
Use RNNT loss in TensorFlow
Use characters ...
Traceback (most recent call last):
File "test.py", line 95, in <module>
contextnet.load_weights(args.saved)
File "/home/abrol/miniconda3/envs/tf2/lib/python3.8/site-packages/TensorFlowASR-1.0.0-py3.8.egg/tensorflow_asr/models/base_model.py", line 65, in load_weights
File "/home/abrol/miniconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2211, in load_weights
hdf5_format.load_weights_from_hdf5_group(f, self.layers)
File "/home/abrol/miniconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 683, in load_weights_from_hdf5_group
raise ValueError('You are trying to load a weight file '
ValueError: You are trying to load a weight file containing 4 layers into a model with 3 layers.
On further debugging, I opened the model weights file using the following code:
import h5py

model = 'model_trained_01_v100_error_34layer.h5'
# Print the top-level HDF5 attributes, which include the saved layer names
with h5py.File(model, 'r') as f:
    for item in f.attrs.keys():
        print(item, f.attrs[item])
The output is:
backend b'tensorflow'
keras_version b'2.4.0'
layer_names [b'contextnet_encoder' b'contextnet_prediction' b'contextnet_joint'
b'loss']
This extra loss key has been added to the layer names, which is causing the problem. Could you please help me remove it?
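A quick way to confirm which saved entries have no counterpart in the model is to diff the two name lists. The sketch below is only an illustration: the helper extra_saved_layers is hypothetical (not part of TensorFlowASR or Keras), the saved names are copied from the attrs output above, and the model names are assumed to be the three ContextNet sublayers. In a real session the second list would more likely come from something like [layer.name for layer in contextnet.layers].

```python
def extra_saved_layers(saved_names, model_names):
    """Return saved layer names that have no counterpart in the model.

    Hypothetical helper for illustration; h5py returns the names as bytes,
    so they are decoded before comparison.
    """
    decoded = [n.decode("utf-8") if isinstance(n, bytes) else n for n in saved_names]
    present = set(model_names)
    return [n for n in decoded if n not in present]


# Names as printed from f.attrs['layer_names'] in the report above.
saved_names = [b"contextnet_encoder", b"contextnet_prediction",
               b"contextnet_joint", b"loss"]
# Assumption: the model being loaded into exposes exactly these three layers.
model_names = ["contextnet_encoder", "contextnet_prediction", "contextnet_joint"]

extras = extra_saved_layers(saved_names, model_names)
print(extras)
```

With these inputs the only unmatched entry is loss, which matches the "4 layers into a model with 3 layers" message in the traceback.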
@vaibhav016 you can load the weights by name.
@usimarit Okay, thank you so much. I made this edit. Moreover, I ran the testing script without any changes on Google Colab, and it ran fine there without any error. Can you point out any possible reason for this extra key (loss)? I also ran it on a MacBook Pro, where it ran fine as well. But on my GPU (Tesla V100) the error occurred. I also looked through the logs and found this:
WARNING:tensorflow:From /home/abrol/miniconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
After this message, it takes a lot of time, and training starts. But this message doesn't come on google colab.
I had the same issue on a Tesla P100 (Ubuntu 18.04). Thank you, @usimarit. It works for me.
contextnet.load_weights(args.saved, by_name=True)
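For readers wondering why by_name=True gets past the error, here is a toy model of the two matching strategies in plain Python. It is not Keras code: load_topological and load_by_name are hypothetical stand-ins, the weight values are placeholders, and the behaviour is simplified to the one point that matters here (positional loading requires equal layer counts, while name-based loading ignores saved entries that match no model layer).

```python
def load_topological(file_layers, model_layers):
    """Pair saved and model layers by position; the counts must match."""
    if len(file_layers) != len(model_layers):
        raise ValueError(
            f"You are trying to load a weight file containing {len(file_layers)} "
            f"layers into a model with {len(model_layers)} layers."
        )
    return {m: w for m, (_, w) in zip(model_layers, file_layers)}


def load_by_name(file_layers, model_layers):
    """Pair layers by name; saved entries with no matching layer are skipped."""
    wanted = set(model_layers)
    return {name: w for name, w in file_layers if name in wanted}


# Layer names from the h5 attrs in the report; weights are placeholder strings.
saved = [
    ("contextnet_encoder", "w_enc"),
    ("contextnet_prediction", "w_pred"),
    ("contextnet_joint", "w_joint"),
    ("loss", "w_loss"),
]
model = ["contextnet_encoder", "contextnet_prediction", "contextnet_joint"]

try:
    load_topological(saved, model)
    topological_error = None
except ValueError as err:
    topological_error = str(err)
    print(topological_error)

loaded = load_by_name(saved, model)
print(sorted(loaded))
```

In this toy version the extra loss entry is simply dropped, so the three real sublayers still receive their weights. Note that name-based loading depends on the layer names being identical between the saved file and the model being loaded.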
@vaibhav016 I don't have a GPU to test on. In any case, I don't quite follow your point. Are you able to load the weights by name? And if the loading is OK, are you able to run the test loop?
After this message, it takes a lot of time, and training starts. But this message doesn't come on google colab.
If I understand this correctly, you mean "and testing starts". Do you still see it in TF 2.6?
I’ll close the issue here due to inactivity. Feel free to reopen if you have any more questions.