TensorFlowASR Testing fails for contextnet model

@usimarit Hello, I trained the ContextNet model for 1 epoch and saved the model. Then I ran the test.py script, and the following error was thrown:

    Use RNNT loss in TensorFlow
    Use characters ...
    Traceback (most recent call last):
      File "test.py", line 95, in <module>
        contextnet.load_weights(args.saved)
      File "/home/abrol/miniconda3/envs/tf2/lib/python3.8/site-packages/TensorFlowASR-1.0.0-py3.8.egg/tensorflow_asr/models/base_model.py", line 65, in load_weights
      File "/home/abrol/miniconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2211, in load_weights
        hdf5_format.load_weights_from_hdf5_group(f, self.layers)
      File "/home/abrol/miniconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 683, in load_weights_from_hdf5_group
        raise ValueError('You are trying to load a weight file '
    
    ValueError: You are trying to load a weight file containing 4 layers into a model with 3 layers.

On further debugging, I opened the model weights file using the following code:

    import h5py

    # Open the saved weights file read-only and print its top-level attributes.
    model = 'model_trained_01_v100_error_34layer.h5'
    f = h5py.File(model, 'r')
    for item in f.attrs.keys():
        print(item, f.attrs[item])

The output is:

    backend b'tensorflow'
    keras_version b'2.4.0'
    layer_names [b'contextnet_encoder' b'contextnet_prediction' b'contextnet_joint'
     b'loss']

This extra "loss" key is what causes the problem. Could you please tell me how to remove it?
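To pin down which entries differ, the layer_names attribute printed above can be compared against the names the model expects. This is only a minimal sketch: `extra_layer_names` is a hypothetical helper (not part of TensorFlowASR), and the model's layer names are assumed to be the three ContextNet names from the output.

```python
def extra_layer_names(file_layer_names, model_layer_names):
    """Return names stored in the weight file that the model does not have."""
    # The attribute entries shown above are bytes, so decode them first.
    decoded = [
        n.decode("utf8") if isinstance(n, bytes) else n for n in file_layer_names
    ]
    expected = set(model_layer_names)
    return [n for n in decoded if n not in expected]


# Layer names as printed from the saved file above.
file_names = [
    b"contextnet_encoder",
    b"contextnet_prediction",
    b"contextnet_joint",
    b"loss",
]
# Assumed names of the three layers in the freshly built model.
model_names = ["contextnet_encoder", "contextnet_prediction", "contextnet_joint"]

extras = extra_layer_names(file_names, model_names)
print(extras)
```

In practice the first argument would come from `f.attrs['layer_names']` and the second from the names of the model's layers.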

Apr 22 '21 15:04 vaibhav016

@vaibhav016 you can load the weights by name.
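Loading by name sidesteps the layer-count check because each saved entry is matched to a model layer with the same name, and entries without a match are skipped. Below is a toy pure-Python sketch of that difference, not the Keras code itself: `load_topological` and `load_by_name` are hypothetical stand-ins, and the weight values are placeholders.

```python
def load_topological(saved, model_layers):
    """Mimic order-based loading: the layer counts must match exactly."""
    if len(saved) != len(model_layers):
        raise ValueError(
            f"You are trying to load a weight file containing {len(saved)} layers "
            f"into a model with {len(model_layers)} layers."
        )
    return dict(zip(model_layers, saved.values()))


def load_by_name(saved, model_layers):
    """Mimic name-based loading: entries without a matching layer are skipped."""
    return {name: saved[name] for name in model_layers if name in saved}


saved = {
    "contextnet_encoder": "enc-weights",
    "contextnet_prediction": "pred-weights",
    "contextnet_joint": "joint-weights",
    "loss": "loss-state",  # the extra entry reported in this issue
}
model_layers = ["contextnet_encoder", "contextnet_prediction", "contextnet_joint"]

try:
    load_topological(saved, model_layers)
    topological_error = None
except ValueError as err:
    topological_error = str(err)

loaded = load_by_name(saved, model_layers)
print(topological_error)
print(sorted(loaded))
```

With Keras HDF5 weight files, this corresponds to passing `by_name=True` to `load_weights`.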

Apr 22 '21 17:04 nglehuy

@usimarit Okay, thank you so much. I made this edit. Moreover, I ran the testing script without any changes on Google Colab, and it ran fine there without any error. Can you point out any possible reason for this extra key (loss)? I also ran it on a MacBook Pro, and it ran fine there too. But on my GPU (Tesla V100) the error occurred. I also looked through the logs and found this:

    WARNING:tensorflow:From /home/abrol/miniconda3/envs/tf2/lib/python3.8/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.data.Iterator.get_next_as_optional()` instead.

After this message, it takes a lot of time, and training starts. But this message doesn't appear on Google Colab.

Apr 22 '21 19:04 vaibhav016

I had the same issue on a Tesla P100 (Ubuntu 18.04). Thank you @usimarit. It works for me:

    contextnet.load_weights(args.saved, by_name=True)

Apr 23 '21 12:04 changji95

@vaibhav016 I don't have a GPU to test on. In any case, I don't quite follow what you mean. Are you able to load the weights by name? And if the loading is OK, are you able to run the test loop?

> After this message, it takes a lot of time, and training starts. But this message doesn't appear on Google Colab.

If I get this right, you mean "and testing starts". Do you still see it in TF2.6?

Nov 07 '21 12:11 nglehuy

I’ll close the issue here due to inactivity. Feel free to reopen if you have any more questions.

Sep 02 '22 05:09 nglehuy
