rnnt-speech-recognition

Inference is giving ValueError: When input_signature is provided, all inputs to the Python function must be convertible to tensors:

Open tumusudheer opened this issue 4 years ago • 5 comments

Hi,

I started training the model on the entire Common Voice dataset linked from the GitHub page, using TensorFlow 2.2.0 with Python 3.6 on a single GTX 1080 Ti GPU. The training command was:

python run_rnnt.py --mode train --data_dir data_trail/preprocessed --batch_size 8 --eval_size 100

I got an OOM error after about 18k steps (still in epoch 0). At that point the loss was about 116.7, and the accuracy graph in TensorBoard showed about 0.42.

Since a checkpoint is saved every 1000 steps, I tried to run evaluation:

python transcribe_file.py --checkpoint model/checkpoint_15000_109.9516.hdf5 --i data_trail/clips/common_voice_en_19945797.wav

But I'm getting the following error:

2020-06-12 11:19:38.910255: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-12 11:19:38.929092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-06-12 11:19:38.929267: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-12 11:19:38.930665: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-12 11:19:38.931896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-12 11:19:38.932090: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-12 11:19:38.933532: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-12 11:19:38.934265: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-12 11:19:38.937197: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-12 11:19:38.938298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-12 11:19:38.938559: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
2020-06-12 11:19:38.943923: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3398040000 Hz
2020-06-12 11:19:38.944538: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4f70350 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-12 11:19:38.944555: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-06-12 11:19:39.013200: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2bfda90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-12 11:19:39.013246: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-06-12 11:19:39.014703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-06-12 11:19:39.014784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-12 11:19:39.014824: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-12 11:19:39.014860: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-12 11:19:39.014896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-12 11:19:39.014931: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-12 11:19:39.014962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-12 11:19:39.014991: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-12 11:19:39.017390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-12 11:19:39.017452: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-06-12 11:19:39.020326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-12 11:19:39.020350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-06-12 11:19:39.020361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-06-12 11:19:39.022881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9907 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-06-12 11:19:41.880417: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-12 11:19:41.984154: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
Traceback (most recent call last):
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2293, in _convert_inputs_to_signature
    value, dtype_hint=spec.dtype)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1341, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 321, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 262, in constant
    allow_broadcast=True)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 270, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "transcribe_file.py", line 59, in <module>
    main(args)
  File "transcribe_file.py", line 38, in main
    decoded = decoder_fn(log_melspec)[0]
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 648, in _call
    *args, **kwds)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2238, in canonicalize_function_inputs
    self._flat_input_signature)
  File "/home/tumu/Self/Research/Work/tensorflow_work/tensorflow_2.2_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2299, in _convert_inputs_to_signature
    format_error_message(inputs, input_signature))
ValueError: When input_signature is provided, all inputs to the Python function must be convertible to tensors:
  inputs: (
    tf.Tensor(
[[[ -9.891962   -10.041118   -10.170887   ...  -2.5574753   -3.1098373
    -2.8594036 ]
  [ -4.2638397   -3.8721824   -3.818324   ...  -1.2381899   -1.4718239
    -1.2757974 ]
  [ -3.8065548   -3.9217172   -3.9833403  ...  -2.5127609   -2.4093955
    -1.8164482 ]
  ...
  [  0.26996142   0.24929267   0.10105902 ...  -1.764302    -1.2930858
    -1.6539826 ]
  [ -1.3995155   -1.8580544   -2.5036726  ...  -1.9249303   -2.1395605
    -1.7865329 ]
  [ -2.521644    -2.1898646   -2.1456     ...  -2.134868    -2.5040653
    -2.1412349 ]]], shape=(1, 166, 240), dtype=float32),
    None)
  input_signature: (
    TensorSpec(shape=(None, None, 240), dtype=tf.float32, name=None),
    TensorSpec(shape=(), dtype=tf.int32, name=None))

Is this because hparams here is a JSON object rather than a tensor?
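
For reference, the error message above shows that the second input is None, while the signature expects a scalar int32 tensor. A minimal sketch of that mismatch follows. It assumes a hypothetical stand-in function `decode` with a hypothetical second parameter `max_length`; neither name is taken from the repository, and the body is a dummy. Only the input signature is copied from the error message.

```python
import tensorflow as tf

# Hypothetical stand-in for the decoder function; only the input signature
# is copied from the error message, the names and body are assumptions.
@tf.function(input_signature=[
    tf.TensorSpec(shape=(None, None, 240), dtype=tf.float32),
    tf.TensorSpec(shape=(), dtype=tf.int32),
])
def decode(features, max_length):
    # Dummy body: the number of frames, capped at max_length.
    return tf.minimum(tf.shape(features)[1], max_length)

features = tf.zeros((1, 166, 240), dtype=tf.float32)

# Passing None for the second argument is rejected before the body runs,
# because None cannot be converted to a scalar int32 tensor.
try:
    decode(features, None)
    failed = False
except (ValueError, TypeError) as err:
    failed = True
    print(type(err).__name__)

# Passing a scalar int32 tensor satisfies the signature.
frames = decode(features, tf.constant(100, dtype=tf.int32))
```

If this is what happens in transcribe_file.py, supplying a value for the second argument (or giving it a tensor-convertible default) would be one way to get past the signature check; whether that matches the author's intent is not confirmed in this thread.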

tumusudheer, Jun 12 '20 18:06

Have you managed to fix the issue?

omerasif-itu, Jun 23 '20 12:06

Hi,

No, I haven't fixed this issue. I'm not sure how to fix it.

tumusudheer, Jun 23 '20 17:06

@tumusudheer What did the word-error-rate look like during your training?

Mine does not look very promising:

image

stefan-falk, Jun 24 '20 13:06

> Hi,
>
> No didn't fix this issue. Not sure how to fix this

I finally fixed the issue by following the second solution in the related issue in the TensorFlow repo.

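iterator = iter(train_dataset)

# Take the signature from the dataset itself so it always matches the real inputs.
@tf.function(input_signature=[iterator.element_spec])
def train_step(dataset_inputs):
    def step_fn(inputs):
        # ... (elided in the original)
        ...
    # ... (run step_fn on dataset_inputs and return loss, metrics_results; elided)

# Pass the batch from the loop directly instead of pulling from a second iterator.
for batch, inputs in enumerate(train_dataset):
    loss, metrics_results = train_step(inputs)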
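
A self-contained sketch of the same idea, for anyone who wants to try it outside the repository. The toy dataset, its shapes, and the placeholder loss are assumptions made only so the example runs; they are not the real training pipeline.

```python
import tensorflow as tf

# Toy dataset standing in for the real training data (shapes are assumptions).
features = tf.random.normal((4, 10, 240))
labels = tf.zeros((4, 5), dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# element_spec describes one batch, so it can be used as the signature directly.
@tf.function(input_signature=[dataset.element_spec])
def train_step(batch):
    feats, labs = batch
    # Placeholder "loss" so the sketch runs; a real step would compute the
    # model's loss and apply gradients here.
    return tf.reduce_mean(feats) + tf.cast(tf.reduce_sum(labs), tf.float32)

# Each batch matches the signature because the signature came from the dataset.
losses = [float(train_step(batch)) for batch in dataset]
```

Because the signature is derived from the dataset, it cannot drift out of sync with what the function actually receives, which is the situation the original error describes.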

VictorChen2012, Sep 29 '20 01:09

Hi, I ran into the same problem and tried to fix it with the second solution in https://github.com/tensorflow/tensorflow/issues/29911#issuecomment-505688141, but it didn't work. Here is the error:

Traceback (most recent call last):
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2293, in _convert_inputs_to_signature
    value, dtype_hint=spec.dtype)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1341, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 321, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 262, in constant
    allow_broadcast=True)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 270, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Attempt to convert a value (PerReplica:{
  0: <tf.Tensor: shape=(1, 310, 240), dtype=float32, numpy=
array([[[-8.555949 , -8.693979 , -8.79496 , ..., -1.2911978,...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_rnnt.py", line 598, in <module>
    app.run(main)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_rnnt.py", line 557, in main
    eval_metrics=[accuracy_fn, wer_fn])
  File "run_rnnt.py", line 357, in run_training
    checkpoint_model()
  File "run_rnnt.py", line 322, in checkpoint_model
    metrics=eval_metrics)
  File "run_rnnt.py", line 444, in run_evaluate
    loss, metrics_results = eval_step(inputs)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 648, in _call
    *args, **kwds)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2238, in canonicalize_function_inputs
    self._flat_input_signature)
  File "/acoustic_data1/renxiaoming/install_dir/miniconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2299, in _convert_inputs_to_signature
    format_error_message(inputs, input_signature))
ValueError: When input_signature is provided, all inputs to the Python function must be convertible to tensors:
  inputs: (
    (PerReplica:{
  0: <tf.Tensor: shape=(1, 310, 240), dtype=float32, numpy=
array([[[-8.555949 , -8.693979 , -8.79496 , ..., -1.2911978,...

li563042811, Nov 26 '20 06:11