rnnt-speech-recognition
Segmentation fault when training
Hi, the error log is below:
Starting training.
Performing evaluation.
loss
Tensor("transducer/dense_1/BiasAdd:0", shape=(None, None, None, 3971), dtype=float32, device=/job:localhost/replica:0/task:0/device:GPU:0) Tensor("dist_inputs_4:0", shape=(None, None), dtype=int32) Tensor("Cast:0", shape=(None,), dtype=int32, device=/job:localhost/replica:0/task:0/device:GPU:0) Tensor("dist_inputs_3:0", shape=(None,), dtype=int32)
Fatal Python error: Segmentation fault
Thread 0x00007f6989132700 (most recent call first):
File "/usr/lib64/python3.6/threading.py", line 295 in wait
File "/usr/lib64/python3.6/threading.py", line 551 in wait
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 978 in run
File "/usr/lib64/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib64/python3.6/threading.py", line 884 in _bootstrap
Current thread 0x00007f6989933700 (most recent call first):
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654 in _create_c_op
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1817 in __init__
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3327 in _create_op_internal
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 595 in _create_op_internal
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 744 in _apply_op_helper
File "
Thread 0x00007f6d24153740 (most recent call first):
File "/usr/lib64/python3.6/threading.py", line 295 in wait
File "/usr/lib64/python3.6/threading.py", line 551 in wait
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 165 in _call_for_each_replica
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 770 in _call_for_each_replica
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2290 in call_for_each_replica
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 951 in run
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 346 in _call_unconverted
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 492 in converted_call
File "/tmp/tmp5y46mg16.py", line 66 in tf__eval_step
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 585 in converted_call
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 964 in wrapper
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 441 in wrapped_fn
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 981 in func_graph_from_py_func
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2667 in _create_graph_function
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2777 in _maybe_define_function
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2446 in _get_concrete_function_internal_garbage_collected
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 506 in _initialize
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 627 in _call
File "/home/zhangqin/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 580 in __call__
File "run_rnnt.py", line 434 in run_evaluate
File "run_rnnt.py", line 312 in checkpoint_model
File "run_rnnt.py", line 347 in run_training
File "run_rnnt.py", line 547 in main
File "/home/zhangqin/.local/lib/python3.6/site-packages/absl/app.py", line 251 in _run_main
File "/home/zhangqin/.local/lib/python3.6/site-packages/absl/app.py", line 300 in run
File "run_rnnt.py", line 588 in <module>
The code that caused the segmentation fault:
print(y_pred, y_true, spec_lengths, label_lengths)
loss = rnnt_loss(y_pred, y_true, spec_lengths, label_lengths)
print('l f')
Thanks
I also met the same problem. Have you found a solution?
I found that in the code below:
loss = rnnt_loss(y_pred, y_true, spec_lengths, label_lengths)
y_pred is a 'tensorflow.python.framework.ops.Tensor'. When I changed the RNN to a DNN, y_pred became a 'tensorflow.python.framework.ops.EagerTensor' and the segmentation fault disappeared. I'm now working on getting an EagerTensor while still using the RNN.
@etyhh What version of TensorFlow are you using?
tensorflow-gpu==2.2.0
When I put print(tf.executing_eagerly()) just before loss = rnnt_loss(y_pred, y_true, spec_lengths, label_lengths), it printed False. In other words, execution is not eager at the point where the loss function is called.
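For anyone trying to reproduce the False result: the traceback above goes through tf__eval_step and func_graph_from_py_func, which suggests the loss is called while a tf.function is being traced. Here is a minimal sketch of that behaviour, independent of rnnt_loss. The function eval_step below is a hypothetical stand-in, not the one in run_rnnt.py.

```python
import tensorflow as tf

observed = {}

@tf.function
def eval_step(x):
    # Python side effects in here run while TensorFlow traces the function,
    # so they see the graph-mode state and a symbolic tensor.
    observed["eager_inside"] = tf.executing_eagerly()
    observed["tensor_type_inside"] = type(x).__name__
    return x * 2.0

x = tf.constant([1.0, 2.0])
observed["eager_outside"] = tf.executing_eagerly()
observed["tensor_type_outside"] = type(x).__name__

y = eval_step(x)
print(observed)
```

Outside the decorated function the tensor is an EagerTensor and tf.executing_eagerly() is True. Inside it, the same call reports False and the argument is a symbolic tensor, which matches what was printed before rnnt_loss.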
I tried adding tf.config.experimental_run_functions_eagerly(True) at the beginning of run_rnnt.py and loss.py. Now, before loss = rnnt_loss(), print(tf.executing_eagerly()) returns True, but print(type(y_pred)) still returns 'tensorflow.python.framework.ops.Tensor'.
I encountered the same error, and I assume it comes from rnnt_loss. I have tried several approaches, but none of them worked. Has anyone fixed it?
Changing tf.compat.v1.nn.rnn_cell.LSTMCell to tf.keras.layers.LSTMCell works for me. However, tf.keras.layers.LSTMCell doesn't support projection.
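One possible way to get a smaller output size back is to put a bias-free Dense layer after the Keras LSTM. This is only a rough sketch with made-up sizes (units, num_proj and the input shape are assumptions). It is not equivalent to num_proj in tf.compat.v1.nn.rnn_cell.LSTMCell, because there the projected output is also fed back as the recurrent state, while here the projection is applied only to the layer output.

```python
import tensorflow as tf

units = 8      # hypothetical LSTM hidden size
num_proj = 4   # hypothetical projection size

# Keras LSTM cell wrapped in an RNN layer, returning the full output sequence.
lstm = tf.keras.layers.RNN(tf.keras.layers.LSTMCell(units), return_sequences=True)
# Linear projection applied to every time step of the LSTM output.
project = tf.keras.layers.Dense(num_proj, use_bias=False)

features = tf.random.normal([2, 5, 3])  # (batch, time, feature), dummy data
hidden = lstm(features)                 # (batch, time, units)
projected = project(hidden)             # (batch, time, num_proj)

print(hidden.shape, projected.shape)
```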