sanity_check = False crashes training
When I turn off sanity_check, I get the following output and training crashes:
ubuntu@ip-172-31-13-191:~/tacotron$ python3 train.py
Training Graph loaded
2017-11-04 08:15:28.431148: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-11-04 08:15:28.601591: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-11-04 08:15:28.601950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1031] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8755
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2017-11-04 08:15:28.601991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
0%| | 0/384 [00:00<?, ?b/s]2017-11-04 08:15:40.637914: W tensorflow/core/framework/op_kernel.cc:1192] Out of range: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
[[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]
2017-11-04 08:15:40.639653: W tensorflow/core/framework/op_kernel.cc:1192] Out of range: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
[[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]
[REPEATS FOR A WHILE]
[[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
status, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
[[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 962, in managed_session
yield sess
File "train.py", line 121, in main
sess.run(g.train_op)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
[[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]
Caused by op 'batch', defined at:
File "train.py", line 128, in <module>
main()
File "train.py", line 108, in main
g = Graph(); print("Training Graph loaded")
File "train.py", line 35, in __init__
self.x, self.y, self.z, self.num_batch = get_batch()
File "/home/ubuntu/tacotron/data_load.py", line 166, in get_batch
dynamic_pad=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/input.py", line 911, in batch
name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/input.py", line 706, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 464, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2418, in _queue_dequeue_many_v2
component_types=component_types, timeout_ms=timeout_ms, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2991, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1479, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
[[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 128, in <module>
main()
File "train.py", line 125, in main
sv.saver.save(sess, hp.logdir + '/model_epoch_%02d_gs_%d' % (epoch, gs))
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 972, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 800, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
raise value
File "/home/ubuntu/tacotron/data_load.py", line 93, in _run
self.func(sess, enqueue_op) # call enqueue function
File "/home/ubuntu/tacotron/data_load.py", line 40, in enqueue_func
data = func(sess.run(inputs))
File "/home/ubuntu/tacotron/data_load.py", line 147, in get_text_and_spectrograms
_spectrogram, _magnitude = get_spectrograms(_sound_file)
File "/home/ubuntu/tacotron/utils.py", line 28, in get_spectrograms
y, sr = librosa.load(sound_file, sr=hp.sr) # or set sr to hp.sr.
File "/usr/local/lib/python3.5/dist-packages/librosa/core/audio.py", line 107, in load
with audioread.audio_open(os.path.realpath(path)) as input_file:
File "/usr/local/lib/python3.5/dist-packages/audioread/__init__.py", line 80, in audio_open
return rawread.RawAudioFile(path)
File "/usr/local/lib/python3.5/dist-packages/audioread/rawread.py", line 64, in __init__
self._file = aifc.open(self._fh)
File "/usr/lib/python3.5/aifc.py", line 890, in open
return Aifc_read(f)
File "/usr/lib/python3.5/aifc.py", line 340, in __init__
self.initfp(f)
File "/usr/lib/python3.5/aifc.py", line 303, in initfp
chunk = Chunk(file)
File "/usr/lib/python3.5/chunk.py", line 63, in __init__
raise EOFError
EOFError
Check the audio files; it looks like one of them cannot be opened by librosa. I had a similar problem when I was using mp3 files, where the queue handler couldn't handle calling an external command (ffmpeg). It looks like you are not using wav files, is that right?
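For what it's worth, the EOFError kills the enqueue thread, which closes the padding FIFO queue, and that is what then surfaces as the OutOfRangeError in the main training loop. A quick way to track down the offending file is to try loading every file the same way utils.py does. This is only a sketch: wav_dir and sr are placeholders, so point them at the audio directory and sample rate your hyperparams actually use.

import glob
import os

import librosa

wav_dir = "data/wavs"   # placeholder: set this to the audio directory your hp points at
sr = 22050              # placeholder: set this to hp.sr

for path in sorted(glob.glob(os.path.join(wav_dir, "*.wav"))):
    try:
        y, _ = librosa.load(path, sr=sr)   # same call as get_spectrograms()
        if len(y) == 0:
            print("empty audio:", path)
    except Exception as exc:               # the aifc/audioread EOFError lands here
        print("cannot decode:", path, "->", repr(exc))

Once the bad file turns up, re-encoding it as plain PCM wav (or removing it) should let training run; alternatively you could wrap the librosa.load call in data_load.py in a try/except so a single unreadable file is skipped instead of tearing down the queue.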