seq2seq
Parallel reading problem
I'm trying to run the small NMT model on the toy_reverse data. I have run all the unit tests (including the TensorFlow parallel reader test) and everything looks OK. However, when I run training I get the following error (I tried different models and different data and got the same error). Environment: TF v1.0, CUDA 8.0, Ubuntu 16.04.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.FailedPreconditionError'>, / [[Node: train_input_fn/parallel_read/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input_fn/parallel_read/TextLineReaderV2, train_input_fn/parallel_read/filenames)]]
Caused by op u'train_input_fn/parallel_read/ReaderReadV2', defined at:
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/zeina/tensorflow/grad/seq2seq/bin/train.py", line 277, in <module>
    tf.app.run()
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/zeina/tensorflow/grad/seq2seq/bin/train.py", line 272, in main
    schedule=FLAGS.schedule)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 111, in run
    return _execute_schedule(experiment, schedule)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 46, in _execute_schedule
    return task()
  File "seq2seq/contrib/experiment.py", line 104, in continuous_train_and_eval
    monitors=self._train_monitors)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 281, in new_func
    return func(*args, **kwargs)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 430, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 925, in _train_model
    features, labels = input_fn()
  File "seq2seq/training/utils.py", line 260, in input_fn
    data_provider = pipeline.make_data_provider()
  File "seq2seq/data/input_pipeline.py", line 180, in make_data_provider
    **kwargs)
  File "seq2seq/data/parallel_data_provider.py", line 125, in __init__
    seed=seed)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/data/parallel_reader.py", line 234, in parallel_read
    reader_kwargs=reader_kwargs).read(filename_queue)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/data/parallel_reader.py", line 132, in read
    enqueue_ops.append(self._common_queue.enqueue(reader.read(queue)))
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/io_ops.py", line 193, in read
    return gen_io_ops._reader_read_v2(self._reader_ref, queue_ref, name=name)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 411, in _reader_read_v2
    queue_handle=queue_handle, name=name)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/zeina/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()
FailedPreconditionError (see above for traceback): / [[Node: train_input_fn/parallel_read/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](train_input_fn/parallel_read/TextLineReaderV2, train_input_fn/parallel_read/filenames)]]
I actually get the same error several times in a single run, with only the node names at the end changing. I hope someone can point me to what may be causing this.
I am getting the same error. Did you find a solution to this?
No. I even tried it on an AWS machine with TensorFlow pre-installed, and it didn't work there either!
I'm wondering if you made mistakes here:
  --input_pipeline_train "
    class: ParallelTextInputPipeline
    params:
      source_files:
        - $TRAIN_SOURCES
      target_files:
        - $TRAIN_TARGETS" \
  --input_pipeline_dev "
    class: ParallelTextInputPipeline
    params:
      source_files:
        - $DEV_SOURCES
      target_files:
        - $DEV_TARGETS"
Be careful not to use tab characters (\t) for indentation here. You may also have made a mistake in the - before $TRAIN_SOURCES: it has to be a plain hyphen.
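As a rough way to check for both problems, here is a minimal Python sketch that scans the YAML text passed to --input_pipeline_train for tab characters and for dash look-alikes that are not the plain ASCII hyphen. The function name find_yaml_pitfalls, the list of suspect characters, and the sample paths are my own assumptions for illustration; none of this is part of seq2seq.

```python
# Hypothetical helper, not part of seq2seq: flags characters that commonly
# break a YAML snippet pasted into a shell command.

# Dash look-alikes that YAML does not accept as a list marker.
SUSPECT_DASHES = {
    "\u2010": "HYPHEN",
    "\u2011": "NON-BREAKING HYPHEN",
    "\u2012": "FIGURE DASH",
    "\u2013": "EN DASH",
    "\u2014": "EM DASH",
    "\u2212": "MINUS SIGN",
}


def find_yaml_pitfalls(text):
    """Return a list of (line_number, column, description) tuples."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                problems.append((line_no, col, "TAB"))
            elif ch in SUSPECT_DASHES:
                problems.append((line_no, col, SUSPECT_DASHES[ch]))
    return problems


if __name__ == "__main__":
    # Made-up sample: a tab on line 2 and an en dash on line 3.
    bad = "params:\n\tsource_files:\n    \u2013 train/sources.txt\n"
    for problem in find_yaml_pitfalls(bad):
        print(problem)
```

An empty result only means that none of these particular characters were found; it does not prove the YAML is otherwise valid.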
I got the same error and solved it with @javiddadashkarimi's solution. I just used the right '-' in the YAML file. Thanks!
@Zeina-T
Hi, I think you have not initialised the global variables after creating the graph.
Before running any operation on the graph, you can call the following:
sess.run(tf.global_variables_initializer())
Until you do this, the weights and biases in your network graph won't be initialised, so you get a FailedPreconditionError.
I have had the same kind of problem a few times, and this fixed it. :)
Hope it helps.