FailedPreconditionError error during evaluation

Open nashid opened this issue 3 years ago • 1 comments

When running evaluation on the hoppity dataset, we encounter the following error:

Traceback (most recent call last):
  File "train.py", line 340, in <module>
  File "/arc/project/st-amesbah-1/conda-envs/plur/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/arc/project/st-amesbah-1/conda-envs/plur/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "train.py", line 281, in main
  File "/scratch/st-amesbah-1/plur-experiment/src/plur/plur/model_design/evaluation.py", line 132, in evaluate
  File "/scratch/st-amesbah-1/plur-experiment/src/plur/plur/model_design/evaluation.py", line 245, in generate_predictions
  File "/scratch/st-amesbah-1/plur-experiment/src/plur/plur/model_design/evaluation.py", line 160, in evaluate_chunk
  File "/scratch/st-amesbah-1/plur-experiment/src/plur/plur/model_design/evaluation.py", line 329, in _evaluate_chunk
  File "/scratch/st-amesbah-1/plur-experiment/src/plur/plur/plur_data_loader.py", line 459, in __next__
  File "/scratch/st-amesbah-1/plur-experiment/src/plur/plur/model_design/data_generation.py", line 33, in __call__
  File "/arc/project/st-amesbah-1/conda-envs/plur/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4635, in __next__
    return nest.map_structure(to_numpy, next(self._iterator))
  File "/arc/project/st-amesbah-1/conda-envs/plur/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 766, in __next__
    return self._next_internal()
  File "/arc/project/st-amesbah-1/conda-envs/plur/lib/python3.8/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 749, in _next_internal
    ret = gen_dataset_ops.iterator_get_next(
  File "/arc/project/st-amesbah-1/conda-envs/plur/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 3017, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/arc/project/st-amesbah-1/conda-envs/plur/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 7209, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.FailedPreconditionError: {{function_node __wrapped__IteratorGetNext_output_types_10_device_/job:localhost/replica:0/task:0/device:CPU:0}} /arc/project/st-amesbah-1/plur-data/stage_2/tfrecords/test/hoppity_single_ast_diff_dataset-00761-of-01000.tfrecord; Bad file descriptor [Op:IteratorGetNext]

Could you please assist us in debugging this error @smoitra-g?

nashid, Oct 15 '22 00:10

@smoitra-g we executed the following command:

python3 train.py \
 --data_dir=/arc/project/st-amesbah-1/plur-data/stage_2 \
 --exp_dir=/arc/project/st-amesbah-1/plur-data/experiments \
 --evaluate=true
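
Since the traceback points at a single shard (`hoppity_single_ast_diff_dataset-00761-of-01000.tfrecord`) and an OS-level "Bad file descriptor" message, one way to narrow this down might be to read every test shard outside of TensorFlow and see whether the failure is tied to specific files or to the filesystem. The following is only a rough sketch, not part of PLUR: `scan_tfrecord` and `find_bad_shards` are hypothetical helper names, and it assumes the shards are uncompressed TFRecord files (each record framed as a little-endian uint64 length, a 4-byte length CRC, the payload, and a 4-byte payload CRC). It checks the framing only and does not verify the CRCs.

```python
import glob
import os
import struct


def scan_tfrecord(path):
    """Walk the record framing of one uncompressed TFRecord file.

    Returns the number of records. Raises OSError on I/O failures and
    ValueError if a record is cut short. CRC fields are read but not verified.
    """
    count = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while True:
            header = f.read(12)  # uint64 length + uint32 CRC of the length
            if not header:
                return count  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated header after %d records" % count)
            (length,) = struct.unpack("<Q", header[:8])
            # Payload plus its trailing uint32 CRC must fit in what is left;
            # this also guards against absurd lengths from corrupted headers.
            if length + 4 > size - f.tell():
                raise ValueError("truncated payload after %d records" % count)
            f.read(length + 4)  # actually read the bytes to surface I/O errors
            count += 1


def find_bad_shards(pattern):
    """Return (path, reason) for every shard matching `pattern` that fails to scan."""
    bad = []
    for path in sorted(glob.glob(pattern)):
        try:
            scan_tfrecord(path)
        except (OSError, ValueError) as err:
            bad.append((path, str(err)))
    return bad
```

Calling `find_bad_shards` with a glob such as `/arc/project/st-amesbah-1/plur-data/stage_2/tfrecords/test/*.tfrecord` would list any shards that cannot be read end to end. If every shard scans cleanly, the problem is more likely transient (for example, the shared filesystem dropping a handle during the run) than a corrupted file.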

nashid, Oct 15 '22 01:10