tensor2tensor

RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint

Open 919294AkshatSharma opened this issue 2 years ago • 0 comments

Description

A RuntimeError is raised at the end of training when running:

t2t-trainer --generate_data --data_dir=/t2t_data --output_dir=/t2t_train/deque --problem=text2text_copyable_tokens --model=neural_deque_model --hparams_set=neural_deque --train_steps=100 --eval_steps=5

Environment information

OS: Ubuntu 18.04.5

$ pip freeze | grep tensor

mesh-tensorflow==0.1.21
tensor2tensor==1.15.7
tensorboard==1.15.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==1.15.0
tensorflow-addons==0.19.0
tensorflow-datasets==3.2.1
tensorflow-estimator==1.15.1
tensorflow-gan==2.1.0
tensorflow-hub==0.13.0
tensorflow-io-gcs-filesystem==0.32.0
tensorflow-metadata==1.12.0
tensorflow-probability==0.7.0
tensorstore==0.1.28

$ python -V
Python 3.7.12

For bugs: reproduction and error logs

# Steps to reproduce:
...
# Error logs:
Traceback (most recent call last):
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 35, in <module>
    tf.app.run(main)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/snoop/tracer.py", line 173, in simple_wrapper
    return function(*args, **kwargs)
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 30, in main
    t2t_trainer.main(argv)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 418, in main
    execute_schedule(exp)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 371, in execute_schedule
    getattr(exp, FLAGS.schedule)()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 468, in continuous_train_and_eval
    self._eval_spec)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1495, in _train_with_estimator_spec
    any_step_done = True
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 861, in __exit__
    self._close_internal(exception_type)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 894, in _close_internal
    h.end(self._coordinated_creator.tf_sess)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 600, in end
    self._save(session, last_step)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 619, in _save
    if l.after_save(session, step):
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 519, in after_save
    self._evaluate(global_step_value)  # updates self.eval_result
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 544, in _evaluate
    'Eval status: {}'.format(self.eval_result.status))
RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint
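
One way to narrow this down is to check what the trainer actually wrote to --output_dir before the evaluator ran. The sketch below uses only the Python standard library. The helper names are hypothetical, and it assumes the usual TensorFlow layout: a plain-text `checkpoint` state file containing a `model_checkpoint_path: "..."` line, next to files named `model.ckpt-<step>.*`. It is a diagnostic aid, not a fix.

```python
import os
import re


def latest_checkpoint_from_state_file(model_dir):
    """Return the prefix named by model_checkpoint_path in <model_dir>/checkpoint.

    Returns None if the state file is missing or has no such line.
    (Hypothetical helper; assumes the standard text-format state file.)
    """
    state_path = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(state_path):
        return None
    with open(state_path) as f:
        for line in f:
            # re.match anchors at the start of the line, so the
            # "all_model_checkpoint_paths" entries are skipped.
            m = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
            if m:
                return m.group(1)
    return None


def checkpoint_files(model_dir):
    """List files in model_dir whose names start with 'model.ckpt-'."""
    if not os.path.isdir(model_dir):
        return []
    return sorted(n for n in os.listdir(model_dir) if n.startswith("model.ckpt-"))


if __name__ == "__main__":
    # Path taken from the --output_dir flag in the command above.
    out_dir = "/t2t_train/deque"
    print("latest checkpoint:", latest_checkpoint_from_state_file(out_dir))
    print("checkpoint files:", checkpoint_files(out_dir))
```

If the reported latest checkpoint is None, or its step number does not change between runs, the output directory contents would be worth attaching to this report.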

919294AkshatSharma, Jun 27 '23 04:06