mobile-deeplab-v3-plus
mobile-deeplab-v3-plus copied to clipboard
NanLossDuringTrainingError: NaN loss during training.
INFO:tensorflow:Restoring parameters from /root/mobile-deeplab-v3-plus/datasets/people_segmentation/exp/deeplab-v3-plus/train/model.ckpt-0
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /root/mobile-deeplab-v3-plus/datasets/people_segmentation/exp/deeplab-v3-plus/train/model.ckpt.
INFO:tensorflow:loss = 1.0444846, step = 0
INFO:tensorflow:cross_entropy = 0.6935922, learning_rate = 1e-04, total_loss = 1.0444846, train_mean_iou = 0.32585338, train_pixel_accuracy = 0.4993496
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
File "run.py", line 552, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "run.py", line 539, in main
train()
File "run.py", line 433, in train
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 610, in run
return self.run_local()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 356, in train
return self
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/eager/context.py", line 357, in _mode
yield
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5229, in get_controller
yield g
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5037, in get_controller
yield default
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5229, in get_controller
yield g
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/eager/context.py", line 357, in _mode
yield
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5229, in get_controller
yield g
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 4247, in device
yield
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/estimator/estimator.py", line 1471, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 671, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1320, in run
run_metadata=run_metadata))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 753, in after_run
raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
I am suffering to run this code using person segmentation dataset. Can't solve the 'NaN loss during training' error and I didn't edit any parameter. How to fix it?