Memory issues in train.py - Exiting training at self._traceback = tf_stack.extract_stack()
System information
- What is the top-level directory of the model you are using: tensorflow/models
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 64-bit
- TensorFlow installed from (source or binary): binary
- TensorFlow version: tensorflow-gpu 1.15
- Have I written custom code: Yes. I added these lines to train.py just to get training to start at all:

    from tensorflow import ConfigProto
    from tensorflow import InteractiveSession
    # Allocate GPU memory on demand instead of reserving all of it at startup
    config = ConfigProto()
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)

- Bazel version: N/A
- CUDA/cuDNN version: CUDA 10.0 / cuDNN v7.6.5
- GPU model and memory: Nvidia GeForce GTX 1650 with 4GB dedicated memory
- Python version: Python 3.6.8 64bit AMD64
- Environment: Virtualenv with pip
- Exact command to reproduce:

    python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v2_coco.config

Describe the problem
After adding the lines above to /models-master/research/object_detection/legacy/train.py, training started on my GPU, which has 4 GB of memory (so I reduced batch_size to 1). However, after approximately 1400 iterations, training stops with weird errors that can be traced back to the following:
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
This is the root error that causes the crash:
(0) Invalid argument: Nan in summary histogram for:
ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
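For context, `moving_variance` is not an activation but a BatchNorm running statistic that is updated after every training step as an exponential moving average. The plain-Python sketch below assumes the usual update `moving = decay * moving + (1 - decay) * batch_variance`, with `decay = 0.9997` copied from the `batch_norm` blocks of the config further down; the helper names and the per-batch values are hypothetical and only for illustration. It shows why a single non-finite batch statistic is enough to break the run: the NaN never washes out of the average, so every later histogram summary of that variable fails, even if the following batches are healthy.

```python
import math

DECAY = 0.9997  # batch_norm.decay from the pipeline config


def update_moving_variance(moving_var, batch_var, decay=DECAY):
    """One exponential-moving-average step, as used for BatchNorm statistics."""
    return decay * moving_var + (1.0 - decay) * batch_var


def histogram_would_fail(values):
    """Approximates the HistogramSummary check: it rejects NaN (and Inf) values."""
    return any(not math.isfinite(v) for v in values)


moving_var = 1.0
history = []
# Hypothetical per-batch variances; the third step produces a NaN.
for batch_var in [0.9, 1.1, float("nan"), 1.0, 1.0]:
    moving_var = update_moving_variance(moving_var, batch_var)
    history.append(moving_var)

print(history)
print(histogram_would_fail([history[1]]))   # False: still finite
print(histogram_would_fail([history[-1]]))  # True: NaN persists after "recovery"
```

Because the supervisor evaluates the summary op periodically in its own loop (the `summary_op` call in the traceback), the step at which the error surfaces may be somewhat later than the step at which the NaN first appeared.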
I have tried several solutions for this, but none of them worked. The full output and the .config file are below. Is the problem caused by my graphics card not having enough memory, or what else could be causing it?
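One detail worth weighing alongside the memory question: the failing variable belongs to the last SSD extra layer, and batch_size was reduced to 1. The back-of-the-envelope sketch below is not code from the Object Detection API. It assumes SAME padding, so each stride-2 layer (the `s2` in the layer name) outputs ceil(input / 2), and it assumes the usual SSD-MobileNetV2 layout for a 300x300 input (`fixed_shape_resizer` in the config): five stride-2 stages in the backbone, with feature maps taken after the last two, followed by four stride-2 extra layers. That gives six maps, matching `num_layers: 6` in the anchor generator, and puts `layer_19_2_Conv2d_5_3x3_s2_128` at 1x1. With one image per batch, training-mode BatchNorm then computes each channel's batch variance from a single value, which is always 0. This does not prove that this layer is the source of the NaN; it only shows how little data its statistics see at this batch size.

```python
import math


def same_padding_output(size, stride=2):
    """Spatial output size of a conv with SAME padding: ceil(size / stride)."""
    return math.ceil(size / stride)


def ssd_feature_map_sizes(input_size=300, backbone_stride2_layers=5, extra_layers=4):
    """Assumed SSD-MobileNetV2 layout: maps after the last two backbone
    stride-2 stages, then one map per stride-2 extra layer."""
    size = input_size
    sizes = []
    for i in range(backbone_stride2_layers):
        size = same_padding_output(size)
        if i >= backbone_stride2_layers - 2:  # last two backbone stages are feature maps
            sizes.append(size)
    for _ in range(extra_layers):
        size = same_padding_output(size)
        sizes.append(size)
    return sizes


def values_per_channel(batch_size, height, width):
    """Training-mode BatchNorm reduces over the batch, height and width axes."""
    return batch_size * height * width


def population_variance(values):
    """Biased (population) variance, as used for the batch statistics."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


sizes = ssd_feature_map_sizes()
last = sizes[-1]  # assumed to be layer_19_2_Conv2d_5_3x3_s2_128
print(sizes)
print(values_per_channel(batch_size=1, height=last, width=last))
print(population_variance([3.7]))  # a single value always has variance 0.0
```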
Logs and errors:
Console output:
INFO:tensorflow:global step 1474: loss = 7.0426 (0.075 sec/step)
I0518 12:43:10.650701 12884 learning.py:507] global step 1474: loss = 7.0426 (0.075 sec/step)
INFO:tensorflow:global step 1475: loss = 7.8581 (0.076 sec/step)
I0518 12:43:10.726234 12884 learning.py:507] global step 1475: loss = 7.8581 (0.076 sec/step)
INFO:tensorflow:global step 1476: loss = 10.0487 (0.105 sec/step)
I0518 12:43:10.831919 12884 learning.py:507] global step 1476: loss = 10.0487 (0.105 sec/step)
INFO:tensorflow:Error reported to Coordinator: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance (defined at C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
[[FeatureExtractor/MobilenetV2/expanded_conv_1/project/weights/read/_63]]
(1) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance (defined at C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance':
File "object_detection/legacy/train.py", line 191, in <module>
tf.app.run()
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\tensorflow\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\tensorflow\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "object_detection/legacy/train.py", line 187, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\nemes\Documents\models-master\research\object_detection\legacy\trainer.py", line 355, in train
model_var.op.name, model_var))
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\summary\summary.py", line 179, in histogram
tag=tag, values=values, name=scope)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\ops\gen_logging_ops.py", line 329, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
Traceback (most recent call last):
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
return fn(*args)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
target_list, run_metadata)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[{{node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance}}]]
[[FeatureExtractor/MobilenetV2/expanded_conv_1/project/weights/read/_63]]
(1) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[{{node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\training\coordinator.py", line 297, in stop_on_exception
yield
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\training\coordinator.py", line 495, in run
self.run_loop()
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\training\supervisor.py", line 1045, in run_loop
[self._sv.summary_op, self._sv.global_step])
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
run_metadata_ptr)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
run_metadata)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance (defined at C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
[[FeatureExtractor/MobilenetV2/expanded_conv_1/project/weights/read/_63]]
(1) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance (defined at C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance':
File "object_detection/legacy/train.py", line 191, in <module>
tf.app.run()
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\tensorflow\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\tensorflow\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "object_detection/legacy/train.py", line 187, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\nemes\Documents\models-master\research\object_detection\legacy\trainer.py", line 355, in train
model_var.op.name, model_var))
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\summary\summary.py", line 179, in histogram
tag=tag, values=values, name=scope)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\ops\gen_logging_ops.py", line 329, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
INFO:tensorflow:global step 1477: loss = 9.4083 (0.077 sec/step)
I0518 12:43:11.339294 12884 learning.py:507] global step 1477: loss = 9.4083 (0.077 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
I0518 12:43:11.352624 12884 learning.py:785] Finished training! Saving model to disk.
C:\tensorflow\lib\site-packages\tensorflow_core\python\summary\writer\writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
warnings.warn("Attempting to use a closed FileWriter. "
Traceback (most recent call last):
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
return fn(*args)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
target_list, run_metadata)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[{{node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance}}]]
[[FeatureExtractor/MobilenetV2/expanded_conv_1/project/weights/read/_63]]
(1) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[{{node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "object_detection/legacy/train.py", line 191, in <module>
tf.app.run()
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\tensorflow\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\tensorflow\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "object_detection/legacy/train.py", line 187, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\nemes\Documents\models-master\research\object_detection\legacy\trainer.py", line 417, in train
saver=saver)
File "C:\tensorflow\lib\site-packages\tensorflow_core\contrib\slim\python\slim\learning.py", line 790, in train
ignore_live_threads=ignore_live_threads)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\training\supervisor.py", line 839, in stop
ignore_live_threads=ignore_live_threads)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\training\coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "c:\users\nemes\appdata\local\programs\python\python36\lib\site-packages\six.py", line 703, in reraise
raise value
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\training\coordinator.py", line 297, in stop_on_exception
yield
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\training\coordinator.py", line 495, in run
self.run_loop()
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\training\supervisor.py", line 1045, in run_loop
[self._sv.summary_op, self._sv.global_step])
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
run_metadata_ptr)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
run_metadata)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance (defined at C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
[[FeatureExtractor/MobilenetV2/expanded_conv_1/project/weights/read/_63]]
(1) Invalid argument: Nan in summary histogram for: ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance
[[node ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance (defined at C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'ModelVars/FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/BatchNorm/moving_variance':
File "object_detection/legacy/train.py", line 191, in <module>
tf.app.run()
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\tensorflow\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\tensorflow\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "object_detection/legacy/train.py", line 187, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\nemes\Documents\models-master\research\object_detection\legacy\trainer.py", line 355, in train
model_var.op.name, model_var))
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\summary\summary.py", line 179, in histogram
tag=tag, values=values, name=scope)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\ops\gen_logging_ops.py", line 329, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "C:\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
.config file:
# SSD with Mobilenet v2 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
ssd {
num_classes: 20
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v2'
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
loss {
classification_loss {
weighted_sigmoid {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
hard_example_miner {
num_hard_examples: 3000
iou_threshold: 0.99
loss_type: CLASSIFICATION
max_negatives_per_positive: 3
min_negatives_per_image: 3
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
batch_size: 1
optimizer {
rms_prop_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.004
decay_steps: 800720
decay_factor: 0.95
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
fine_tune_checkpoint: "C:/Users/nemes/Documents/ssd_mobilenet_v2_coco/model.ckpt"
fine_tune_checkpoint_type: "detection"
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "C:/Users/nemes/Documents/data/train.record"
}
label_map_path: "C:/Users/nemes/Documents/data/label_map.pbtxt"
}
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "C:/Users/nemes/Documents/data/eval.record"
}
label_map_path: "C:/Users/nemes/Documents/data/label_map.pbtxt"
shuffle: false
num_readers: 1
}
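The comment above `num_steps` in `train_config` says the learning rate never decays during the 200K steps. A quick check of the `exponential_decay_learning_rate` block, assuming the staircase form `lr = initial_learning_rate * decay_factor ** floor(step / decay_steps)` (my understanding of the Object Detection API default, so treat it as an assumption), confirms that the rate is still 0.004 at step 1476, where the crash happened, and stays there for the whole run because `num_steps` (200000) is below `decay_steps` (800720).

```python
def exponential_decay_lr(step, initial_lr=0.004, decay_steps=800720,
                         decay_factor=0.95, staircase=True):
    """Defaults mirror the exponential_decay_learning_rate block in train_config."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_factor ** exponent


for step in (0, 1476, 200000, 800720):
    print(step, exponential_decay_lr(step))
```

Even with `staircase=False` the rate at step 1476 would be lower by only about 0.01%, so either way the schedule is essentially flat at 0.004 where the loss jumps in the log.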