Reduction axis 0 is empty in shape [0,2] error while training object detection using frcnn
I am training an object detection model on 3159 images and got this error during training.
INFO:tensorflow:Training 279 vars from pretrained module; from "truncated_base_network/resnet_v1_101/block2/unit_1/bottleneck_v1/shortcut/weights:0" to "truncated_base_network/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0".
/iq_storage/virtualenv/table_detection/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:tensorflow:Starting training for <luminoth.models.fasterrcnn.fasterrcnn.FasterRCNN object at 0x7fd080d9ca58>
WARNING:tensorflow:From /iq_storage/virtualenv/table_detection/lib/python3.5/site-packages/tensorflow/python/util/tf_should_use.py:118: initialize_local_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.local_variables_initializer instead.
INFO:tensorflow:ImageVisHook was created with mode = "train"
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2018-11-23 14:51:27.574463: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
INFO:tensorflow:Restoring parameters from /iq_storage/table_detection/uq_data/all_uq_tables/6/jobs/table-area-detection-0.1/model.ckpt-718
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 718 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:Saving checkpoints for 719 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 719, file: b'LSE_HTO_2013_Original_203.png', train_loss: 4.515841484069824, in 21.23s
INFO:tensorflow:Saving checkpoints for 720 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 720, file: b'LSE_HTO_2014_Original_250.png', train_loss: 4.2614898681640625, in 19.90s
INFO:tensorflow:Saving checkpoints for 721 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 721, file: b'LSE_HSP_2014_Original_57.png', train_loss: 5.102310657501221, in 19.99s
INFO:tensorflow:Saving checkpoints for 722 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 722, file: b'LSE_HSV_2006_Original_90.png', train_loss: 4.3329644203186035, in 15.97s
INFO:tensorflow:Saving checkpoints for 723 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 723, file: b'LSE_HTO_2014_Original_252.png', train_loss: 4.909831523895264, in 16.07s
INFO:tensorflow:Saving checkpoints for 724 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 724, file: b'LSE_BSY_2012_Original_120.png', train_loss: 6.184804439544678, in 16.49s
INFO:tensorflow:Saving checkpoints for 725 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 725, file: b'LSE_HL_2015_Original_16.png', train_loss: 7.809842109680176, in 16.19s
INFO:tensorflow:Saving checkpoints for 726 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 726, file: b'LSE_ADN_2008_Original_74.png', train_loss: 4.301699638366699, in 16.22s
INFO:tensorflow:Saving checkpoints for 727 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 727, file: b'Appendix_HSI_32200_Call_Termsheet_Original_1.png', train_loss: 5.701766014099121, in 16.12s
INFO:tensorflow:Saving checkpoints for 728 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 728, file: b'LSE_ABF_2010_Original_12.png', train_loss: 7.281338691711426, in 16.34s
INFO:tensorflow:Saving checkpoints for 729 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 729, file: b'LSE_MCB_2012_Original_71.png', train_loss: 4.783441543579102, in 16.38s
INFO:tensorflow:Saving checkpoints for 730 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 730, file: b'LSE_ADN_2008_Original_47.png', train_loss: 5.152769565582275, in 16.12s
INFO:tensorflow:Saving checkpoints for 731 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 731, file: b'LSE_LGEN_2016_Original_116.png', train_loss: 4.426446914672852, in 16.19s
INFO:tensorflow:Saving checkpoints for 732 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 732, file: b'Appendix_.HSI_25600_Put_Termsheet_Original_4.png', train_loss: 5.287724494934082, in 16.12s
INFO:tensorflow:Saving checkpoints for 733 into /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/table-area-detection-0.1/model.ckpt.
INFO:tensorflow:step: 733, file: b'LSE_ADN_2010_Original_101.png', train_loss: nan, in 17.80s
Traceback (most recent call last):
File "/iq_storage/virtualenv/table_detection/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/iq_storage/virtualenv/table_detection/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/iq_storage/virtualenv/table_detection/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Reduction axis 0 is empty in shape [0,2]
[[Node: fasterrcnn/rcnn/rcnn_proposal/ArgMax_1 = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](fasterrcnn/rcnn/rcnn_proposal/bbox_overlap/Maximum_4, gradients/gradients/losses/RPNLoss/boolean_mask_1/GatherV2_grad/concat/axis)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/iq_storage/virtualenv/table_detection/bin/lumi", line 11, in
[[Node: fasterrcnn/rcnn/rcnn_proposal/ArgMax_1 = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](fasterrcnn/rcnn/rcnn_proposal/bbox_overlap/Maximum_4, gradients/gradients/losses/RPNLoss/boolean_mask_1/GatherV2_grad/concat/axis)]]
Caused by op 'fasterrcnn/rcnn/rcnn_proposal/ArgMax_1', defined at:
File "/iq_storage/virtualenv/table_detection/bin/lumi", line 11, in
InvalidArgumentError (see above for traceback): Reduction axis 0 is empty in shape [0,2]
[[Node: fasterrcnn/rcnn/rcnn_proposal/ArgMax_1 = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](fasterrcnn/rcnn/rcnn_proposal/bbox_overlap/Maximum_4, gradients/gradients/losses/RPNLoss/boolean_mask_1/GatherV2_grad/concat/axis)]]
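For anyone hitting the same message: the failing op is an ArgMax over axis 0 of a tensor with shape [0,2]. One plausible reading (an assumption, not confirmed against the Luminoth source) is that this tensor is an overlap matrix with one row per proposal and one column per ground-truth box, so the image had two ground-truth boxes but no proposals reached the RCNN target step. Note also that the step just before the crash reports train_loss: nan. The pure-Python sketch below only illustrates why a column-wise argmax is undefined when there are zero rows; the function names and boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlap_matrix(proposals, gt_boxes):
    """Rows are proposals, columns are ground-truth boxes."""
    return [[iou(p, g) for g in gt_boxes] for p in proposals]


def best_proposal_per_gt(overlaps, num_gt):
    """Column-wise argmax: index of the best proposal for each ground-truth box."""
    if not overlaps:
        # With zero rows there is nothing to take the maximum over.
        raise ValueError("Reduction axis 0 is empty in shape [0,%d]" % num_gt)
    return [
        max(range(len(overlaps)), key=lambda row: overlaps[row][col])
        for col in range(num_gt)
    ]


# Hypothetical boxes, only for illustration.
gt_boxes = [(10, 10, 50, 50), (60, 60, 120, 100)]
proposals = [(12, 8, 48, 52), (55, 58, 118, 104), (0, 0, 5, 5)]

# Normal case: three proposals, two ground-truth boxes.
print(best_proposal_per_gt(overlap_matrix(proposals, gt_boxes), len(gt_boxes)))  # [0, 1]

# Degenerate case: no proposals at all, so the matrix has shape [0, 2].
try:
    best_proposal_per_gt(overlap_matrix([], gt_boxes), len(gt_boxes))
except ValueError as err:
    print(err)
```

If this reading is right, the interesting question is why no proposals survive for that image, which points at the NaN loss or at the annotations rather than at the ArgMax itself.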
It would help if you shared your configuration file (.yml) and the commands you ran, so that contributors are more likely to be able to give you advice.
Also see the tips in the documentation. I used lumi to train on the VOC2007 dataset, and training went well.
https://luminoth.readthedocs.io/en/latest/usage/training.html
Hi @LarT2P, thank you for your advice.
- First of all, I am using the image preprocessing techniques described in this paper.
- Then I create the tfrecords using the command: lumi dataset transform --type csv --data-dir data/ --output-dir tfdata/ --split train --split val --only-classes=table
- And finally I train using the following command: lumi train -c config.yml
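Since the dataset is built from CSV annotations, it may be worth checking them for degenerate boxes before running lumi dataset transform. Zero-area or negative boxes are a common source of NaN losses in detection training, though nothing in this thread confirms that is the cause here. The sketch below is a minimal check, assuming the CSV has a header with xmin, ymin, xmax and ymax columns; the column names, file names and sample rows are hypothetical.

```python
import csv
import io
import math


def find_bad_boxes(csv_file, columns=("xmin", "ymin", "xmax", "ymax")):
    """Yield (line_number, row, reason) for every suspicious bounding box."""
    reader = csv.DictReader(csv_file)
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            xmin, ymin, xmax, ymax = (float(row[c]) for c in columns)
        except (KeyError, TypeError, ValueError):
            yield line_number, row, "missing or non-numeric coordinate"
            continue
        if not all(math.isfinite(v) for v in (xmin, ymin, xmax, ymax)):
            yield line_number, row, "non-finite coordinate"
        elif min(xmin, ymin) < 0:
            yield line_number, row, "negative coordinate"
        elif xmax <= xmin or ymax <= ymin:
            yield line_number, row, "zero or negative width/height"


# Hypothetical sample; in practice open the real annotation file instead.
SAMPLE = """\
image_id,xmin,ymin,xmax,ymax,label
page_1.png,10,20,200,300,table
page_2.png,50,40,50,120,table
page_3.png,-5,10,80,90,table
page_4.png,30,,90,100,table
"""

for line_number, row, reason in find_bad_boxes(io.StringIO(SAMPLE)):
    print(line_number, row["image_id"], reason)
```

To run it on a real file, pass an open file object, for example open("data/train.csv", newline=""), where the path is again only a placeholder.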
I used the following configuration file (config.yml).
train:
  run_name: table-area-detection-0.1
  learning_rate:
    decay_method: exponential_decay
    decay_rate: 0.5
    decay_steps: 5000
    learning_rate: 0.0003
  job_dir: /iq_storage/table_detection/uq_data/folder6_7_9_11_one_epoch_model/jobs/
  save_checkpoint_secs: 10
  save_summaries_secs: 10
  num_epochs: 1
dataset:
  type: object_detection
  dir: /iq_storage/table_detection/uq_data/all_uq_tables/9/tfdata/
  image_preprocessing:
    min_size: 600
    max_size: 1024
  data_augmentation:
    - flip:
        left_right: True
        up_down: True
        prob: 0.5
model:
  type: fasterrcnn
  network:
    num_classes: 1
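The training log in the original report shows train_loss: nan for LSE_ADN_2010_Original_101.png at step 733, immediately before the crash. When the log is long, a small script can pull out every step whose loss is not finite, so the corresponding images and annotations can be inspected. This is a minimal sketch that assumes the log lines keep the "step: ..., file: b'...', train_loss: ..." layout shown above; the function name is hypothetical.

```python
import math
import re

# Matches lines such as:
# INFO:tensorflow:step: 733, file: b'name.png', train_loss: nan, in 17.80s
STEP_RE = re.compile(
    r"step: (?P<step>\d+), file: b'(?P<file>[^']+)', train_loss: (?P<loss>[^,\s]+)"
)


def non_finite_steps(lines):
    """Yield (step, file_name) for every logged step with a NaN or infinite loss."""
    for line in lines:
        match = STEP_RE.search(line)
        if match is None:
            continue
        loss = float(match.group("loss"))  # float() accepts "nan" and "inf"
        if not math.isfinite(loss):
            yield int(match.group("step")), match.group("file")


# Two lines copied from the log in this thread.
LOG_LINES = [
    "INFO:tensorflow:step: 732, file: b'Appendix_.HSI_25600_Put_Termsheet_Original_4.png', "
    "train_loss: 5.287724494934082, in 16.12s",
    "INFO:tensorflow:step: 733, file: b'LSE_ADN_2010_Original_101.png', "
    "train_loss: nan, in 17.80s",
]

for step, file_name in non_finite_steps(LOG_LINES):
    print(step, file_name)  # 733 LSE_ADN_2010_Original_101.png
```

With a saved log, iterate over the open file instead of LOG_LINES.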
Hi, I am running into the same problem. Did you find a solution, @annusrcm?
I'm also having the same problem. The MS COCO format was not working, so I converted to CSV, and now I'm getting the same error.
I'm hitting a similar issue. Training does not crash with the first config below, but once I customize more, some parameter causes the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Reduction axis 0 is empty in shape [0,1]
[[{{node fasterrcnn/rcnn/rcnn_proposal/ArgMax_1}} = ArgMax[T=DT_FLOAT, Tidx=DT_INT32, output_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fasterrcnn/rcnn/rcnn_proposal/bbox_overlap/Maximum_4, gradients/gradients/losses/RCNNLoss/cls_score_labeled/GatherV2_grad/concat/axis)]]
[[{{node fasterrcnn/rcnn/rcnn_proposal/GreaterEqual_2/_7295}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7064_fasterrcnn/rcnn/rcnn_proposal/GreaterEqual_2", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
I have tried changing the foreground and background thresholds for both the RPN and the RCNN, the anchor scales and ratios, and some of the NMS settings, but nothing seems to help. First, the more limited config (which doesn't crash):
train:
  run_name: DTD_full_data_5
  job_dir: jobs/
  save_checkpoint_secs: 60
  save_summaries_secs: 60
  num_epochs: 40000
  batch_size: 1
  random_shuffle: True
  learning_rate:
    _replace: True
    learning_rate: 0.0003
  optimizer:
    _replace: True
    type: momentum
    momentum: 0.9
eval:
  image_vis: eval
dataset:
  type: object_detection
  dir: tfdata
  image_preprocessing:
    min_size: 600
  data_augmentation:
    - flip:
        left_right: False
        up_down: True
        prob: 0.2
model:
  type: fasterrcnn
  network:
    num_classes: 1
    with_rcnn: True
  batch_norm: False
  base_network:
    architecture: resnet_v2_152
    trainable: True
    weights:
    download: True
    fine_tune_from:
  loss:
    rpn_cls_loss_weight: 1.0
    rpn_reg_loss_weights: 1.0
    rcnn_cls_loss_weight: 1.0
    rcnn_reg_loss_weights: 1.0
  anchors:
    base_size: 256
    scales: [0.5,1,2]
    ratios: [0.5,1,2,3]
    stride:
Second, the more customized config (which crashes after training for a bit):
train:
  run_name: DTD_full_data_4
  job_dir: jobs/
  save_checkpoint_secs: 60
  save_summaries_secs: 60
  num_epochs: 40000
  batch_size: 1
  random_shuffle: True
  learning_rate:
    _replace: True
    decay_method: piecewise_constant
    boundaries: [25000, 45000, 60000]
    values: [0.0003, 0.0001, 0.00003, 0.00001]
  optimizer:
    _replace: True
    type: momentum
    momentum: 0.9
eval:
  image_vis: eval
dataset:
  type: object_detection
  dir: tfdata
  image_preprocessing:
    min_size: 600
  data_augmentation:
    - flip:
        left_right: False
        up_down: True
        prob: 0.2
model:
  type: fasterrcnn
  network:
    # Total number of classes to predict.
    num_classes: 1
    with_rcnn: True
  batch_norm: False
  base_network:
    architecture: resnet_v2_152
    trainable: True
    weights:
    download: True
    fine_tune_from:
    output_stride: 16
    arg_scope:
      weight_decay: 0.0005
  loss:
    rpn_cls_loss_weight: 1.0
    rpn_reg_loss_weights: 1.0
    rcnn_cls_loss_weight: 1.0
    rcnn_reg_loss_weights: 1.0
  anchors:
    base_size: 256
    scales: [0.5,1,2]
    ratios: [0.5,1,2,3]
  rpn:
    activation_function: relu6
    l2_regularization_scale: 0.0005  # Disable using 0.
    l1_sigma: 3.0
    num_channels: 512
    kernel_shape: [3, 3]
    rpn_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001
    proposals:
      pre_nms_top_n: 12000
      post_nms_top_n: 2000
      apply_nms: True
      nms_threshold: 0.2
      min_size: 0  # Disable using 0.
      clip_after_nms: False
      filter_outside_anchors: True
      min_prob_threshold: 0.0
    target:
      allowed_border: 0
      clobber_positives: False
      foreground_threshold: 0.7
      background_threshold_high: 0.3
      background_threshold_low: 0.0
      foreground_fraction: 0.5
      minibatch_size: 256
  rcnn:
    # layer_sizes: []
    dropout_keep_prob: 1.0
    activation_function: relu6
    l2_regularization_scale: 0.0005
    l1_sigma: 1.0
    use_mean: True
    target_normalization_variances: [0.1, 0.2]
    rcnn_initializer:
      _replace: True
      type: variance_scaling_initializer
      factor: 1.0
      uniform: True
      mode: FAN_AVG
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001
    roi:
      pooling_mode: crop
      pooled_width: 7
      pooled_height: 7
      padding: VALID
    proposals:
      class_max_detections: 100
      class_nms_threshold: 0.5
      total_max_detections: 300
      min_prob_threshold: 0.5
    target:
      foreground_fraction: 0.3
      minibatch_size: 512
      foreground_threshold: 0.7
      background_threshold_high: 0.3
      background_threshold_low: 0.0
(Sorry for the odd formatting, I tried making stuff pretty, but somehow it does not appreciate my effort 😅)
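Since one config trains and the other crashes, listing exactly which keys differ can narrow down the parameter responsible. The sketch below flattens two nested config dictionaries into dotted paths and reports every path whose value differs. It is only an illustration: the dictionaries are small hand-copied excerpts of the two configs above, the function names are hypothetical, and in practice the full files could be loaded with PyYAML's yaml.safe_load instead.

```python
UNSET = "<unset>"


def flatten(config, prefix=""):
    """Turn a nested dict into {"a.b.c": value} with dotted key paths."""
    flat = {}
    for key, value in config.items():
        path = prefix + "." + str(key) if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(first, second):
    """Return {path: (value_in_first, value_in_second)} for every differing path."""
    flat_first, flat_second = flatten(first), flatten(second)
    changes = {}
    for path in sorted(set(flat_first) | set(flat_second)):
        a = flat_first.get(path, UNSET)
        b = flat_second.get(path, UNSET)
        if a != b:
            changes[path] = (a, b)
    return changes


# Excerpts only; values are copied from the two configs in this comment.
working = {
    "train": {"run_name": "DTD_full_data_5", "learning_rate": {"learning_rate": 0.0003}},
    "model": {"anchors": {"base_size": 256}},
}
crashing = {
    "train": {
        "run_name": "DTD_full_data_4",
        "learning_rate": {
            "decay_method": "piecewise_constant",
            "boundaries": [25000, 45000, 60000],
            "values": [0.0003, 0.0001, 0.00003, 0.00001],
        },
    },
    "model": {
        "anchors": {"base_size": 256},
        "rpn": {"proposals": {"nms_threshold": 0.2}},
        "rcnn": {"target": {"foreground_threshold": 0.7}},
    },
}

for path, (old, new) in diff_configs(working, crashing).items():
    print(path, old, "->", new)
```

A key reported as unset in the working config is not necessarily different in effect, because it falls back to whatever default the library uses, so each reported key still has to be compared against that default.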