tf-faster-rcnn
InvalidArgumentError (see above for traceback): Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
I followed your instructions and got this error. Can you please suggest a solution?
mona@pascal:~/computer_vision/tf-faster-rcnn$ GPU_ID=0
mona@pascal:~/computer_vision/tf-faster-rcnn$ ./experiments/scripts/vgg16.sh $GPU_ID pascal_voc
+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ DATASET=pascal_voc
+ array=($@)
+ len=2
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case ${DATASET} in
+ TRAIN_IMDB=voc_2007_trainval
+ TEST_IMDB=voc_2007_test
+ STEPSIZE=50000
+ ITERS=70000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
+ exec
++ tee -a experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
tee: experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43: No such file or directory
+ echo Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
+ set +x
+ '[' '!' -f output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt.index ']'
+ [[ ! -z '' ]]
+ CUDA_VISIBLE_DEVICES=0
+ time python ./tools/trainval_vgg16_net.py --weight data/imagenet_weights/vgg16.weights --imdb voc_2007_trainval --imdbval voc_2007_test --iters 70000 --cfg experiments/cfgs/vgg16.yml --set TRAIN.STEPSIZE 50000
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Called with args:
Namespace(cfg_file='experiments/cfgs/vgg16.yml', imdb_name='voc_2007_trainval', imdbval_name='voc_2007_test', max_iters=70000, set_cfgs=['TRAIN.STEPSIZE', '50000'], tag=None, weight='data/imagenet_weights/vgg16.weights')
Using config:
{'DATA_DIR': '/home/mona/computer_vision/tf-faster-rcnn/data',
'DEDUP_BOXES': 0.0625,
'EPS': 1e-14,
'EXP_DIR': 'vgg16',
'GPU_ID': 0,
'MATLAB': 'matlab',
'PIXEL_MEANS': array([[[ 102.9801, 115.9465, 122.7717]]]),
'POOLING_MODE': 'crop',
'RNG_SEED': 3,
'ROOT_DIR': '/home/mona/computer_vision/tf-faster-rcnn',
'TEST': {'BBOX_REG': True,
'HAS_RPN': True,
'MAX_SIZE': 1000,
'MODE': 'nms',
'NMS': 0.3,
'PROPOSAL_METHOD': 'selective_search',
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'RPN_TOP_N': 5000,
'SCALES': [600],
'SVM': False},
'TRAIN': {'ASPECT_GROUPING': False,
'BATCH_SIZE': 256,
'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_NORMALIZE_TARGETS': True,
'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
'BBOX_REG': True,
'BBOX_THRESH': 0.5,
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'BIAS_DECAY': False,
'DISPLAY': 20,
'DOUBLE_BIAS': True,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'GAMMA': 0.1,
'HAS_RPN': True,
'IMS_PER_BATCH': 1,
'LEARNING_RATE': 0.001,
'MAX_SIZE': 1000,
'MOMENTUM': 0.9,
'PROPOSAL_METHOD': 'gt',
'RPN_BATCHSIZE': 256,
'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 2000,
'RPN_PRE_NMS_TOP_N': 12000,
'SCALES': [600],
'SNAPSHOT_ITERS': 5000,
'SNAPSHOT_KEPT': 3,
'SNAPSHOT_PREFIX': 'vgg16_faster_rcnn',
'STEPSIZE': 50000,
'SUMMARY_INTERVAL': 180,
'TRUNCATED': False,
'USE_FLIPPED': True,
'USE_GT': False,
'WEIGHT_DECAY': 0.0005},
'USE_GPU_NMS': True}
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/mona/computer_vision/tf-faster-rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
10022 roidb entries
Output will be saved to `/home/mona/computer_vision/tf-faster-rcnn/output/vgg16/voc_2007_trainval/default`
TensorFlow summaries will be saved to `/home/mona/computer_vision/tf-faster-rcnn/tensorboard/vgg16/voc_2007_trainval/default`
Loaded dataset `voc_2007_test` for training
Set proposal method: gt
Preparing training data...
voc_2007_test gt roidb loaded from /home/mona/computer_vision/tf-faster-rcnn/data/cache/voc_2007_test_gt_roidb.pkl
done
4952 validation roidb entries
Filtered 0 roidb entries: 10022 -> 10022
Filtered 0 roidb entries: 4952 -> 4952
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 11.85GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
Solving...
Loading caffe weights...
Done!
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Loading initial model weights from data/imagenet_weights/vgg16.weights
Loaded.
iter: 20 / 70000, total loss: 0.443026
>>> rpn_loss_cls: 0.345992
>>> rpn_loss_box: 0.097034
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.749s / iter
iter: 40 / 70000, total loss: 0.516920
>>> rpn_loss_cls: 0.399234
>>> rpn_loss_box: 0.117686
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.760s / iter
iter: 60 / 70000, total loss: 0.393830
>>> rpn_loss_cls: 0.353334
>>> rpn_loss_box: 0.040496
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.668s / iter
iter: 80 / 70000, total loss: 0.217178
>>> rpn_loss_cls: 0.146591
>>> rpn_loss_box: 0.070533
>>> loss_cls: 0.000053
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.536s / iter
iter: 100 / 70000, total loss: 0.390607
>>> rpn_loss_cls: 0.277706
>>> rpn_loss_box: 0.030601
>>> loss_cls: 0.075361
>>> loss_box: 0.006940
>>> lr: 0.001000
speed: 1.495s / iter
iter: 120 / 70000, total loss: 0.882707
>>> rpn_loss_cls: 0.566185
>>> rpn_loss_box: 0.227990
>>> loss_cls: 0.083081
>>> loss_box: 0.005452
>>> lr: 0.001000
speed: 1.570s / iter
iter: 140 / 70000, total loss: 0.223789
>>> rpn_loss_cls: 0.113045
>>> rpn_loss_box: 0.049687
>>> loss_cls: 0.052417
>>> loss_box: 0.008640
>>> lr: 0.001000
speed: 1.510s / iter
iter: 160 / 70000, total loss: 0.219555
>>> rpn_loss_cls: 0.187197
>>> rpn_loss_box: 0.032358
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.494s / iter
iter: 180 / 70000, total loss: 2.256282
>>> rpn_loss_cls: 1.965876
>>> rpn_loss_box: 0.290406
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.475s / iter
iter: 200 / 70000, total loss: 1.727870
>>> rpn_loss_cls: 1.226427
>>> rpn_loss_box: 0.501443
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.463s / iter
iter: 220 / 70000, total loss: 0.353863
>>> rpn_loss_cls: 0.298823
>>> rpn_loss_box: 0.055040
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.461s / iter
iter: 240 / 70000, total loss: 0.147688
>>> rpn_loss_cls: 0.039554
>>> rpn_loss_box: 0.108122
>>> loss_cls: 0.000012
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.450s / iter
iter: 260 / 70000, total loss: 0.485889
>>> rpn_loss_cls: 0.416970
>>> rpn_loss_box: 0.068911
>>> loss_cls: 0.000009
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.428s / iter
iter: 280 / 70000, total loss: 0.153297
>>> rpn_loss_cls: 0.108915
>>> rpn_loss_box: 0.044243
>>> loss_cls: 0.000139
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.440s / iter
iter: 300 / 70000, total loss: 0.374053
>>> rpn_loss_cls: 0.310106
>>> rpn_loss_box: 0.063945
>>> loss_cls: 0.000001
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.397s / iter
iter: 320 / 70000, total loss: 1.169239
>>> rpn_loss_cls: 1.099040
>>> rpn_loss_box: 0.070199
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.385s / iter
iter: 340 / 70000, total loss: 0.243177
>>> rpn_loss_cls: 0.193078
>>> rpn_loss_box: 0.049057
>>> loss_cls: 0.001042
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.370s / iter
iter: 360 / 70000, total loss: 0.387752
>>> rpn_loss_cls: 0.375503
>>> rpn_loss_box: 0.012084
>>> loss_cls: 0.000166
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.353s / iter
iter: 380 / 70000, total loss: 0.494936
>>> rpn_loss_cls: 0.312221
>>> rpn_loss_box: 0.045870
>>> loss_cls: 0.136845
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.336s / iter
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:48: RuntimeWarning: overflow encountered in exp
pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:48: RuntimeWarning: overflow encountered in multiply
pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:49: RuntimeWarning: overflow encountered in exp
pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:49: RuntimeWarning: overflow encountered in multiply
pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:55: RuntimeWarning: invalid value encountered in subtract
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
iter: 400 / 70000, total loss: nan
>>> rpn_loss_cls: nan
>>> rpn_loss_box: nan
>>> loss_cls: 3.037189
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.321s / iter
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
Traceback (most recent call last):
File "./tools/trainval_vgg16_net.py", line 117, in <module>
max_iters=args.max_iters)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 304, in train_net
sw.train_model(sess, max_iters)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 197, in train_model
self.net.train_step_with_summary(sess, blobs, train_op)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 561, in train_step_with_summary
feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
Caused by op u'TRAIN/vgg16_default/conv3_1/weight', defined at:
File "./tools/trainval_vgg16_net.py", line 117, in <module>
max_iters=args.max_iters)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 304, in train_net
sw.train_model(sess, max_iters)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 91, in train_model
tag='default', anchor_scales=anchors)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 507, in create_architecture
self._add_train_summary(var)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 48, in _add_train_summary
tf.summary.histogram('TRAIN/' + var.op.name, var)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 205, in histogram
tag=scope.rstrip('/'), values=values, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 139, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:652] Deallocating stream with pending work
Command exited with non-zero status 1
435.97user 110.56system 9:22.01elapsed 97%CPU (0avgtext+0avgdata 2976644maxresident)k
60224inputs+2752outputs (4major+2126190minor)pagefaults 0swaps
mona@pascal:~/computer_vision/tf-faster-rcnn$
The zero losses look suspicious, and the loss should not be this low in the first iterations. Could you check whether the compilation for the K40 is the issue? By the way, the code has been updated a bit since then, so you may want to re-fork it.
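The overflow warnings from `bbox_transform.py` just before the first `nan` suggest the predicted box deltas had already diverged by that point. As a minimal sketch (not the repository's code), capping the log-space deltas before `np.exp` keeps the decoded sizes finite. `MAX_LOG_DELTA` and `decode_sizes` are hypothetical names, and the cap value is an assumption chosen for illustration.

```python
import math

import numpy as np

# Assumed cap on the log-space size deltas. log(1000 / 16) is an illustrative
# choice, not a value taken from this repository.
MAX_LOG_DELTA = math.log(1000.0 / 16.0)


def decode_sizes(sizes, deltas, max_log_delta=MAX_LOG_DELTA):
    """Decode predicted sizes as exp(delta) * size, with delta clamped from above."""
    clipped_deltas = np.minimum(deltas, max_log_delta)
    return np.exp(clipped_deltas) * sizes[:, np.newaxis]


widths = np.array([16.0, 32.0], dtype=np.float32)
dw = np.array([[0.1], [500.0]], dtype=np.float32)  # second delta has diverged

# Same expression as in the warning, with the overflow warning silenced.
with np.errstate(over="ignore"):
    unclipped = np.exp(dw) * widths[:, np.newaxis]

clipped = decode_sizes(widths, dw)

print("unclipped finite:", bool(np.isfinite(unclipped).all()))
print("clipped finite:", bool(np.isfinite(clipped).all()))
```

A clamp like this only keeps the box decoding finite. It would not repair weights that have already become `nan`, so the cause of the divergence still needs to be found.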
This is what happened after I ran git pull and then the test script:
mona@pascal:~/computer_vision/tf-faster-rcnn$ git pull
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/endernewton/tf-faster-rcnn
9731cc0..83bc041 master -> origin/master
Updating 9731cc0..83bc041
Fast-forward
README.md | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
mona@pascal:~/computer_vision/tf-faster-rcnn$ GPU_ID=0
mona@pascal:~/computer_vision/tf-faster-rcnn$ ./experiments/scripts/test_vgg16.sh $GPU_ID pascal_voc
+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ DATASET=pascal_voc
+ array=($@)
+ len=2
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case ${DATASET} in
+ TRAIN_IMDB=voc_2007_trainval
+ TEST_IMDB=voc_2007_test
+ ITERS=70000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-15_15-23-01
+ exec
++ tee -a experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-15_15-23-01
tee: experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-15_15-23-01: No such file or directory
+ echo Logging output to experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-15_15-23-01
Logging output to experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-15_15-23-01
+ set +x
+ [[ ! -z '' ]]
+ CUDA_VISIBLE_DEVICES=0
+ time python ./tools/test_vgg16_net.py --imdb voc_2007_test --weight data/imagenet_weights/vgg16.weights --model output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt --cfg experiments/cfgs/vgg16.yml --set
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Called with args:
Namespace(cfg_file='experiments/cfgs/vgg16.yml', comp_mode=False, imdb_name='voc_2007_test', max_per_image=100, model='output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt', set_cfgs=[], tag='', weight='data/imagenet_weights/vgg16.weights')
Using config:
{'DATA_DIR': '/home/mona/computer_vision/tf-faster-rcnn/data',
'DEDUP_BOXES': 0.0625,
'EPS': 1e-14,
'EXP_DIR': 'vgg16',
'GPU_ID': 0,
'MATLAB': 'matlab',
'PIXEL_MEANS': array([[[ 102.9801, 115.9465, 122.7717]]]),
'POOLING_MODE': 'crop',
'RNG_SEED': 3,
'ROOT_DIR': '/home/mona/computer_vision/tf-faster-rcnn',
'TEST': {'BBOX_REG': True,
'HAS_RPN': True,
'MAX_SIZE': 1000,
'MODE': 'nms',
'NMS': 0.3,
'PROPOSAL_METHOD': 'selective_search',
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'RPN_TOP_N': 5000,
'SCALES': [600],
'SVM': False},
'TRAIN': {'ASPECT_GROUPING': False,
'BATCH_SIZE': 256,
'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_NORMALIZE_TARGETS': True,
'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
'BBOX_REG': True,
'BBOX_THRESH': 0.5,
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'BIAS_DECAY': False,
'DISPLAY': 20,
'DOUBLE_BIAS': True,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'GAMMA': 0.1,
'HAS_RPN': True,
'IMS_PER_BATCH': 1,
'LEARNING_RATE': 0.001,
'MAX_SIZE': 1000,
'MOMENTUM': 0.9,
'PROPOSAL_METHOD': 'gt',
'RPN_BATCHSIZE': 256,
'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 2000,
'RPN_PRE_NMS_TOP_N': 12000,
'SCALES': [600],
'SNAPSHOT_ITERS': 5000,
'SNAPSHOT_KEPT': 3,
'SNAPSHOT_PREFIX': 'vgg16_faster_rcnn',
'STEPSIZE': 30000,
'SUMMARY_INTERVAL': 180,
'TRUNCATED': False,
'USE_FLIPPED': True,
'USE_GT': False,
'WEIGHT_DECAY': 0.0005},
'USE_GPU_NMS': True}
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 11.85GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
Loading caffe weights...
Done!
Loading model check point from output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt
W tensorflow/core/framework/op_kernel.cc:975] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt
Traceback (most recent call last):
File "./tools/test_vgg16_net.py", line 94, in <module>
saver.restore(sess, args.model)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1388, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt
[[Node: save/RestoreV2_30 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_30/tensor_names, save/RestoreV2_30/shape_and_slices)]]
[[Node: save/RestoreV2_7/_135 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_51_save/RestoreV2_7", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'save/RestoreV2_30', defined at:
File "./tools/test_vgg16_net.py", line 93, in <module>
saver = tf.train.Saver()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1000, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1030, in build
restore_sequentially=self._restore_sequentially)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 624, in build
restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 361, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 200, in restore_op
[spec.tensor.dtype])[0])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 441, in restore_v2
dtypes=dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt
[[Node: save/RestoreV2_30 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_30/tensor_names, save/RestoreV2_30/shape_and_slices)]]
[[Node: save/RestoreV2_7/_135 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_51_save/RestoreV2_7", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Command exited with non-zero status 1
6.03user 4.28system 0:07.47elapsed 138%CPU (0avgtext+0avgdata 2083556maxresident)k
0inputs+32outputs (0major+219829minor)pagefaults 0swaps
You need to download the trained model and create the symbolic links. It seems the script cannot find the model.
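A minimal sketch for checking that the checkpoint is actually on disk before running the test step. It assumes the TensorFlow V2 checkpoint layout (a `.index` file plus one or more `.data-*` shards), which matches the `.ckpt.index` test in the shell script above; the helper name `checkpoint_exists` is hypothetical and not part of the repository.

```python
import glob
import os


def checkpoint_exists(prefix):
    """Return True if a V2-format checkpoint with this prefix is on disk.

    Hypothetical helper: checks for '<prefix>.index' and at least one
    '<prefix>.data-*' shard next to it.
    """
    has_index = os.path.isfile(prefix + ".index")
    has_data = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    return has_index and has_data


if __name__ == "__main__":
    # Path taken from the log above.
    ckpt = "output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt"
    if not checkpoint_exists(ckpt):
        print("Checkpoint not found: train first, or download the model and link it here.")
```

If this reports the checkpoint as missing, the NotFoundError in the traceback is expected, because the restore step has nothing to load.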
@monajalal I have the same problem: NaNs are appearing in my training. Have you fixed it?
@monajalal @amirhfarzaneh Now I have the same problem, but it happened during iteration. Have you fixed it? I think I should check my training data; maybe there is a null column in it.
@yidan216home I still have the problem on my GTX 980Ti GPU, but I have tested on a Quadro M4000 and a GTX 1080 and there is no problem! What is your GPU?
@dandelionmane this seems to be a long-standing problem, occurring both for NaN's and Inf's. Can it be fixed?
@monajalal how did you figure out the problem?
Has anybody fixed the problem?
@monajalal, @zdm123, @amirhfarzaneh, @yidan216home, I got the same problem when training on my own data: the rpn_box_loss was NaN. After some research, I found the cause in the file 'pascal_voc.py', where the function '_load_pascal_annotation' makes pixel indexes 0-based. The code is:
x1 = float(bbox.find('xmin').text) - 1
y1 = float(bbox.find('ymin').text) - 1
x2 = float(bbox.find('xmax').text) - 1
y2 = float(bbox.find('ymax').text) - 1
If your data is not 1-based (mine is 0-based), this produces -1 in the data. You can try deleting the - 1 operation. Hope this helps!
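The fix described above can be sketched as a small standalone loader that makes the offset explicit and fails loudly on bad coordinates. This is only an illustration, not the repository's '_load_pascal_annotation': the function name `load_boxes` and the `one_based` flag are hypothetical, and the extra `x2 < x1` / `y2 < y1` check is an added sanity test rather than something from the thread.

```python
import xml.etree.ElementTree as ET


def load_boxes(xml_text, one_based=True):
    """Parse PASCAL VOC style <object>/<bndbox> entries into [x1, y1, x2, y2].

    one_based=True subtracts 1 from every coordinate (as pascal_voc.py does);
    pass one_based=False if the annotations are already 0-based.
    """
    offset = 1.0 if one_based else 0.0
    boxes = []
    for obj in ET.fromstring(xml_text).findall("object"):
        bbox = obj.find("bndbox")
        x1 = float(bbox.find("xmin").text) - offset
        y1 = float(bbox.find("ymin").text) - offset
        x2 = float(bbox.find("xmax").text) - offset
        y2 = float(bbox.find("ymax").text) - offset
        # A negative coordinate usually means the offset was applied to
        # data that was already 0-based.
        if min(x1, y1, x2, y2) < 0 or x2 < x1 or y2 < y1:
            raise ValueError("invalid box: %r" % ([x1, y1, x2, y2],))
        boxes.append([x1, y1, x2, y2])
    return boxes
```

Running a check like this over every annotation file before training should surface 0-based data as a ValueError instead of a NaN loss many iterations later.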
You may need to adjust the hyperparameters (e.g. the learning rate) if you are running on another dataset.
My loss is very low at the beginning too. Do you know what might cause this problem?
@lonlonago Great, that solves my problem. Thank you very much!