rgz_rcnn
KeyError: 'im_info' and 'gt_boxes' when running 'example_train_slurm.sh'
Running 'demo.py' works fine, but running 'example_train_slurm.sh' raises the following KeyError:
Traceback (most recent call last):
  File "/local/s174/rgz_rcnn/tools/train_net.py", line 105, in
    max_iters=args.max_iters, start_iter=args.start_iter)
  File "/local/s174/rgz_rcnn/tools/../lib/fast_rcnn/train.py", line 333, in train_net
    sw.train_model(sess, max_iters, start_iter=start_iter)
  File "/local/s174/rgz_rcnn/tools/../lib/fast_rcnn/train.py", line 217, in train_model
    self.net.im_info: blobs['im_info'],
KeyError: 'im_info'
The traceback refers to this code in "rgz_rcnn/lib/fast_rcnn/train.py":
210 # get one batch
211 blobs = data_layer.forward()
212
213 # DEBUG
214 print(blobs.keys())
215 # Make one SGD update
216 feed_dict = {self.net.data: blobs['data'],
217 self.net.im_info: blobs['im_info'],
218 self.net.keep_prob: 0.5,
219 self.net.gt_boxes: blobs['gt_boxes']}
I added the debug statement on line 214, which prints:
['bbox_inside_weights', 'labels', 'rois', 'bbox_targets', 'bbox_outside_weights', 'data']
This suggests that not only 'im_info' but also 'gt_boxes' is missing from the blobs returned by the data layer.
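A check along the following lines, run just before the feed_dict is built, would report every missing key at once instead of failing on the first one. This is only a sketch: `REQUIRED_KEYS` and `missing_blob_keys` are hypothetical names introduced for illustration and are not part of rgz_rcnn; the required keys are simply the three that the feed_dict above reads from blobs.

```python
# Hypothetical helper, not part of rgz_rcnn: list the keys that the
# feed_dict reads from blobs but that the data layer did not provide.
REQUIRED_KEYS = ('data', 'im_info', 'gt_boxes')


def missing_blob_keys(blobs, required=REQUIRED_KEYS):
    """Return the required keys that are absent from blobs, in order."""
    return [key for key in required if key not in blobs]


# Keys copied from the debug output above; the values are placeholders.
observed = dict.fromkeys(['bbox_inside_weights', 'labels', 'rois',
                          'bbox_targets', 'bbox_outside_weights', 'data'])
missing = missing_blob_keys(observed)
print(missing)  # ['im_info', 'gt_boxes']
```

In train.py the same call could take the real blobs dictionary and raise an error naming the missing keys, which makes it easier to see which kind of minibatch the data layer produced.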
Any suggestions on what the problem might be?
It looks like the data layer is not reading the data set properly. Before delving further into the code, how did you run 'example_train_slurm.sh'? The script was written to run on a cluster where the Python (TF) environment is pre-installed, so it won't activate your Python virtual environment as described in the README. Would you mind sharing your version of 'example_train_slurm.sh' somewhere on git? Thanks!
Sure. I changed 'example_train_slurm.sh' to 'example_train.sh', which results in the KeyError (see crash_log_example_train.txt for the full output):
#!/bin/bash
export CUDA_VISIBLE_DEVICES=7
source activate py2-tensorflow
RGZ_RCNN=/local/s174/rgz_rcnn
python $RGZ_RCNN/tools/train_net.py \
--device 'gpu' \
--device_id 0 \
--imdb rgz_2017_trainD4 \
--iters 80000 \
--cfg $RGZ_RCNN/experiments/cfgs/faster_rcnn_end2end.yml \
--network rgz_train \
--weights $RGZ_RCNN/data/pretrained_model/imagenet/VGG_imagenet.npy
The example_test_cpu.sh works fine and is adapted to look like this:
# please change to your own python virtual environment path
export CUDA_VISIBLE_DEVICES=7
source activate py2-tensorflow
RGZ_RCNN=/local/s174/rgz_rcnn
python $RGZ_RCNN/tools/test_net.py \
--device 'cpu' \
--device_id 0 \
--imdb rgz_2017_testD4 \
--cfg $RGZ_RCNN/experiments/cfgs/faster_rcnn_end2end.yml \
--network rgz_test \
--weights $RGZ_RCNN/data/pretrained_model/rgz/D4/VGGnet_fast_rcnn-80000 \
--comp
Irrespective of the --device flag, 'example_test_cpu.sh' actually always runs on the GPU (see success_log_example_test_cpu.txt).
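One possible contributor, offered as an assumption rather than a diagnosis: the script exports CUDA_VISIBLE_DEVICES=7, so a GPU stays visible to TensorFlow whatever --device says. A minimal sketch of hiding all GPUs from CUDA from within Python follows; `hide_gpus` is a hypothetical helper, not part of rgz_rcnn, and it only has an effect if it runs before TensorFlow is imported.

```python
import os


def hide_gpus(env=os.environ):
    """Set CUDA_VISIBLE_DEVICES to an empty string so CUDA sees no devices."""
    env['CUDA_VISIBLE_DEVICES'] = ''
    return env


# Call this before importing TensorFlow; after CUDA has been
# initialised, changing the variable no longer has any effect.
hide_gpus()
print(repr(os.environ['CUDA_VISIBLE_DEVICES']))
```

The shell equivalent would be to export an empty CUDA_VISIBLE_DEVICES in the script instead of 7.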