
KeyError: 'im_info' and 'gt_boxes' when running 'example_train_slurm.sh'

Open RafaelMostert opened this issue 6 years ago • 2 comments

Running the 'demo.py' works fine, but running 'example_train_slurm.sh' returns the following key error:

Traceback (most recent call last):
  File "/local/s174/rgz_rcnn/tools/train_net.py", line 105, in <module>
    max_iters=args.max_iters, start_iter=args.start_iter)
  File "/local/s174/rgz_rcnn/tools/../lib/fast_rcnn/train.py", line 333, in train_net
    sw.train_model(sess, max_iters, start_iter=start_iter)
  File "/local/s174/rgz_rcnn/tools/../lib/fast_rcnn/train.py", line 217, in train_model
    self.net.im_info: blobs['im_info'],
KeyError: 'im_info'

The traceback refers to this code in "rgz_rcnn/lib/fast_rcnn/train.py":

210             # get one batch
211             blobs = data_layer.forward()
212 
213             # DEBUG
214             print(blobs.keys())
215             # Make one SGD update
216             feed_dict = {self.net.data: blobs['data'],
217                          self.net.im_info: blobs['im_info'],
218                          self.net.keep_prob: 0.5,
219                          self.net.gt_boxes: blobs['gt_boxes']}

I added the debug statement on line 214, which returns:

['bbox_inside_weights', 'labels', 'rois', 'bbox_targets', 'bbox_outside_weights', 'data']

This suggests that not only 'im_info' but also 'gt_boxes' is missing from the blobs returned by the data layer.
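A minimal sketch of a fail-fast check for this situation, assuming only that `blobs` is a dict as in the excerpt above. The helper name `missing_blob_keys` and the `REQUIRED_KEYS` tuple are hypothetical and not part of rgz_rcnn; the `observed` dict reuses the key names printed by the debug statement, with placeholder values.

```python
# Hypothetical helper, not part of rgz_rcnn: report which keys the
# feed_dict needs but the data layer did not provide.
REQUIRED_KEYS = ('data', 'im_info', 'gt_boxes')  # keys read in the excerpt above


def missing_blob_keys(blobs, required=REQUIRED_KEYS):
    """Return the required keys that are absent from blobs, in order."""
    return [key for key in required if key not in blobs]


# Key names taken from the debug output; the values are placeholders.
observed = {
    'bbox_inside_weights': None,
    'labels': None,
    'rois': None,
    'bbox_targets': None,
    'bbox_outside_weights': None,
    'data': None,
}

print(missing_blob_keys(observed))  # prints ['im_info', 'gt_boxes']
```

Calling such a check right after `data_layer.forward()` would name every missing key at once, instead of stopping at the first KeyError.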

Any suggestions on what the problem might be?

RafaelMostert, Nov 03 '18 22:11

It looks like the data layer is not reading the data set properly. Before delving further into the code, how did you run 'example_train_slurm.sh'? The script was made to run on a cluster where the Python (TF) environment is pre-installed, so it won't invoke your Python virtual environment as described in the README. Would you mind sharing your version of 'example_train_slurm.sh' somewhere on git? Thanks!
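To confirm which Python environment a launch script actually picks up, a short diagnostic can be run from inside that script. This is only a sketch: the function name `describe_interpreter` is hypothetical, and it relies solely on the standard `sys` module.

```python
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter running this code."""
    return {
        'executable': sys.executable,  # path of the running interpreter
        'prefix': sys.prefix,          # root of the active environment
        'version': '%d.%d.%d' % sys.version_info[:3],
    }


for name, value in sorted(describe_interpreter().items()):
    print('%s: %s' % (name, value))
```

If the printed prefix is not the directory of the intended virtual or conda environment, the script is using a different interpreter than expected.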

chenwuperth, Nov 05 '18 03:11

Sure. I changed 'example_train_slurm.sh' to 'example_train.sh', which results in the KeyError (see crash_log_example_train.txt for the full output):

#!/bin/bash

export CUDA_VISIBLE_DEVICES=7
source activate py2-tensorflow

RGZ_RCNN=/local/s174/rgz_rcnn

python $RGZ_RCNN/tools/train_net.py \
                    --device 'gpu' \
                    --device_id 0 \
                    --imdb rgz_2017_trainD4 \
                    --iters 80000 \
                    --cfg $RGZ_RCNN/experiments/cfgs/faster_rcnn_end2end.yml \
                    --network rgz_train \
                    --weights $RGZ_RCNN/data/pretrained_model/imagenet/VGG_imagenet.npy
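One thing worth ruling out in multi-line commands like this: in a shell script, a backslash continues the line only when it is the very last character. If whitespace follows it, the command ends there and the remaining flags never reach the Python script. The sketch below scans script text for that pattern; the function name `find_broken_continuations` and the sample command are hypothetical.

```python
import re

# A backslash followed by spaces or tabs at the end of a line: the shell
# treats this as an escaped space, not as a line continuation.
BROKEN_CONTINUATION = re.compile(r'\\[ \t]+$')


def find_broken_continuations(script_text):
    """Return 1-based line numbers where a backslash is followed by trailing whitespace."""
    return [
        number
        for number, line in enumerate(script_text.splitlines(), start=1)
        if BROKEN_CONTINUATION.search(line)
    ]


# Hypothetical three-line command; the second line has a space after the backslash.
sample = "python train.py \\\n    --device_id 0 \\ \n    --iters 80000\n"
print(find_broken_continuations(sample))  # prints [2]
```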

The example_test_cpu.sh works fine and is adapted to look like this:

# please change to your own python virtual environment path
export CUDA_VISIBLE_DEVICES=7
source activate py2-tensorflow
RGZ_RCNN=/local/s174/rgz_rcnn

python $RGZ_RCNN/tools/test_net.py \
                    --device 'cpu' \
                    --device_id 0 \
                    --imdb rgz_2017_testD4 \
                    --cfg $RGZ_RCNN/experiments/cfgs/faster_rcnn_end2end.yml \
                    --network rgz_test \
                    --weights $RGZ_RCNN/data/pretrained_model/rgz/D4/VGGnet_fast_rcnn-80000 \
                    --comp

Irrespective of the --device flag, 'example_test_cpu.sh' always runs on the GPU (see success_log_example_test_cpu.txt).
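Note that the scripts above export `CUDA_VISIBLE_DEVICES=7`, which leaves one GPU visible to the process whatever `--device` says. One way to force a CPU-only run is to hide all CUDA devices before the framework is imported. This is a sketch under that assumption, not a description of how rgz_rcnn handles `--device`; the helper name `hide_gpus` is hypothetical.

```python
import os


def hide_gpus(environ=os.environ):
    """Hide all CUDA devices from libraries initialized after this call."""
    # An empty value exposes no CUDA devices to the process.
    environ['CUDA_VISIBLE_DEVICES'] = ''
    return environ


# Must run before the deep learning framework is imported.
hide_gpus()
print(repr(os.environ['CUDA_VISIBLE_DEVICES']))
```

The same effect can be had in the shell script itself by exporting an empty `CUDA_VISIBLE_DEVICES` instead of `7`.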

RafaelMostert, Nov 05 '18 09:11