
Bounding box sizes too large

Open simon-rob opened this issue 6 years ago • 9 comments

Robert many thanks for your great work!

I am having trouble understanding why I am getting larger than expected bounding boxes for Pelee detections.

The heights and widths are not as closely cropped when compared to mobilenet-SSD implementations. I have read that you trained the model with pytorch, could the conv padding be a problem? Or is there something else I have missed?

Many Thanks,

Simon

I am using the following python script for my test:

import os
import cv2
import numpy as np
import caffe

net_file = 'pelee.prototxt'
caffe_model = 'pelee_304x304_acc7637.caffemodel'
test_dir = "images"

if not os.path.exists(caffe_model):
    print("caffemodel does not exist")
    exit()
net = caffe.Net(net_file, caffe_model, caffe.TEST)

CLASSES = ('background',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')

def preprocess(src):
    img = cv2.resize(src, (304,304))
    img_mean = np.array([103.94, 116.78, 123.68], dtype=np.float32)
    img = img.astype(np.float32, copy=True) - img_mean
    img = img * 0.017
    return img

def postprocess(img, out):   
    h = img.shape[0]
    w = img.shape[1]
    box = out['detection_out'][0,0,:,3:7] * np.array([w, h, w, h])
    cls = out['detection_out'][0,0,:,1]
    conf = out['detection_out'][0,0,:,2]
    return (box.astype(np.int32), conf, cls)

def detect(imgfile, thresh):
    origimg = cv2.imread(imgfile)
    img = preprocess(origimg)
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))

    net.blobs['data'].data[...] = img
    out = net.forward()  
    box, conf, cls = postprocess(origimg, out)
    for i in range(len(box)):
        if conf[i] > thresh:
            # cast to plain ints so cv2 drawing calls accept the coordinates
            p1 = (int(box[i][0]), int(box[i][1]))
            p2 = (int(box[i][2]), int(box[i][3]))
            cv2.rectangle(origimg, p1, p2, (0, 255, 0))
            p3 = (max(p1[0], 15), max(p1[1], 15))
            title = "%s:%.2f" % (CLASSES[int(cls[i])], conf[i])
            cv2.putText(origimg, title, p3, cv2.FONT_ITALIC, 0.6, (0, 255, 0), 1)
    cv2.imshow("Pelee", origimg)
 
    k = cv2.waitKey(0) & 0xff
    if k == 27 : return False
    return True

for f in os.listdir(test_dir):
    if not detect(test_dir + "/" + f, 0.2):
        break
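For anyone checking the postprocess step in isolation: detection_out emits corner coordinates normalised to [0, 1], and multiplying by (w, h, w, h) maps them back into pixel coordinates of the original image. A quick numpy sketch with made-up values:

```python
# Sketch of what postprocess() does to one detection row (values made up).
import numpy as np

w, h = 640, 480                                  # original image size
norm_box = np.array([0.25, 0.50, 0.75, 1.00])    # xmin, ymin, xmax, ymax in [0, 1]
pixel_box = (norm_box * np.array([w, h, w, h])).astype(np.int32)
print(pixel_box)  # [160 240 480 480]
```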

[screenshot: ssd_screenshot_04 05 2018]

simon-rob avatar May 04 '18 11:05 simon-rob

I have just downloaded your latest prototxt and it's fixed!

[screenshot: ssd_screenshot_04 05 2018-fixed]

simon-rob avatar May 04 '18 11:05 simon-rob

@simon-rob, thank you for your python script! Does it use the CPU or the GPU to test a picture? It takes me 0.3 seconds to test an image with it, so I don't know whether the CPU or GPU was used. Can you give me some advice?

MrWhiteHomeman avatar May 21 '18 12:05 MrWhiteHomeman

@MrWhiteHomeman,

It depends on whether you successfully compiled the GPU version of Caffe and didn't disable the GPU by uncommenting CPU_ONLY := 1 in the Makefile.config.

If you do have a GPU version installed, you can switch between CPU and GPU by using:

caffe.set_mode_gpu() or caffe.set_mode_cpu()

Otherwise it should default to using the GPU.

You could try putting caffe.set_mode_cpu() in the python code to see if the performance differs.
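A minimal sketch of that toggle (guarded with a try/except so the snippet also runs where pycaffe is not installed):

```python
# Toggle Caffe's compute mode before running any forward pass.
# Guarded import: degrades gracefully when pycaffe is not available.
try:
    import caffe
    caffe.set_mode_cpu()       # force CPU inference
    # caffe.set_mode_gpu()     # or: use the GPU
    # caffe.set_device(0)      #     and pick GPU id 0
    mode = "caffe available"
except ImportError:
    mode = "caffe not installed"
print(mode)
```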

simon-rob avatar May 21 '18 12:05 simon-rob

@simon-rob Thank you for your excellent advice! Now I know how to switch between CPU and GPU. It takes 0.1 seconds to detect an image on the GPU and 0.3 seconds on the CPU. How long does it take you to detect an image on the CPU or GPU? And I have another question: in the first line of your python script, net_file is pelee.prototxt - do you mean that pelee.prototxt is the deploy.prototxt?

MrWhiteHomeman avatar May 22 '18 07:05 MrWhiteHomeman

@MrWhiteHomeman

I haven't benchmarked the speed yet, as I am not interested in PC GPU speed; I am interested in CPU/GPU inference on mobile/embedded. I have got 45-50 ms on a Snapdragon 820 for MobileNet-SSD v1, so I am hoping PeleeNet will be about the same or faster.

So 0.1 s (100 ms) seems a bit slow, but it depends on which points you are measuring the speed between. I normally measure just the inference time, not the image load or pre-processing, as those are the same for whatever network and will vary with CPU type and original image size.
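A minimal timing sketch of that approach: wrap only the forward pass with a timer, leaving image load and preprocessing outside. Here run_inference is a hypothetical stand-in for net.forward():

```python
# Time only the forward pass, excluding image load and preprocessing.
import time

def run_inference():
    # stand-in for net.forward(); replace with the real call
    time.sleep(0.01)

t0 = time.perf_counter()
run_inference()
elapsed_ms = (time.perf_counter() - t0) * 1000.0
print("inference: %.1f ms" % elapsed_ms)
```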

As for pelee.prototxt, yes, it is the same as deploy.prototxt - deploy.prototxt is too generic a name and I get confused too easily!

simon-rob avatar May 22 '18 08:05 simon-rob

@simon-rob I noticed that when testing, img = img * 0.017. Can you explain why?

shudct avatar May 23 '18 03:05 shudct

@shudct

That code is normalising the inputs in the same way that the author trained the network.

See https://www.coursera.org/learn/deep-neural-network/lecture/lXv6U/normalizing-inputs for a mathematical explanation.

Have you tried taking the code out to see what happens?

simon-rob avatar May 23 '18 07:05 simon-rob

@simon-rob I have tested without img = img * 0.017, and the result is completely wrong. Usually, the code normalises the input by 1/255. I'm confused about the exact meaning of 0.017.

shudct avatar Jun 21 '18 02:06 shudct

It is scaling the input as described in the video, with the same scaling that Robert used during training. Have a look at the scale parameter in train_merged.prototxt:

transform_param {
  scale: 0.0170000009239
  mirror: true
  mean_value: 103.940002441
  mean_value: 116.779998779
  mean_value: 123.680000305
  resize_param {
    prob: 1.0
    resize_mode: WARP
    height: 304
    width: 304
    interp_mode: LINEAR
    interp_mode: AREA
    interp_mode: NEAREST
    interp_mode: CUBIC
    interp_mode: LANCZOS4
  }
}
simon-rob avatar Jun 21 '18 08:06 simon-rob