Pelee
Bounding box sizes too large
Robert, many thanks for your great work!
I am having trouble understanding why I am getting larger-than-expected bounding boxes for Pelee detections.
The heights and widths are not as closely cropped as in MobileNet-SSD implementations. I have read that you trained the model with PyTorch; could the conv padding be a problem, or is there something else I have missed?
Many Thanks,
Simon
I am using the following python script for my test:
import os
import cv2
import numpy as np
import caffe

net_file = 'pelee.prototxt'
caffe_model = 'pelee_304x304_acc7637.caffemodel'
test_dir = "images"

if not os.path.exists(caffe_model):
    print("caffemodel does not exist")
    exit()

net = caffe.Net(net_file, caffe_model, caffe.TEST)

CLASSES = ('background',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')

def preprocess(src):
    # Resize to the network input size, then subtract the BGR mean and
    # apply the scale, matching the transform_param used during training
    img = cv2.resize(src, (304, 304))
    img_mean = np.array([103.94, 116.78, 123.68], dtype=np.float32)
    img = img.astype(np.float32, copy=True) - img_mean
    img = img * 0.017
    return img

def postprocess(img, out):
    # Each detection row is [image_id, label, conf, xmin, ymin, xmax, ymax]
    # with coordinates normalised to [0, 1]; rescale to the original image
    h = img.shape[0]
    w = img.shape[1]
    box = out['detection_out'][0, 0, :, 3:7] * np.array([w, h, w, h])
    cls = out['detection_out'][0, 0, :, 1]
    conf = out['detection_out'][0, 0, :, 2]
    return (box.astype(np.int32), conf, cls)

def detect(imgfile, thresh):
    origimg = cv2.imread(imgfile)
    img = preprocess(origimg)
    img = img.transpose((2, 0, 1))  # HWC -> CHW for Caffe
    net.blobs['data'].data[...] = img
    out = net.forward()
    box, conf, cls = postprocess(origimg, out)
    for i in range(len(box)):
        if conf[i] > thresh:
            p1 = (box[i][0], box[i][1])
            p2 = (box[i][2], box[i][3])
            cv2.rectangle(origimg, p1, p2, (0, 255, 0))
            p3 = (max(p1[0], 15), max(p1[1], 15))
            title = "%s:%.2f" % (CLASSES[int(cls[i])], conf[i])
            cv2.putText(origimg, title, p3, cv2.FONT_ITALIC, 0.6, (0, 255, 0), 1)
    cv2.imshow("Pelee", origimg)
    k = cv2.waitKey(0) & 0xff
    if k == 27:  # Esc quits
        return False
    return True

for f in os.listdir(test_dir):
    if not detect(test_dir + "/" + f, 0.2):
        break
I have just downloaded your latest prototxt and it's fixed!
@simon-rob, thank you for your python script. Does it use the CPU or the GPU to test a picture? I used the script to test an image in 0.3 seconds, so I don't know whether it ran on the CPU or the GPU. Can you give me some advice?
@MrWhiteHomeman,
It depends on whether you successfully compiled the GPU version of Caffe and left CPU_ONLY := 1 commented out in Makefile.config (uncommenting it disables the GPU).
If you do have a GPU version installed, you can switch between CPU and GPU by using:
caffe.set_mode_gpu() or caffe.set_mode_cpu()
If you don't set a mode, Caffe defaults to CPU.
You could try putting caffe.set_mode_cpu() in the python code to see if the performance differs.
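For example, a minimal sketch (assuming pycaffe is importable and GPU id 0 is the card you want):

import caffe

# Run inference on the GPU (requires Caffe built without CPU_ONLY)
caffe.set_device(0)   # select GPU id 0; adjust if you have several cards
caffe.set_mode_gpu()

# ...or force everything onto the CPU instead:
# caffe.set_mode_cpu()

Put this before constructing the net so all allocations land on the device you want.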
@simon-rob Thank you for your excellent advice! Now I know how to switch between CPU and GPU. It takes me 0.1 seconds to detect an image on the GPU and 0.3 seconds on the CPU. How long does it take you to detect an image on the CPU or GPU? And I have another question: in the first lines of your python script, net_file is pelee.prototxt. Do you mean that pelee.prototxt is the deploy.prototxt?
@MrWhiteHomeman
I haven't benchmarked the speed yet, as I am not interested in PC GPU speed; I am interested in CPU/GPU inference on mobile/embedded. But I got 45-50 ms on a Snapdragon 820 for MobileNet-SSD v1, so I am hoping PeleeNet will be about the same or faster.
So 0.1 s (100 ms) seems a bit slow, but it depends on what exactly you are measuring. I normally measure just the inference time, not the image loading or pre-processing, since those are the same for any network and will vary with CPU type and original image size.
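For reference, a rough sketch of how I'd time it (hypothetical snippet: it wraps only net.forward() and averages over repeated runs to smooth out warm-up effects):

import time

# Warm up once so lazy allocations don't skew the numbers
net.forward()

n_runs = 50
start = time.time()
for _ in range(n_runs):
    net.forward()  # inference only: no imread/resize/postprocessing
elapsed = (time.time() - start) / n_runs
print("mean inference time: %.1f ms" % (elapsed * 1000.0))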
As for pelee.prototxt: yes, it is the same as deploy.prototxt. "deploy.prototxt" is too generic and I get confused too easily!
@simon-rob I notice that when testing you do img = img * 0.017. Can you explain why?
@shudct
That code normalises the inputs in the same way the author did when training the network.
See https://www.coursera.org/learn/deep-neural-network/lecture/lXv6U/normalizing-inputs for a mathematical explanation.
Have you tried taking the code out to see what happens?
@simon-rob I have tested without img = img * 0.017 and the result is completely wrong. Usually the input is normalized by 1/255, so I'm confused about the exact meaning of 0.017.
It scales the input as described in the video, with the same scaling that Robert used during training. Have a look at the scale parameter in train_merged.prototxt:
transform_param {
  scale: 0.0170000009239
  mirror: true
  mean_value: 103.940002441
  mean_value: 116.779998779
  mean_value: 123.680000305
  resize_param {
    prob: 1.0
    resize_mode: WARP
    height: 304
    width: 304
    interp_mode: LINEAR
    interp_mode: AREA
    interp_mode: NEAREST
    interp_mode: CUBIC
    interp_mode: LANCZOS4
  }
}
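For intuition: 0.017 is roughly 1/58.8, which is close to the average per-channel standard deviation of ImageNet pixel values (the same convention the MobileNet Caffe models use, as far as I know), so mean subtraction followed by this scale approximately standardizes each channel. A quick numeric check:

import numpy as np

# The scale is roughly the reciprocal of the ImageNet pixel std (~58)
print(1.0 / 0.017)  # ~58.82

# After the training transform, extreme BGR pixels land near unit range,
# e.g. pure white (255,255,255) and pure black (0,0,0):
mean = np.array([103.94, 116.78, 123.68], dtype=np.float32)
print((255.0 - mean) * 0.017)  # ~[ 2.57  2.35  2.23]
print((0.0 - mean) * 0.017)    # ~[-1.77 -1.99 -2.10]

So 1/255 scaling alone won't work here; the deploy-time preprocessing has to reproduce this transform_param exactly, which is why removing the 0.017 breaks the detections.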