mxnet-ssd
The detect speed is slow!
I ran the demo with the mobilenet512 and resnet50_512 models, but it is very slow: it took 1.8480 seconds and 4.9594 seconds respectively. Is this normal?
I tested on CPU.
Is the forward speed related to training convergence?
I used the MXNet PredictorHandle
(https://mxnet.incubator.apache.org/doxygen/c__predict__api_8h.html) to run the test and measured the forward time. I got the experimental results below:
Right now I am trying to reduce the loss. I also wonder whether there is a relationship between training loss and forward time. Thank you. @zhreshold
@yanhn CNN forward time is not affected by convergence. The time difference comes from the NMS operation. A good model returns many background regions with very low scores, which are excluded by NMS.
Thank you, that helped. I checked the number of outputs from the MultiBoxDetection layer and found that the number of bounding boxes ranges from 160 (good model) to ~3500 (bad model). I set the threshold
parameter to 0.1 and it ran a lot faster (~35ms -> ~20ms).
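To illustrate why the number of candidate boxes matters, here is a minimal pure-Python sketch of greedy NMS with a score pre-filter. It is not MXNet's MultiBoxDetection implementation; the function names and the comparison counter are made up for illustration, and boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold=0.0, nms_threshold=0.5):
    """Return (kept indices, number of IoU comparisons performed)."""
    # Drop low-scoring candidates first, then visit the rest by descending score.
    order = [i for i in range(len(boxes)) if scores[i] >= score_threshold]
    order.sort(key=lambda i: scores[i], reverse=True)
    keep, comparisons = [], 0
    for i in order:
        suppressed = False
        for j in keep:
            comparisons += 1
            if iou(boxes[i], boxes[j]) > nms_threshold:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep, comparisons
```

Calling greedy_nms with a higher score_threshold shrinks the candidate list before any pairwise IoU check runs, so the comparison count (and the NMS time) drops with it.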
@yanhn @zhreshold Where do I change the threshold, and did you test with GPU or CPU?
I tested with GPU.
I changed the parameter in xx-symbol.json
by adding "threshold": "0.1":
{
  "op": "_contrib_MultiBoxDetection",
  "name": "detection",
  "attrs": {
    "force_suppress": "True",
    "nms_threshold": "0.5",
    "threshold": "0.1",
    "nms_topk": "400",
    "variances": "(0.1, 0.1, 0.2, 0.2)"
  },
  "inputs": [[134, 0, 0], [165, 0, 0], [179, 0, 0]]
}
I don't know why, but it sped up my demo.
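For anyone who prefers not to edit the file by hand, the same change can be sketched in Python using only the standard json module. This assumes the symbol file has a top-level "nodes" list and stores operator attributes under "attrs" as in the snippet above (older symbol files may use a different key); the function name set_detection_threshold is hypothetical.

```python
import json


def set_detection_threshold(symbol_json_text, threshold=0.1):
    """Return symbol JSON text with "threshold" set on MultiBoxDetection nodes."""
    graph = json.loads(symbol_json_text)
    for node in graph.get("nodes", []):
        if node.get("op") == "_contrib_MultiBoxDetection":
            # Attribute values in the symbol file are stored as strings.
            node.setdefault("attrs", {})["threshold"] = str(threshold)
    return json.dumps(graph, indent=2)
```

Read the text of your symbol file, pass it through this function, and write the result back (keeping a backup of the original file is a good idea).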
@yanhn @zhreshold I have tried nms_thresh values from 0.1 to 0.8, but the time is almost the same, about 2s.
I have tested your method with different values from 0.1 to 0.95, but the time is still 2s.
@zhreshold Are there any parameters I could set to speed up prediction?
@yhl41001 You can set the NMS threshold to 0 to disable NMS. That gives you the pure network forward time, which is the best you can get. I would typically suggest using an MKL build of MXNet if you are on an Intel CPU.
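When comparing numbers like these, it may help to time only the forward call, after a few warm-up runs, so that model loading and first-call initialization are not counted. A minimal sketch follows; time_forward is a hypothetical helper, and forward is assumed to be a zero-argument callable wrapping whatever predictor call you use.

```python
import time


def time_forward(forward, warmup=3, runs=10):
    """Return the mean wall-clock seconds per call of `forward`."""
    for _ in range(warmup):
        forward()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        forward()
    return (time.perf_counter() - start) / runs
```

With the MXNet Python API, execution is asynchronous, so the callable should also fetch the output (for example with asnumpy()) or the measured time will not include the actual computation.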
@zhreshold I have set --nms 0, but it still takes about 2s, whereas the mobilenet+ssd model provided by TensorFlow takes only about 50ms with SSE. How much faster is MKL than OpenBLAS?