ssd_keras

Slow detection

Open MrXu opened this issue 8 years ago • 5 comments

Great work! Thanks a lot!

The detection takes around 2 seconds per image on a Mac using only the CPU. That is quite different from the performance reported in the paper. Apart from hardware, could this be caused by Keras overhead? Also, is it possible to shrink the network somehow? Thank you.

MrXu avatar Nov 11 '16 15:11 MrXu

The inference performance reported in the paper was measured on NVIDIA K40 GPUs, with a batch of images as input. You can replace the VGG module with AlexNet, which is smaller than VGG.

xindongzhang avatar Nov 12 '16 09:11 xindongzhang
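Since the numbers compared in this thread depend on whether time is measured per image or per batch, a minimal timing sketch may help. It assumes `predict` is any callable that takes a batch (a stand-in for something like a Keras model's `predict` method); the name `time_per_image` and the `fake_predict` stand-in are hypothetical and not part of ssd_keras.

```python
import time


def time_per_image(predict, batch, warmup=1, repeats=3):
    """Return the average number of seconds per image for predict(batch).

    `predict` is any callable that accepts a batch; `batch` must support len().
    """
    # Warm-up runs keep one-off costs (e.g. graph compilation) out of the timing.
    for _ in range(warmup):
        predict(batch)

    start = time.perf_counter()
    for _ in range(repeats):
        predict(batch)
    elapsed = time.perf_counter() - start

    # Divide by the total number of images processed in the timed runs.
    return elapsed / (repeats * len(batch))


if __name__ == "__main__":
    # Hypothetical stand-in for a real model: sleeps 10 ms per image.
    def fake_predict(images):
        time.sleep(0.01 * len(images))
        return [None] * len(images)

    per_image = time_per_image(fake_predict, list(range(5)))
    print("%.1f ms per image" % (per_image * 1000))
```

Larger batches usually lower the per-image figure on a GPU, so a batched GPU number and a single-image CPU number are not directly comparable.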

@xindongzhang thanks for your comment, but I believe the authors state the following: "We measure the speed with batch size 8 using Titan X and cuDNN v4 with Intel Xeon [email protected]." However, it doesn't matter: either way, they report performance on a GPU.

@MrXu, I measured the forward pass on my PC with a Titan, and for 5 pictures (as in SSD.ipynb) I got the results shown in the screenshot. [Screenshot: screen shot 2016-11-12 at 12 10 18 pm] This means that it takes around 50 ms per image to get a prediction. I haven't measured the original Caffe code, but I'm sure that my NMS implementation is slower than the original one. Moreover, some of the custom layers may not be very efficient either. This is something to improve in the future, because I also need real-time performance on GPU for my problem. Any ideas on how to speed up the code are welcome! I've also heard that Keras is sometimes slower than other frameworks, but I can't bear Caffe, so for me Keras is the best choice.

As for shrinking the network, apart from replacing VGG with AlexNet (after which you will have to retrain the net), you can think about the scales of your detections. For example, if you know that your images won't contain big objects, you probably don't need the final layers and can delete them.

rykov8 avatar Nov 12 '16 09:11 rykov8
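For readers unfamiliar with the NMS (non-maximum suppression) step mentioned above, here is a minimal reference sketch of greedy NMS in plain Python. It is not the implementation used in ssd_keras or in the original Caffe code; the function names, the `(xmin, ymin, xmax, ymax)` box layout, and the default `iou_threshold` and `top_k` values are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45, top_k=200):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

The cost grows with the number of candidate boxes times the number of kept boxes, so discarding low-confidence candidates before this step is one common way to make it cheaper.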

@xindongzhang thanks for the suggestion. I would prefer to avoid retraining the model. @rykov8, thanks for the clarification. I have read that Keras is slower than other frameworks like TensorLayer or TFLearn. I am trying to run the prediction on a Raspberry Pi; it seems that achieving real-time detection with only a CPU is really hard...

MrXu avatar Nov 13 '16 03:11 MrXu

@MrXu as for training, I'm working on this part and hope to release the code this week. I also had to change some things in the architecture in order to be able to train the net. I will test it only on my own problem, but I'm trying to make the training implementation as universal as possible. I hope it will be useful. As for real-time detection, I'm quite sure that, unfortunately, it is nearly impossible nowadays to run deep nets on a CPU with real-time performance. If you need real-time on a CPU, you might consider simpler methods, at some loss of quality.

rykov8 avatar Nov 15 '16 20:11 rykov8

@rykov8, thanks for the code! It works perfectly. I wanted to know whether you have tried anything to improve the FPS for real-time detection. I have been trying to implement multithreading, but no luck so far.

ManjeeraJagiri avatar Jan 22 '18 05:01 ManjeeraJagiri
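One way the multithreading idea above is often structured is a small producer/consumer pipeline: a background thread reads frames while the main thread runs detection. The sketch below uses only the Python standard library; `run_pipeline`, `frames`, and `detect` are hypothetical names, with `detect` standing in for whatever function runs the model on one frame.

```python
import queue
import threading


def run_pipeline(frames, detect, max_pending=2):
    """Read frames in a background thread while this thread runs detect()."""
    pending = queue.Queue(maxsize=max_pending)
    done = object()  # sentinel marking the end of the stream

    def reader():
        for frame in frames:
            pending.put(frame)  # blocks while the detector is behind
        pending.put(done)

    threading.Thread(target=reader, daemon=True).start()

    results = []
    while True:
        frame = pending.get()
        if frame is done:
            break
        results.append(detect(frame))
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins: integers as frames, doubling as "detection".
    print(run_pipeline(range(5), lambda frame: frame * 2))
```

This only overlaps frame reading with inference. It does not make the forward pass itself faster, so the gain is limited when the model is the bottleneck.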