How fast does ssd16 (the FP16 version) run on TX1?
It didn't speed up much when I tried it last year. It also caused some problems (a slowdown) in the post-processing step (Non-Maximum Suppression). I am reworking SSD in PyTorch now, where it should be much easier to run in FP16 mode. I don't have new numbers for TX1 yet, but I can provide some numbers I measured on a V100 GPU, which has dedicated hardware for FP16 computation.
The following numbers are FPS and do not include the NMS step.
@chengyangfu Thank you for your reply and the detailed data. Maybe I should wait for NVIDIA's TensorRT, which is said to give a 2x speedup on TX1. I am running SSD on TX2 now, and with some lightweight networks it reaches good performance (about 30 FPS).
@birdwcp @chengyangfu Hi,
There seem to be two ways of calculating FPS.
1. Via the 'time' command of caffe with batch_size = 1:
build/tools/caffe time -model=models/VGGNet/VOC0712/refinedet_vgg16_320x320/deploy.prototxt -gpu=0
2. Measure the time interval and estimate the FPS from it. For example, I got the following lines after running examples/ssd/score_ssd_pascal.py:
I0713 20:50:25.756376 117599 net.cpp:684] Ignoring source layer mbox_loss
I0713 20:51:32.472930 117599 solver.cpp:531] Test net output #0: detection_eval = 0.723217
I0713 20:51:32.473057 117599 solver.cpp:325] Optimization Done.
The total time spent evaluating the 4952 test images is about 67 seconds, so FPS = 4952 / 67 ≈ 74.
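The arithmetic in the second method can be reproduced straight from the log timestamps. Below is a minimal Python sketch; the helper names `parse_glog_time` and `fps_from_log` are made up for illustration, and it assumes both lines come from the same run and the same year (glog timestamps carry no year). Note that the interval covers everything between the two log lines, so it probably also includes data loading and the evaluation itself, not just the network forward pass.

```python
from datetime import datetime


def parse_glog_time(line):
    """Parse the timestamp of a glog line such as
    'I0713 20:50:25.756376 117599 net.cpp:684] ...'."""
    fields = line.split()
    month_day = fields[0][1:]   # drop the severity letter, e.g. 'I0713' -> '0713'
    clock = fields[1]           # e.g. '20:50:25.756376'
    # glog omits the year; 2000 is only a placeholder (assumption: same year).
    return datetime.strptime("2000" + month_day + " " + clock,
                             "%Y%m%d %H:%M:%S.%f")


def fps_from_log(start_line, end_line, num_images):
    """Return (fps, elapsed_seconds) between two glog lines."""
    elapsed = (parse_glog_time(end_line)
               - parse_glog_time(start_line)).total_seconds()
    return num_images / elapsed, elapsed


start = "I0713 20:50:25.756376 117599 net.cpp:684] Ignoring source layer mbox_loss"
end = ("I0713 20:51:32.472930 117599 solver.cpp:531] "
       "Test net output #0: detection_eval = 0.723217")

fps, elapsed = fps_from_log(start, end, num_images=4952)
print(f"{elapsed:.2f} s, {fps:.1f} FPS")
```

With the two log lines above, this gives an interval of roughly 66.7 seconds, which matches the estimate of about 74 FPS.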
Which method should be used to calculate FPS? Is there any other way?