
Performance compared to S3FD

Open shaomang opened this issue 6 years ago • 9 comments

Thanks for sharing another awesome work!

I'm wondering whether you have compared this new work with the previous S3FD under similar computational cost. In my own test I ran S3FD (320x180) against FaceBoxes (960x540); both run at similar speed on my device (not a CPU, though), but S3FD still performs better on my test images.

Is this expected, or is there a reason behind it?

shaomang avatar Jan 11 '19 01:01 shaomang

@shaomang What device do you use? You can compare their speed on the CPU.

sfzhang15 avatar Jan 11 '19 02:01 sfzhang15

@sfzhang15 Thanks for the reply. I tested on a GPU and a Movidius NCS. The reported GPU speeds are 36 FPS for SFD and 125 FPS for FaceBoxes, so I assumed that downsizing the SFD input by a factor of 3 in each dimension ([3,3]) would give them comparable speed. I will check on CPU to confirm.
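
The sizing argument above can be checked with a few lines of arithmetic. This is only a sketch: the input sizes and FPS figures are the ones quoted in this thread, the variable names are made up for illustration, and nothing here measures real inference time.

```python
# Input sizes quoted in the thread (width, height).
FACEBOXES_INPUT = (960, 540)
S3FD_INPUT = (320, 180)

# GPU speeds quoted in the thread, in frames per second.
FACEBOXES_FPS = 125
S3FD_FPS = 36


def pixel_count(size):
    """Number of pixels in a (width, height) input."""
    width, height = size
    return width * height


# How many times more pixels FaceBoxes processes than the downsized S3FD.
pixel_ratio = pixel_count(FACEBOXES_INPUT) / pixel_count(S3FD_INPUT)

# How many times faster FaceBoxes is reported to be at equal settings.
fps_ratio = FACEBOXES_FPS / S3FD_FPS

print(f"pixel ratio: {pixel_ratio:.1f}x, reported FPS ratio: {fps_ratio:.2f}x")
```

Shrinking each dimension by 3 cuts the pixel count by 9, which is more than the reported speed gap, so the two settings are only roughly matched and a direct timing on the target device is still needed.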

shaomang avatar Jan 11 '19 03:01 shaomang

@shaomang We have not compared SFD and FaceBoxes on accuracy vs. speed on GPU; your experiment is interesting. How does the comparison look on CPU?

sfzhang15 avatar Jan 13 '19 16:01 sfzhang15

@sfzhang15 In my CPU test, SFD (90 ms avg.) is faster than FaceBoxes (120 ms avg.) at similar detection precision (judged qualitatively). A full evaluation at specific input sizes would be needed to confirm this.

shaomang avatar Jan 14 '19 03:01 shaomang

@shaomang We will compare SFD and FaceBoxes on accuracy vs. speed on FDDB, using the CPU reported in the paper, and will let you know when we have the results.

sfzhang15 avatar Jan 17 '19 08:01 sfzhang15

@shaomang If it takes 120 ms on CPU, why is it called CPU real-time? Anything over 100 ms is slow, not real-time at all.

lucasjinreal avatar Jan 30 '19 02:01 lucasjinreal

@jinfagang It takes 120 ms at an input size of 960x540. It will run in real time at smaller input sizes, at the cost of a larger minimum detectable face size.

shaomang avatar Jan 31 '19 16:01 shaomang

@shaomang 120 ms for input size 960x540. Can you tell me which CPU you ran this on?

If we are talking about real time on a camera at 25 FPS, the budget per frame is 1000 ms / 25 = 40 ms.

Anything under 40 ms should be considered real time. And I am not counting the other preprocessing and postprocessing that you would be doing on the camera stream.
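
The frame-budget arithmetic above can be written as a small helper. This is a minimal sketch: the function names and the `overhead_ms` parameter are hypothetical, introduced only to show where preprocessing and postprocessing time would enter the check.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def is_real_time(latency_ms: float, fps: float = 25.0, overhead_ms: float = 0.0) -> bool:
    """True if detection plus any extra per-frame work fits within the frame budget."""
    return latency_ms + overhead_ms <= frame_budget_ms(fps)


print(frame_budget_ms(25))                 # per-frame budget at 25 FPS
print(is_real_time(120))                   # the 120 ms CPU figure from this thread
print(is_real_time(35, overhead_ms=10))    # a detector under budget, pushed over by overhead
```

The last call illustrates the point about preprocessing and postprocessing: a detector that fits the budget on its own can still miss it once the rest of the pipeline is counted.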

dexception avatar Jul 02 '19 09:07 dexception

With CUDA and 1920x1080 input without any downscaling (minimum face size is ~14 px), FaceBoxes performance can be approximated as: FPS = CUDA cores / 16

It scales linearly with resolution.
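
As a rough sketch, the rule of thumb above could be coded as follows. It assumes that "scales linearly with resolution" means FPS is inversely proportional to the pixel count relative to 1920x1080; that reading, the function name, and the example core count are assumptions, not something stated in this thread.

```python
REFERENCE_WIDTH = 1920
REFERENCE_HEIGHT = 1080


def estimated_fps(cuda_cores: int, width: int = REFERENCE_WIDTH, height: int = REFERENCE_HEIGHT) -> float:
    """Rule-of-thumb FaceBoxes throughput: cores / 16 at 1080p, scaled by pixel count."""
    base_fps = cuda_cores / 16.0
    # Assumption: halving the pixel count doubles the frame rate.
    pixel_ratio = (REFERENCE_WIDTH * REFERENCE_HEIGHT) / (width * height)
    return base_fps * pixel_ratio


# Hypothetical GPU with 1600 CUDA cores.
print(estimated_fps(1600))            # at 1920x1080
print(estimated_fps(1600, 960, 540))  # at a quarter of the pixels
```

This is an estimate for planning only; actual throughput depends on the GPU architecture, memory bandwidth, and the inference framework.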

I find one of the benefits of FaceBoxes is a low false-positive rate when the NMS is set up correctly (that is, not set up how it is in the paper). Comparing recall by itself is not useful, as you will find that algorithms like RetinaFace have an accuracy of 50% at their reported recall. I compare recall at accuracy = 95%, and FaceBoxes outperforms S3FD and RetinaNet at the same speed.
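
One way to compute the metric described above is sketched below. It reads "accuracy" as precision (true positives divided by all detections kept), which is an assumption, and it expects each detection to be already matched against ground truth; the function name and inputs are illustrative only.

```python
def recall_at_precision(scores, is_true_positive, num_ground_truth, min_precision=0.95):
    """Highest recall reachable while precision stays at or above min_precision.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth face.
    num_ground_truth: total number of annotated faces.
    """
    # Walk the detections from most to least confident, as a score threshold sweep would.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    best_recall = 0.0
    for rank, index in enumerate(order, start=1):
        if is_true_positive[index]:
            true_positives += 1
        precision = true_positives / rank
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / num_ground_truth)
    return best_recall


# Toy data: four detections, four annotated faces.
scores = [0.9, 0.8, 0.7, 0.6]
matched = [True, True, False, True]
print(recall_at_precision(scores, matched, num_ground_truth=4))
print(recall_at_precision(scores, matched, num_ground_truth=4, min_precision=0.7))
```

Fixing the precision first makes detectors comparable at the same false-positive level, which is the point made above about recall being uninformative on its own.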

I haven't been interested in CPU performance because real-time is very misleading if you have to use VGA resolution.

xsacha avatar Jul 18 '19 23:07 xsacha