FaceBoxes-tensorflow

Is it feasible to add landmark detection to this network?

Open tenggyut opened this issue 6 years ago • 6 comments

According to the paper, FaceBoxes seems to be a good replacement for MTCNN in face detection. But MTCNN has built-in landmark detection. I wonder whether it is feasible to turn FaceBoxes into a joint multi-task network, just like MTCNN?

Also, any idea how to close the performance gap between this implementation and the Caffe one?

Thanks

tenggyut avatar Jul 05 '18 05:07 tenggyut

I tried combining this model with ONet from MTCNN to detect faces and landmarks, and it works well.

tirtile avatar Jul 13 '18 13:07 tirtile

How do you combine ONet with FaceBoxes? Do you use FaceBoxes' predictions as ONet's input?

tenggyut avatar Jul 14 '18 02:07 tenggyut

Yes. But changing it to a multi-task network and retraining it may be better.
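For readers who want to try the two-stage setup (FaceBoxes predictions fed to ONet), here is a minimal sketch of the box bookkeeping only. It is not code from this repo or from MTCNN. The function names, the `margin` parameter, and the assumption that the landmark network returns points normalized to [0, 1] within the crop are all hypothetical; check what your ONet port actually outputs.

```python
def square_crop_box(box, img_w, img_h, margin=0.1):
    """Turn a detector box (xmin, ymin, xmax, ymax) into a square crop
    around the same center, enlarged by `margin` and clipped to the image.
    Note: after clipping at an image border the crop may no longer be square.
    """
    xmin, ymin, xmax, ymax = box
    side = max(xmax - xmin, ymax - ymin) * (1.0 + margin)
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    x0 = max(0.0, cx - side / 2.0)
    y0 = max(0.0, cy - side / 2.0)
    x1 = min(float(img_w), cx + side / 2.0)
    y1 = min(float(img_h), cy + side / 2.0)
    return x0, y0, x1, y1


def landmarks_to_image(crop_box, normalized_points):
    """Map landmarks given in [0, 1] crop coordinates (an assumption about
    the landmark network's output) back to pixel coordinates in the image."""
    x0, y0, x1, y1 = crop_box
    w, h = x1 - x0, y1 - y0
    return [(x0 + px * w, y0 + py * h) for px, py in normalized_points]


# Hypothetical usage: one detected box in a 640x480 image.
crop = square_crop_box((100, 120, 180, 220), img_w=640, img_h=480, margin=0.0)
center = landmarks_to_image(crop, [(0.5, 0.5)])
```

Each crop would then be resized to the landmark network's input size before inference. Every face costs one extra forward pass, so runtime grows with the number of faces in the image.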

tirtile avatar Jul 14 '18 03:07 tirtile

But then the feature map generated by FaceBoxes is not reused, so wouldn't that hurt runtime efficiency?

Also, did you reproduce the performance described in the original paper?

tenggyut avatar Jul 14 '18 03:07 tenggyut

Yep. No, I haven't retrained yet.

tirtile avatar Jul 14 '18 04:07 tirtile

Hi. It is a good idea to use onet with FaceBoxes to detect facial landmarks.

But you could also train a simple keypoint detector yourself. Here is an example of training a simple and fast (~0.5 ms on a GTX 1080) 5-keypoint detector: https://github.com/TropComplique/wing-loss (it is not completely finished yet). It is an implementation of this paper: https://arxiv.org/abs/1711.06753.
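For context, the wing loss from the linked paper can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the code from the linked repo. The defaults `w=10` and `epsilon=2` are values I recall from the paper and should be treated as assumptions to tune.

```python
import math


def wing_loss(pred, target, w=10.0, epsilon=2.0):
    """Sum of wing losses over paired coordinates.

    For an error x = |pred - target|:
      w * ln(1 + x / epsilon)   if x < w   (log-shaped near zero)
      x - c                     otherwise  (L1-like for large errors)
    where c = w - w * ln(1 + w / epsilon) makes the two pieces meet at x = w.
    """
    c = w - w * math.log(1.0 + w / epsilon)
    total = 0.0
    for p, t in zip(pred, target):
        x = abs(p - t)
        if x < w:
            total += w * math.log(1.0 + x / epsilon)
        else:
            total += x - c
    return total


# Hypothetical usage: flattened (x, y) coordinates of two keypoints.
loss = wing_loss([30.0, 41.0, 52.5, 40.0], [31.0, 40.0, 50.0, 40.5])
```

Compared with plain L1 or L2, the log-shaped part gives small and medium errors a larger gradient, which is the motivation given in the paper for landmark regression.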

I believe it will be hard to train FaceBoxes for keypoint prediction with a multi-task loss, because that would need a lot of training data of this kind:
images with many face bounding boxes + keypoints for each face.
But we only have data like this:
images with a single face and its keypoints (for example, the CelebA dataset), and images with many face bounding boxes only (for example, the WIDER dataset).

And I believe onet is trained on face crops only. I mean, it sees only tightly cropped face regions during training.

TropComplique avatar Jul 20 '18 20:07 TropComplique