SCRFD issues
Hi! I'm testing your new SCRFD face detector and have noticed some issues with the ONNX inference code and network outputs:
- In `scrfd.py` line 275 you are filtering `bboxes`, but later at line 278 you return `det`, so the `max_num` parameter has no effect and may cause exceptions (a sketch of the intended filtering appears below).
- Later, at line 335, you are calling the detector without providing an input shape, which won't work with a model that has a dynamic input shape. However, it won't be an issue when called from `face_analysis.py`.
- I have noticed that the detector returns very low scores or even fails on faces occupying more than 40% of the image. It's especially visible for square images, where no additional padding can be added during the resize step. I have also noticed that in such cases accuracy increases when lowering the detection size (e.g. 480x480) and decreases when increasing it (e.g. 1024x1024). Here is an example of detection at 640x640 scale:
The original image size is 1200x1200. As you can see, when detection is run with a resize to 640x640 the score is 0.38. For 480x480 the score is 0.86, and for 736x736 the score is 0.07. The same behavior is observed for both the `scrfd_10g_bnkps` and `scrfd_2.5g_bnkps` models. In some cases it might be fixed by adding fixed padding around the image, but that might decrease accuracy for other image types, so it can't be applied by default.
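For reference, the first point boils down to the filtered array never being the one returned. A minimal sketch of the intended behavior (the helper and variable names are illustrative, not the exact ones in `scrfd.py`):

```python
import numpy as np

def keep_top_k(det, kpss, max_num):
    # det is an (N, 5) array of [x1, y1, x2, y2, score] rows;
    # kpss holds the matching keypoints or None.
    if max_num <= 0 or det.shape[0] <= max_num:
        return det, kpss
    order = np.argsort(det[:, 4])[::-1][:max_num]  # best max_num by score
    det = det[order]  # the bug: the filtered array was not the one returned
    if kpss is not None:
        kpss = kpss[order]
    return det, kpss
```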
BTW: Thanks for your great work!
Hi @SthPhoenix, thanks for your attention.
- Just fixed.
- For models that support dynamic input, you should pass the `input_size` param to the `detect()` method.
- It actually depends on the anchor design, but generally it works well for a 640 input size. You can try more pictures.

BTW: if you're in a single-face situation, an input size of 384/256 (or even 128) without padding is recommended.
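For example, a minimal sketch of calling the detector with an explicit input size (the exact `SCRFD` constructor and `detect()` signature may differ between versions of `scrfd.py`):

```python
import cv2
from scrfd import SCRFD  # scrfd.py from this repo

detector = SCRFD(model_file='./scrfd_2.5g_bnkps.onnx')
detector.prepare(-1)  # -1 = CPU; pass a GPU id otherwise

img = cv2.imread('test.jpg')
# A dynamic-shape ONNX model cannot infer its input size, so pass it here.
bboxes, kpss = detector.detect(img, input_size=(640, 640))
```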
- That's great! Thanks!
- Yes, I meant that the example code will throw an exception out of the box, though it's easily fixable.
- The new detector works just great for other image types, besides the ones with large faces.

> BTW: if you're in a single-face situation, an input size of 384/256 (or even 128) without padding is recommended.
I'm developing a face recognition REST API based on InsightFace models and a TensorRT inference backend. The problem comes with unconstrained heterogeneous input, like random photo galleries or user-provided images, where optimal settings can't be chosen in advance. The original RetinaFace detector works great in this scenario, but it has slightly lower accuracy and much lower speed than your new SCRFD detector.
It looks like the new detector was mostly optimized for small faces and is a bit undertrained for large faces. Can it be fixed somehow during training, or is it a design flaw?
My suggestion (if you want to use the pre-trained models):
- Option 1: use a 512 input size.
- Option 2: use a combination of 640 and 256 inputs with some engineering tricks, which is only 16% more FLOPs than the single 640 input (one possible combination is sketched below).
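The "engineering tricks" are not spelled out here, so the following is only a hedged sketch of one possible interpretation: fall back to the small input size when the large one yields only weak detections, which helps the large-face case discussed above. It assumes the `detect()` API sketched earlier.

```python
def detect_two_scale(detector, img, thresh=0.5):
    # First pass at 640x640: good for small and medium faces.
    bboxes, kpss = detector.detect(img, input_size=(640, 640))
    if len(bboxes) and bboxes[:, 4].max() >= thresh:  # column 4 = score
        return bboxes, kpss
    # Fallback at 256x256: large faces score better at small input sizes.
    return detector.detect(img, input_size=(256, 256))
```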
Thanks! I was investigating these options yesterday; option 2 is more promising but logically more complicated. In the long term, retraining seems a better solution. Could you give any hints on what parameters could be tuned?
Little update: this bug seems to be related only to the `*_bnkps` models. When using models without keypoints, detection works as expected.
@nttstar, I have retrained `scrfd_2.5g_bnkps` with batch norm replaced by group norm (the new model should be called `scrfd_2.5g_gnkps`, I think), just like for the `scrfd_2.5g` model, and achieved the following WiderFace AP:
| Model | Easy | Medium | Hard |
|---|---|---|---|
| scrfd_2.5g_bnkps | 93.80 | 92.02 | 77.13 |
| scrfd_2.5g_gnkps | 93.57 | 91.70 | 76.08 |
As you can see, this config gives a small accuracy decrease while completely solving the problem with large faces. For the above example I'm now getting a score around 0.7.
@SthPhoenix Thanks! Did you make the feature maps shared? BTW, you can open a new repo to place this new model so that I can give a link to it, if you want.
> @SthPhoenix Thanks! Did you make the feature maps shared?

No, shared feature maps seem to reduce accuracy more noticeably.

> BTW, you can open a new repo to place this new model so that I can give a link to it, if you want.

I'm training `scrfd_10g_gnkps` right now; both models will be included in the InsightFace-REST repo, though it would be great if you mention it )
Also, I can make a pull request with updated configs for the 0.5, 2.5, 10 and 34g models and a dockerfile for training SCRFD in Docker.
@SthPhoenix So you're still using WiderFace and only changed the config?
> @SthPhoenix So you're still using WiderFace and only changed the config?

Yes, just that.
> No, shared feature maps seem to reduce accuracy more noticeably.
Have you tested it? How about the mAP?
I have tested it by modifying `scrfd_500m.py` (as it can be trained faster) as follows:

```python
norm_cfg=dict(type='GN', num_groups=16, requires_grad=True),
cls_reg_share=True,
strides_share=True,
```

After epoch 640 I got the following mAP: 0.881, 0.851, 0.619, which is much lower than the results you reported for the same model without KPS, and even lower than the newer `scrfd_500m_bnkps` you published yesterday.
So I have trained the `scrfd_2.5g_gnkps` model, and I'm training the `scrfd_10g_gnkps` model now, with the following config:

```python
norm_cfg=dict(type='GN', num_groups=16, requires_grad=True),
cls_reg_share=True,
strides_share=False,
```

BTW, the `scrfd_500m_gnkps` model showed no improvement on large faces, though I'm not sure whether that's connected to `strides_share=True`; I'll try retraining this model and checking again.
Shared feature maps should be better when using GN, judging from my experiments with ResNet-based backbones.
> Shared feature maps should be better when using GN, judging from my experiments with ResNet-based backbones.

Hmmm, I'll check it on other models, thanks!
I have released the retrained models in my repo. Model accuracy on the WiderFace benchmark:
| Model | Easy | Medium | Hard |
|---|---|---|---|
| scrfd_10g_gnkps | 95.51 | 94.12 | 82.14 |
| scrfd_2.5g_gnkps | 93.57 | 91.70 | 76.08 |
| scrfd_500m_gnkps | 88.70 | 86.11 | 63.57 |
All models were trained with the following settings:

```python
norm_cfg=dict(type='GN', num_groups=16, requires_grad=True),
cls_reg_share=True,
strides_share=False,
```
The `scrfd_10g_gnkps` model was trained up to epoch 720; for some reason it gives its best results at this checkpoint, while all the other models begin to degrade after epoch 640.
> @nttstar, I have retrained `scrfd_2.5g_bnkps` with batch norm replaced by group norm [...]
Hi, I just ran `CUDA_VISIBLE_DEVICES="0,1,2,3" PORT=29701 bash ./tools/dist_train.sh ./configs/scrfd/scrfd_2.5g.py 4` and only achieved 62.4 AP. Did I miss anything?
Hi @czzbb! My config was based on `scrfd_2.5g_bnkps.py`, modified according to my previous post.
> Hi @czzbb! My config was based on `scrfd_2.5g_bnkps.py`, modified according to my previous post.
Hi, I can't reproduce the official results (I just got 62.4, while it should be 77 AP), so I wonder if there is anything I missed. I downloaded the dataset and annotations, and then directly ran `CUDA_VISIBLE_DEVICES="0,1,2,3" PORT=29701 bash ./tools/dist_train.sh ./configs/scrfd/scrfd_2.5g.py 4`.
> Hi, I can't reproduce the official results (I just got 62.4, while it should be 77 AP). [...]
If you are using the original configs you should get mAP close to the published values without issues. Have you tested mAP using the evaluation script, or is this the mAP logged during training? You should rely on the mAP reported by the evaluation script.
> If you are using the original configs you should get mAP close to the published values without issues. [...]
Thanks a lot! I got the 62.4 AP during training, but I get 77 AP using the evaluation script. I can't believe the difference could be so huge.
@nttstar @SthPhoenix Hi, can you explain why GN works for big faces? I retrained `scrfd_10g_bnkps` with specific big-face augmentation (faces occupying >80% of the image), adding ~20% big faces to each batch, and the model can handle big-face detection. Then I retrained `scrfd_34g_bnkps` and it works badly: no big faces are detected at all. Then I changed to `scrfd_34g_gnkps` and it works, but GN is not supported by TensorRT (@SthPhoenix can you tell me how to convert GN ONNX models to TRT models?).
> Then I changed to `scrfd_34g_gnkps` and it works, but GN is not supported by TensorRT (@SthPhoenix can you tell me how to convert GN ONNX models to TRT models?)

In TensorRT you should load the optional plugin layers; with the Python TRT API you just need to add this line right after the import:

```python
import tensorrt as trt
trt.init_libnvinfer_plugins(None, "")
```
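To show this in context, here is a minimal sketch of building an engine from such an ONNX model with plugins initialized before parsing (the file name is a placeholder; API names follow the TensorRT 8.x Python bindings and may differ in other versions):

```python
import tensorrt as trt

trt.init_libnvinfer_plugins(None, "")  # register built-in plugins before parsing

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("scrfd_34g_gnkps.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
engine = builder.build_serialized_network(network, config)  # serialized plan
```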
Hi @tuoyuxiang @SthPhoenix,
I followed the guide above, but the model performance cannot be reproduced. The model performed well on the WiderFace validation dataset, but it performed poorly on images with large faces.
Did you train with the following configuration?

```python
norm_cfg=dict(type='GN', num_groups=16, requires_grad=True),
cls_reg_share=True,
strides_share=False,
```

Did you train the model without any other additional methods, and did the trained model still perform well on large faces? How many GPUs did you use? The total batch size varies with the number of GPUs; could this also affect performance? I used 8 GPUs for training.
> I followed the guide above, but the model performance cannot be reproduced. The model performed well on the WiderFace validation dataset, but it performed poorly on images with large faces. [...]
I use this model to improve large-face detection; you can try it: https://modelscope.cn/models/damo/cv_resnet_facedetection_scrfd10gkps/summary
Hi there,
I found that using the default scale augmentation (range = [0.3, 0.45, ..., 2.0]) during training can turn tons of tiny faces into negative samples under ATSS, yet in the WiderFace evaluation these tiny faces are important for the hard protocol.

Based on the ATSS algorithm, one step is to select candidate anchors whose centers fall inside the GT boxes. Tiny faces may not contain any anchor centers at all, or may have too few anchors available for prediction.

Does anyone else have this question, and how do you explain and solve it?
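A quick hedged sketch of the tiny-face problem (the strides and box size here are illustrative, and SCRFD's actual assigner also applies top-k selection and IoU statistics on top of this center check):

```python
import numpy as np

def anchor_centers(stride, size=640):
    # Grid of anchor-center coordinates for one FPN level.
    xs = np.arange(stride // 2, size, stride, dtype=np.float32)
    return np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)

gt = np.array([100.0, 100.0, 104.0, 104.0])  # a 4x4-pixel face
for stride in (8, 16, 32):
    c = anchor_centers(stride)
    inside = ((c[:, 0] > gt[0]) & (c[:, 0] < gt[2]) &
              (c[:, 1] > gt[1]) & (c[:, 1] < gt[3])).sum()
    print(f"stride {stride}: {inside} anchor centers inside the box")
```

For this particular box the count is zero at every stride, which is exactly the "tiny faces become negative samples" effect described above.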
By the way, I solved the large-face problem by replacing the SGD optimizer with AdamW.
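For reference, in an mmdetection-style SCRFD config that swap would look roughly like the following (the lr and weight_decay values are illustrative, not the poster's actual settings):

```python
# Replace the default SGD optimizer block in the training config with AdamW.
optimizer = dict(type='AdamW', lr=1e-4, weight_decay=0.05)
optimizer_config = dict(grad_clip=None)
```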
> ...and a dockerfile for training SCRFD in Docker.

@SthPhoenix Can you share your dockerfile for training SCRFD in Docker with me?
I made one myself, but during training in Docker it stopped with an error at specific epochs.
> I found that using the default scale augmentation (range = [0.3, 0.45, ..., 2.0]) during training can turn tons of tiny faces into negative samples under ATSS [...]
I read the SCRFD paper; the authors say faces smaller than 4x4 pixels are dropped, but the source code doesn't add this constraint. Is that right?
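For what it's worth, the constraint described in the paper would look roughly like this as a data-pipeline filter (a hedged sketch, not the actual SCRFD code):

```python
import numpy as np

def drop_tiny_faces(gt_bboxes, min_size=4.0):
    # Keep only GT boxes at least min_size pixels wide and tall.
    w = gt_bboxes[:, 2] - gt_bboxes[:, 0]
    h = gt_bboxes[:, 3] - gt_bboxes[:, 1]
    return gt_bboxes[(w >= min_size) & (h >= min_size)]
```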