face-mask-detection
DeepStream detects nothing
Hi, I want to deploy the model to DeepStream. I have evaluated the trained model and got the results below:
class name    average precision (in %)
----------    ------------------------
mask          87.3164
no-mask       79.17
I also tested images with the tlt-infer tool; the results are acceptable, though some masks are not detected.
However, when I deploy it in DeepStream, it can't detect anything.
First I used the tlt-export tool to get the etlt model named "model-48500.etlt", and copied it to the DeepStream config path.
config_infer_primary_masknet_gpu.txt
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=YjlxOTRkaHRjYWI2Z2NxN2cwOXBlZjh1OTQ6ZTE2YjdkNzctMmQ0OS00MDZhLTgzMGMtNjc5ZTIyZGNkNzA1
tlt-encoded-model=model-48500.etlt
labelfile-path=labels_masknet.txt
# GPU Engine File
model-engine-file=model-48500.etlt_b1_gpu0_fp16.engine
# DLA Engine File
# model-engine-file=/home/nvidia/detectnet_v2_models/detectnet_4K-fddb-12/resnet18_RGB960_detector_fddb_12_int8.etlt_b1_dla0_int8.engine
input-dims=3;960;544;0
uff-input-blob-name=input_1
batch-size=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
#int8-calib-file=/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/detectnet_v2_models/detectnet_4K-fddb-12/calibration.bin
num-detected-classes=2
cluster-mode=1
interval=0
gie-unique-id=1
is-classifier=0
classifier-threshold=0.9
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
deepstream_app_source1_camera_masker_gpu.txt
[primary-gie]
enable=1
gpu-id=0
# Modify as necessary
# GPU engine file
model-engine-file=model-48500.etlt_b1_gpu0_fp16.engine
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=0;1;0;1
bbox-border-color1=1;0;0;1
#bbox-border-color2=0;0;1;1 # Blue
#bbox-border-color3=0;1;0;1
gie-unique-id=1
config-file=config_infer_primary_masknet_gpu.txt
Can you help me check it? Thanks!
Good to hear your accuracy is close to what we have got.
In config_infer_primary_masknet_gpu.txt:
- Reduce the classifier-threshold parameter; in some cases I had to go as low as 0.6.
- Add [class-attrs-0] and [class-attrs-1] groups, and experiment with the following three parameters: pre-cluster-threshold, group-threshold, and eps.
You can find more about these parameters in the DeepStream dev guide, under the table titled: Gst-nvinfer plugin, [class-attrs-...] groups, supported keys.
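As a minimal sketch of that suggestion (the threshold and eps values below are illustrative starting points, not values tuned for this model), the two per-class groups could look like this at the end of config_infer_primary_masknet_gpu.txt:

```ini
# Hypothetical starting values; tune per class (0 = mask, 1 = no-mask).
[class-attrs-0]
pre-cluster-threshold=0.3
group-threshold=1
eps=0.5

[class-attrs-1]
pre-cluster-threshold=0.3
group-threshold=1
eps=0.5
```

Lowering pre-cluster-threshold keeps weaker candidate boxes before clustering; group-threshold and eps then control how those candidates are merged into final detections.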
Hello,
I have a similar issue: unable to detect faces or masks. Here are the model evaluation results:
Validation cost: 0.001500
Mean average precision (in %): 85.0774

class name    average precision (in %)
mask          84.7703
no-mask       85.3846
Config file -
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
tlt-encoded-model=/opt/nvidia/deepstream/deepstream-5.0/samples/mask-detection/resnet18_detector_unpruned.etlt
labelfile-path=labels_masknet.txt
# GPU Engine File
#model-engine-file=/opt/nvidia/deepstream/deepstream-5.0/samples/mask-detection/resnet18_detector_unpruned.engine
input-dims=3;960;544;0
uff-input-blob-name=input_1
batch-size=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
#int8-calib-file=/mnt/8c3f68c9-a08a-400b-8c80-99c5fee26a06/detectnet_v2_models/detectnet_4K-fddb-12/calibration.bin
num-detected-classes=2
cluster-mode=1
interval=0
gie-unique-id=1
is-classifier=0
classifier-threshold=0.5
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

[class-attrs-0]
pre-cluster-threshold=0.3
group-threshold=1
eps=0.5
#minBoxes=1
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

[class-attrs-1]
pre-cluster-threshold=0.3
group-threshold=1
eps=0.3
#minBoxes=1
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
Can you please share model file and sample video for testing?
@ak-nv Hi, I changed classifier-threshold from 0.1 to 0.9; there is no difference.
I have the [class-attrs-0] and [class-attrs-1] groups with default values in config_infer_primary_masknet_gpu.txt.
Any other advice about this problem?
Thanks!
I usually try these lines and it works. Have you tried camera mode?
Unfortunately, I cannot share the pre-trained model for face-mask-detection.
Hi,
I got results when testing with the unpruned model:
!tlt-infer detectnet_v2 -e $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
    -o $USER_EXPERIMENT_DIR/test/output \
    -i $USER_EXPERIMENT_DIR/test/images \
    -k $KEY
However, when the model is exported to etlt and tested with DeepStream, it does not work.
Any suggestions?
Thanks
Do you get any errors? Did you try the solution suggested above? Are you using the camera or the video DeepStream config file?
@ak-nv I have tested both the camera and a video generated from dataset pictures; neither of them can detect objects.
I have made the following changes and am able to detect masks:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
tlt-encoded-model=/opt/nvidia/deepstream/deepstream/samples/mask-detection/resnet18_detector_unpruned.etlt
labelfile-path=/opt/nvidia/deepstream/deepstream/samples/mask-detection/labels_masknet.txt
input-dims=3;544;960;0
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
num-detected-classes=3
cluster-mode=1
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

[class-attrs-all]
pre-cluster-threshold=0.4
# Set eps and minBoxes for cluster-mode=1 (DBSCAN)
eps=0.7
minBoxes=1
@sudhirm4
Good to hear.
Can we conclude that experimentation with [class-attrs-all] and eps=0.7, minBoxes=1 worked?
@sudhirm4 @ak-nv Hi, I changed my config file to match yours. It can detect some masks, but most of them are still not detected.
Hello, any update?
@XiaoPengZong You might want to experiment with the above parameters. Also, I am planning to add more detailed steps and examples with a sample video in the upcoming weeks, if that helps.
I am experiencing the same detection issue when using the trained (unpruned and pruned) model with DeepStream. I have trained the model and got about 83% accuracy for both classes. So far I have tried the unpruned and pruned models in FP16 and FP32 mode. I have also tried increasing the epochs from 120 to 200 and retraining the network further, but no change.

Unless I reduce the detection confidence (aka classifier-threshold) and pre-cluster-threshold to a ridiculously low value of 0.01 (1%), it does not detect anything. Unfortunately, at 0.01 it detects almost everything and still manages to miss more than half of the faces! The performance is terrible compared to the purpose-built PeopleNet model (which is also a ResNet18).

@ak-nv Could I possibly use the trained PeopleNet https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet as the pre-trained model to get better detections, or is that one probably trained on too many unmasked faces so this won't work? Or is it possible at all to use an etlt model as my pre-trained model and train it further for mask-detection purposes?

One thing that I kind of noticed is that it is better at detecting large faces that are closer to the camera; that's just my initial observation and I'm not too sure about it, as I didn't try it with a lot of different videos. I'm still running further experiments to figure out why the trained model has such poor detection. I'll report back my findings; if you have any suggestions please let me know.
One thing that I kind of noticed is that it is better at detecting large faces that are closer to the camera; that's just my initial observation and I'm not too sure about it, as I didn't try it with a lot of different videos.
The dataset contains mostly images with close faces, so your observation is right. You can add augmentations to improve this: TLT augmentation.
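As a sketch of what that could look like (assuming the DetectNet_v2 training spec format; the field values here are illustrative and should be checked against the TLT documentation), the augmentation block in the training spec can randomly zoom and shift images, which synthesizes smaller faces than the originals:

```
# Illustrative augmentation_config fragment for a DetectNet_v2 training spec.
# The zoom range is assumed here to scale objects: values below 1.0
# produce smaller faces, helping the model on distant subjects.
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 0.4
    zoom_max: 1.2
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
```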
Could I possibly use the PeopleNet (trained) https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet as the pre-trained model to get better detections
I have not tried this, but this might be a good way.
@XiaoPengZong Can you add maintain-aspect-ratio=1 in config_infer_primary_masknet_gpu.txt and try it? Please let me know.
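A minimal sketch of where that key would go (everything else in the [property] group stays as before):

```ini
[property]
# ... existing keys unchanged ...
# Scale input frames to the network resolution without stretching them,
# preserving the aspect ratio instead of distorting faces.
maintain-aspect-ratio=1
```

The idea is that the network was trained on 960x544 input, and stretching arbitrary camera resolutions to that shape can distort faces enough to hurt detection.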
@ak-nv Sorry, I was on holiday the last few days. I gave it a try but did not see any improvement.
@XiaoPengZong any success?
So after spending a day on it, the conclusion is to change the input dimensions. Making input-dims=3;480;640;0 or even input-dims=3;544;960;0 does the trick. I was getting 12 FPS at 544x960, but then I switched to 3;300;300 and got 25 FPS.
Things to change:
input-dims=3;300;300
pre-cluster-threshold=0.2
eps=0.3 or eps=0.4
minBoxes=1
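Pulled together as a sketch (these are the values quoted above, not a verified config; the fourth input-dims field is assumed to be the usual 0 for the channel order):

```ini
[property]
# Smaller network input: ~25 FPS reported, vs 12 FPS at 544x960
input-dims=3;300;300;0

[class-attrs-all]
pre-cluster-threshold=0.2
# DBSCAN clustering parameters (cluster-mode=1)
eps=0.3
minBoxes=1
```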
@XiaoPengZong Can you add maintain-aspect-ratio=1 in config_infer_primary_masknet_gpu.txt and try it? Please let me know.
This hint finally made it work in my case, thanks!