
Very poor results on crowded objects, same as YOLOv4

Open ggenny opened this issue 1 year ago • 10 comments

I am evaluating performance on different images (people in crowds), and the detection ability on people is significantly lower than v3. As an example, this image (using the web demo):

[YOLOv3] ./darknet detector test cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights image

[YOLOv7, web demo, same result as YOLOv4] (image: yolov7)

Original image: (attachment 144037989-efaa1de8-e24d-488f-99c5-f9d8fd69bed5)

Is this a neck-definition problem?

ggenny avatar Aug 05 '22 02:08 ggenny

It is mainly because we use IoU as the target of objectness; you could reduce self.gr to avoid this issue.
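
A minimal sketch of the mechanism, assuming yolov7 keeps the yolov5-style objectness target target = (1 - gr) + gr * IoU (names and values below are illustrative):

import torch

# Sketch: self.gr blends a constant target of 1.0 with the box IoU.
def objectness_target(iou: torch.Tensor, gr: float) -> torch.Tensor:
    # gr = 1.0 -> target equals the IoU, so occluded boxes with low IoU are
    # trained toward low confidence; gr = 0.0 -> target is always 1.0.
    return (1.0 - gr) + gr * iou.clamp(0)

iou = torch.tensor([0.9, 0.5, 0.3])    # example IoUs of matched boxes
print(objectness_target(iou, gr=1.0))  # tensor([0.9000, 0.5000, 0.3000])
print(objectness_target(iou, gr=0.5))  # tensor([0.9500, 0.7500, 0.6500])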

By the way, my testing result of YOLOv7 is below: (image)

WongKinYiu avatar Aug 05 '22 03:08 WongKinYiu

Can you draw the boxes thinner so we can have a quick comparison between YOLOv3 and v7, @WongKinYiu? As far as I can see, yolov7 is still worse than v3 for this case.

trungpham2606 avatar Aug 05 '22 03:08 trungpham2606

Yes, to handle this case the model needs to be retrained following the suggestion above.

WongKinYiu avatar Aug 05 '22 04:08 WongKinYiu

@WongKinYiu, do we need to change these parameters in

hyp.scratch.yml

obj: 0.7  # obj loss gain (scale with pixels)
obj_pw: 1.0  # obj BCELoss positive_weight
iou_t: 0.20  # IoU training threshold

akashAD98 avatar Aug 05 '22 04:08 akashAD98

No, self.gr is set in train.py.
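
For reference, a one-line sketch of what that setting looks like (copied from the yolov5-style train.py that yolov7 derives from; the exact comment text is an assumption):

model.gr = 1.0  # iou loss ratio (obj_loss = 1.0 or iou); reduce below 1.0 to soften the IoU-based target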

WongKinYiu avatar Aug 05 '22 05:08 WongKinYiu

Okay, I have another question; sorry, it's a noob question.

hyp['cls'] *= nc / 80. * 3. / nl # scale to classes and layers

nc = number of classes
nl = model.model[-1].nl  # number of detection layers (used for scaling hyp['obj'])

What is the 80 for? (Is it COCO's 80 classes?) Do we need to change this number for a custom-trained model with a different number of classes?

akashAD98 avatar Aug 05 '22 05:08 akashAD98

Yes, COCO's 80 classes. You do not need to change it for a custom dataset.
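
A quick worked example of that scaling line (a sketch; nc = 6, the default nl = 3, and the starting gain are assumed values):

# Sketch of the automatic rescaling in train.py for a custom dataset.
nc, nl = 6, 3                 # 6 classes, 3 detection layers
cls = 0.3                     # hypothetical hyp['cls'] value from the yaml
cls *= nc / 80. * 3. / nl     # 0.3 * 6/80 * 3/3 = 0.0225
print(cls)                    # the COCO-specific 80 is rescaled away automatically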

WongKinYiu avatar Aug 05 '22 05:08 WongKinYiu

Try training your model on a crowd dataset.

spacewalk01 avatar Aug 05 '22 06:08 spacewalk01

Yes, COCO's 80 classes. You do not need to change it for a custom dataset.

For a custom dataset, I use YOLOv7 to train the object detection model. Do I just need the following 3 steps?

  1. Create the data/custom.yaml file, e.g.:
# COCO 2017 dataset http://cocodataset.org - first 128 training images
# Train command: python train.py --data coco128.yaml
# Default dataset location is next to /yolov5:
#   /parent_folder
#     /coco128
#     /yolov5


# download command/URL (optional)
#download: https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip

# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: /home/epbox/AI/dataset/pf/pddModelByFlash/v0.0.3/images/train/ 
val: /home/epbox/AI/dataset/pf/pddModelByFlash/v0.0.3/images/val/
test: /home/epbox/AI/dataset/pf/pddModelByFlash/v0.0.3/images/val/

# number of classes
nc: 6

# class names
names: ['scratch','crack','leakage','membrane','wiredscreen','blurredscreen']
  2. Change nc: 80 # number of classes to nc: 6 # number of classes in cfg/training/yolov7.yaml

  3. Choose a training mode (a small sanity-check sketch follows the commands below)

  • From scratch with p5:
python train.py --workers 8 --device 0 --batch-size 32 --epochs 100 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7_flash_p5 --hyp data/hyp.scratch.p5.yaml

  • Fine-tuning from the pretrained yolov7_training.pt weights:
python train.py --workers 8 --device 0 --batch-size 32 --epochs 100 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '/home/epbox/AI/pre_weights/yolov7/yolov7_training.pt' --name yolov7_flash --hyp data/hyp.scratch.custom.yaml
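
As noted in step 3, a small sanity check can confirm the dataset yaml is consistent before training; this is only a sketch, and the paths are assumptions:

import yaml

# Sketch: nc in data/custom.yaml must equal len(names), and the same nc
# must also be set in cfg/training/yolov7.yaml (step 2 above).
with open('data/custom.yaml') as f:
    data = yaml.safe_load(f)
assert data['nc'] == len(data['names']), 'nc does not match number of names'
print('classes:', data['nc'], data['names'])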

Digital2Slave avatar Aug 05 '22 07:08 Digital2Slave

Just use a lower --conf 0.10 or --conf 0.025 and YOLOv7 will still have more correct predictions and fewer wrong predictions than YOLOv2/3/4/5/...

The main issue: you are only looking at True Positives in one image, while not taking into account millions of other images where there are no people but YOLOv2/3/4/5/... will predict people, so there will be more False Positives. This is why we measure accuracy on ~20,000 MS COCO test-dev images using the AP metric, which considers True/False Positives/Negatives across all possible confidence thresholds.

As @WongKinYiu said, YOLOv7 uses IoU(pred_box, target_box) as the target for objectness, so the loss function is:

  • for YOLOv4: obj_loss = nn.BCEWithLogitsLoss(predicted_objectness, 1.0)
  • for YOLOv7: obj_loss = nn.BCEWithLogitsLoss(predicted_objectness, IoU(pred, target)), so it reduces the predicted confidence score of imperfectly localized boxes
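
To see the effect numerically, here is a small sketch comparing the two targets on the same logit (the logit and IoU values are illustrative):

import torch
import torch.nn as nn

# Sketch: the same objectness logit trained toward 1.0 (YOLOv4-style)
# vs. toward the IoU (YOLOv7-style; here IoU = 0.4 for an occluded box).
bce = nn.BCEWithLogitsLoss()
logit = torch.tensor([2.0])              # sigmoid(2.0) ~= 0.88 confidence
print(bce(logit, torch.tensor([1.0])))   # ~0.13: 0.88 is already near 1.0
print(bce(logit, torch.tensor([0.4])))   # ~1.33: pushes confidence down toward 0.4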

Therefore there are 2 ways:

  1. Preferably re-train YOLOv7 with a reduced model.gr (it defaults to 1.0 at https://github.com/WongKinYiu/yolov7/blob/main/train.py#L288)
  2. Or just run detection with a 2x, 5x, or 10x lower confidence threshold

I just ran predictions using YOLOv2, v3, v4, v7, v7-e6e, v7 (thresh=0.025), and v7-e6e (thresh=0.025). You can see that v7 (thresh=0.025) and v7-e6e (thresh=0.025) predict almost all persons (and several backpacks, handbags, and cell phones) and have more correct predictions and fewer wrong predictions than the previous versions v2-v4.

For all models I use 640x640 resolution, while for v7-e6e I use 1280x1280.


YOLOv2: darknet.exe detector test data/coco.data cfg/yolov2.cfg yolov2.weights -thresh 0.25 -ext_output crowd.png

(image: yolov2)


YOLOv3: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights -thresh 0.25 -ext_output crowd.png

(image: yolov3)


YOLOv4: darknet.exe detector test data/coco.data cfg/yolov4.cfg yolov4.weights -thresh 0.25 -ext_output crowd.png

(image: yolov4)


YOLOv7 conf 0.25: python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --device 0 --source crowd.png --view-img

(image: yolov7)


YOLOv7-e6e conf 0.25: python detect.py --weights yolov7-e6e.pt --conf 0.25 --img-size 1280 --device 0 --source crowd.png --view-img

(image: yolov7_e6e)


YOLOv7 conf 0.025: python detect.py --weights yolov7.pt --conf 0.025 --img-size 640 --device 0 --source crowd.png --view-img

(image: yolov7_conf_0_025)


YOLOv7-e6e conf 0.025: python detect.py --weights yolov7-e6e.pt --conf 0.025 --img-size 1280 --device 0 --source crowd.png --view-img

(image: yolov7_e6e_conf_0_025)

AlexeyAB avatar Aug 05 '22 22:08 AlexeyAB

It is mainly because we use IoU as the target of objectness; you could reduce self.gr to avoid this issue.

By the way, my testing result of YOLOv7 is below: (image)

Is there a recommended value for self.gr? Thanks.

PearlDzzz avatar Nov 22 '22 11:11 PearlDzzz

Same question.

It is mainly because we use IoU as the target of objectness; you could reduce self.gr to avoid this issue. By the way, my testing result of YOLOv7 is below: (image)

Is there a recommended value for self.gr? Thanks.

Same question: what is the recommended value for model.gr? Setting it too high or too low to suit one image would produce poor results on other images.

sadimoodi avatar Nov 23 '22 07:11 sadimoodi

Hi @glenn-jocher!

I found an interesting issue and tested it with YOLOv5. Recent detection models that predict localization accuracy as part of the detection score perform poorly when many objects overlap.

(images: YOLOv5x, YOLOv5l, YOLOv5m, YOLOv5s, YOLOv5n predictions on crowd_people, each at conf 0.25 and at conf 0.05)

developer0hye avatar Dec 21 '22 00:12 developer0hye