Complex-YOLOv4-Pytorch

Did you compare speed and accuracy of Complex-YOLOv4 vs other algorithms on Kitti dataset?

Open AlexeyAB opened this issue 3 years ago • 16 comments

@maudzung Hi, Nice work! Did you compare the speed and accuracy of Complex-YOLOv4-Pytorch vs other algorithms on the Kitti dataset? Is it still better in accuracy and speed than other competitors?


Also, some references to implementations of CIoU (a condensed sketch follows the links below).

Examples:

  • C: https://github.com/AlexeyAB/darknet/blob/a71b9a6e9a009cf94900c53deb344c5204835700/src/box.c#L233-L256

  • Matlab: https://github.com/Zzh-tju/DIoU/blob/master/simulation%20experiment/dCIOU.m

  • Python: https://github.com/VCasecnikovs/Yet-Another-YOLOv4-Pytorch/blob/2e18612e1852abbf35b4dac55a00f2a3b2d814ed/model.py#L527-L561

  • Python: https://github.com/ultralytics/yolov3/blob/eca5b9c1d36e4f73bf2f94e141d864f1c2739e23/utils/utils.py#L262-L282

Description: https://medium.com/@jonathan_hui/yolov4-c9901eaa8e61
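
For illustration, a condensed Python sketch of the CIoU term along the lines of the linked implementations (axis-aligned boxes given as (x1, y1, x2, y2); a simplified reference, not code from any of the repos above):

import math

def ciou(box1, box2, eps=1e-9):
    x1a, y1a, x2a, y2a = box1
    x1b, y1b, x2b, y2b = box2

    # Plain IoU
    iw = max(0.0, min(x2a, x2b) - max(x1a, x1b))
    ih = max(0.0, min(y2a, y2b) - max(y1a, y1b))
    inter = iw * ih
    w1, h1 = x2a - x1a, y2a - y1a
    w2, h2 = x2b - x1b, y2b - y1b
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / (union + eps)

    # Squared center distance over squared diagonal of the smallest enclosing box
    cw = max(x2a, x2b) - min(x1a, x1b)
    ch = max(y2a, y2b) - min(y1a, y1b)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((x1b + x2b - x1a - x2a) ** 2 + (y1b + y2b - y1a - y2a) ** 2) / 4.0

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v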


AlexeyAB avatar Jul 10 '20 00:07 AlexeyAB

Hi @AlexeyAB ,

Thanks for your comments. I'm trying to improve its performance before writing up a comparison.

Actually, the IoU calculation for polygons is very expensive and different from the IoU calculation for boxes in 2D images (like the COCO dataset) because we need to consider both sizes and rotations of boxes. Hence, I haven't taken advantage of CIoU or GIoU loss for optimization. I'm trying to speed up the IoU calculation in this task.
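
For illustration, a minimal sketch of rotated-box IoU via shapely polygons (assuming a (cx, cy, w, l, yaw) box format; not the repo's actual code). The polygon clipping here is exactly the part that is hard to vectorize:

import numpy as np
from shapely.geometry import Polygon

def rotated_box_to_polygon(cx, cy, w, l, yaw):
    # Corner offsets in the box frame, rotated by yaw and shifted to (cx, cy)
    corners = np.array([[ w / 2,  l / 2], [ w / 2, -l / 2],
                        [-w / 2, -l / 2], [-w / 2,  l / 2]])
    rot = np.array([[np.cos(yaw), -np.sin(yaw)],
                    [np.sin(yaw),  np.cos(yaw)]])
    return Polygon(corners @ rot.T + np.array([cx, cy]))

def rotated_iou(box_a, box_b):
    pa, pb = rotated_box_to_polygon(*box_a), rotated_box_to_polygon(*box_b)
    inter = pa.intersection(pb).area
    return inter / (pa.area + pb.area - inter + 1e-12)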

P.S.: It's great to talk with the author of YOLOv4 :) Thanks for your great publication.

maudzung avatar Jul 10 '20 01:07 maudzung

@maudzung

Hence, I haven't taken advantage of CIoU or GIoU loss for optimization.

Did you try CIoU/GIoU for training with 3D-bboxes and it didn't increase accuracy?

I'm trying to speed up the IoU calculation in this task.

Are you trying to accelerate the IoU calculation, or are you trying to improve accuracy?

P.S.: It's great to talk with the author of YOLOv4 :) Thanks for your great publication.

Thanks!

AlexeyAB avatar Jul 10 '20 11:07 AlexeyAB

Thank you @AlexeyAB

I haven't used the CIoU or GIoU loss yet; I'm trying to apply them to the loss function. I'm also trying to speed up the non-max-suppression step in the inference phase. At the moment, I can't vectorize the IoU calculation in that step, so if there are many boxes with high confidence scores, the postprocessing is slow.
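
For illustration, a rough sketch of the greedy NMS loop being described, assuming a rotated_iou(box_a, box_b) helper like the one sketched earlier in the thread; the serial pairwise IoU calls are what makes this step slow when many high-confidence boxes survive:

import numpy as np

def rotated_nms(boxes, scores, iou_fn, iou_thresh=0.5):
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Pairwise polygon IoU against all remaining boxes (the slow, serial part)
        ious = np.array([iou_fn(boxes[i], boxes[j]) for j in order[1:]])
        order = order[1:][ious < iou_thresh]
    return keep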

maudzung avatar Jul 11 '20 07:07 maudzung

I tried to detect rotated faces and came across the same problem of computing intersection over union for rotated bounding boxes. I think it will be very difficult to get a derivative of this very complex function. Instead, I tried to distill the idea from GIoU and predicted the size and angle instead of the width and height, and got better results than with traditional bounding box prediction. Maybe this could be worth a try for you too: https://www.researchgate.net/publication/335538424_Detecting_Arbitrarily_Rotated_Faces_for_Face_Analysis

fsaxen avatar Jul 16 '20 09:07 fsaxen

Thank you so much @fsaxen

maudzung avatar Jul 17 '20 01:07 maudzung

Hi @AlexeyAB

I have added an implementation of the GIoU loss for rotated boxes. I'm running experiments to test its performance. Can you please share the weights of the different components of the total_loss in your implementation? At this time, I have set lgiou_scale = lobj_scale = lcls_scale = 1.

total_loss = loss_giou * lgiou_scale + loss_obj * lobj_scale + loss_cls * lcls_scale
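
For reference, a minimal sketch of what a rotated-box GIoU term could look like with shapely, assuming the boxes have already been converted to polygons (e.g. with a helper like the one sketched earlier in the thread); this is an illustration, not the repo's actual implementation. The enclosing region is approximated by the convex hull of the two polygons:

from shapely.geometry import Polygon
from shapely.ops import unary_union

def rotated_giou(poly_a: Polygon, poly_b: Polygon) -> float:
    inter = poly_a.intersection(poly_b).area
    union = poly_a.area + poly_b.area - inter
    iou = inter / (union + 1e-12)
    # Smallest convex region covering both boxes
    enclosing = unary_union([poly_a, poly_b]).convex_hull.area
    return iou - (enclosing - union) / (enclosing + 1e-12)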

Thank you so much!

maudzung avatar Jul 23 '20 13:07 maudzung

@maudzung Hi,

I use:

lgiou_scale = 0.07
lobj_scale = 1.0
lcls_scale = 1.0

Also you can try

lgiou_scale = 0.05
lobj_scale = 1.0
lcls_scale = 0.6

AlexeyAB avatar Jul 23 '20 13:07 AlexeyAB

Thank you @AlexeyAB for your quick response. I have one more question. Did you apply the noobj_scale and obj_scale weights to loss_obj as in YOLOv3? Your answer could save me a ton of time spent not only reading your code but also running experiments. I'm looking forward to hearing from you. Thank you once again!

maudzung avatar Jul 23 '20 14:07 maudzung

What do you mean? I use:

if (truth) { // for object
  delta_bbox[i] = giou_delta[i] * lgiou_scale;
  delta_objectness = (1 - output[obj_index]) * lobj_scale;
  
  for(int k = 0; k < classes; ++k) {
    if(k == truth.class_id)   delta_class_probability[k] = (1 - output[cls_index + k]) * lcls_scale;
    else   delta_class_probability[k] = (0 - output[cls_index + k]) * lcls_scale;
  }
} 
else { // for no object
  delta_objectness = (0 - output[obj_index]) * lobj_scale;
}
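
For comparison, a loose PyTorch-style sketch of the same weighting scheme (an illustration, not a line-by-line translation of the darknet deltas): the GIoU term applies only to matched anchors, and the same lobj_scale weights both object and no-object cells, so there is no separate noobj_scale.

import torch
import torch.nn.functional as F

def yolo_losses(pred_obj, pred_cls, giou, matched_mask, target_cls,
                lgiou_scale=0.07, lobj_scale=1.0, lcls_scale=1.0):
    # pred_obj, pred_cls are post-sigmoid; matched_mask is a bool tensor over anchors;
    # target_cls holds class indices (LongTensor) for the matched anchors only.
    loss_giou = ((1.0 - giou)[matched_mask]).mean() * lgiou_scale

    # Objectness target: 1 for matched anchors, 0 everywhere else
    loss_obj = F.binary_cross_entropy(pred_obj, matched_mask.float()) * lobj_scale

    # One-hot class targets, only for matched anchors
    cls_target = F.one_hot(target_cls, num_classes=pred_cls.shape[-1]).float()
    loss_cls = F.binary_cross_entropy(pred_cls[matched_mask], cls_target) * lcls_scale

    return loss_giou + loss_obj + loss_cls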


AlexeyAB avatar Jul 23 '20 14:07 AlexeyAB

@maudzung Hi, did you get any results, or are you still training it on Kitti?

AlexeyAB avatar Aug 06 '20 00:08 AlexeyAB

I ran the experiments on 6k samples with the MSE loss and evaluated on 1.4k samples. The mAP values for Complex-YOLOv3 and Complex-YOLOv4 are 0.90 and 0.89, respectively. I visualized the predictions for each sample and compared the two models. I observed that the v4 model works better than the v3 model at detecting small objects.

Complex-YOLO detects 5 degrees of freedom (x, y, width, length, and yaw) of objects. Recently, I have extended the work to a 7-DOF model. My implementation is here: YOLO3D-YOLOv4.

I plan to train the network on the Waymo Open Dataset. This can help me avoid the overfitting problem.

maudzung avatar Aug 07 '20 01:08 maudzung

I ran the experiments on 6k samples with the MSE loss and evaluated on 1.4k samples. The mAP values for Complex-YOLOv3 and Complex-YOLOv4 are 0.90 and 0.89, respectively. I visualized the predictions for each sample and compared the two models. I observed that the v4 model works better than the v3 model at detecting small objects.

Why is the mAP for Complex-YOLOv3 higher than for Complex-YOLOv4? Do you use mAP@0.5 or mAP@0.5:0.95? What pre-trained weights do you use for training?

Complex-YOLO detects 5 degrees of freedom (x, y, width, length, and yaw) of objects. Recently, I have extended the work to a 7-DOF model. My implementation is here: YOLO3D-YOLOv4.

I plan to train the network on the Waymo Open Dataset. This can help me avoid the overfitting problem.

Great! Is YOLO3D better than Complex-YOLOv3 in terms of accuracy, or only 7-DOF vs 5-DOF?

Also what do you think about CenterNet3D: An Anchor free Object Detector for Autonomous Driving https://arxiv.org/abs/2007.07214 ?

  • Voxelization
  • 3d convolution ndchw
  • Conv2d
  • Nms: Maxpool zero_nonmax=1
  • Sin, cos - activations for angles (see the sketch after this list)
  • Training-only corner regression (closely related task)
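
For illustration, a small PyTorch sketch of two of those components (an approximation of the general ideas, not code from the CenterNet3D paper): keypoint NMS via max-pooling, and sin/cos prediction for the yaw angle.

import torch
import torch.nn.functional as F

def maxpool_nms(heatmap, kernel=3):
    # Keep only local maxima: a peak survives where it equals the pooled value,
    # everything else is zeroed out (the "zero_nonmax" behaviour)
    pooled = F.max_pool2d(heatmap, kernel, stride=1, padding=kernel // 2)
    return heatmap * (pooled == heatmap).float()

def decode_yaw(angle_logits):
    # Predict (sin, cos) in two channels and recover the angle with atan2,
    # which avoids the wrap-around discontinuity of regressing yaw directly
    sin_cos = torch.tanh(angle_logits)  # shape (N, 2, H, W), values in [-1, 1]
    return torch.atan2(sin_cos[:, 0], sin_cos[:, 1])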

AlexeyAB avatar Aug 07 '20 01:08 AlexeyAB

Hi @AlexeyAB

Do you use mAP@0.5 or mAP@0.5:0.95?

I evaluated with mAP@0.5. I'll use mAP@0.5:0.95 to evaluate the models.

Why is the mAP for Complex-YOLOv3 higher than for Complex-YOLOv4? What pre-trained weights do you use for training?

For both models, I didn't use transfer learning; I trained them from scratch. That's why I plan to train the networks on a bigger dataset.

Also what do you think about CenterNet3D: An Anchor free Object Detector for Autonomous Driving https://arxiv.org/abs/2007.07214 ?

Thank you so much for your suggestion. I'll read the paper.

maudzung avatar Aug 07 '20 02:08 maudzung

Hi @AlexeyAB

I read the paper that you suggested and tried to implement it, but it could not run in real time, and the method was proposed only for car detection. Hence, I'm now waiting for the official code from the authors.

Based on the CenterNet ideas, I have developed a new repo here. Amazingly, the model works well for detecting pedestrians and cyclists, as well as cars.

Thank you once again for your great paper, your answers, and your suggestion. I have learned a lot from your YOLOv4 paper 💯

maudzung avatar Aug 24 '20 16:08 maudzung

@maudzung Hi,

I read the paper that you suggested and tried to implement it, but it could not run in real time, and the method was proposed only for car detection. Hence, I'm now waiting for the official code from the authors.

Based on the CenterNet ideas, I have developed a new repo here. Amazingly, the model works well for detecting pedestrians and cyclists, as well as cars.

Great!

Do you mean that Voxelization -> 3D convolution (ndchw) -> Conv2d is very slow (only ~25 FPS), so you replaced it with a small ResNet18 + FPN that runs very fast at ~95 FPS (~4x faster), and, at first glance, the accuracy did not drop much?

Did you try to use Joint Detection and Tracking / Embeddings? https://github.com/ifzhang/FairMOT and https://paperswithcode.com/sota/multi-object-tracking-on-mot16 If you replace CenterNet in FairMOT with YOLOv4, it will be Top1.

AlexeyAB avatar Aug 24 '20 19:08 AlexeyAB

Do you mean that Voxelization -> 3D convolution (ndchw) -> Conv2d is very slow (only ~25 FPS), so you replaced it with a small ResNet18 + FPN that runs very fast at ~95 FPS (~4x faster), and, at first glance, the accuracy did not drop much?

Yes. Although I used the spconv lib to implement the voxelization step and build the model, the speed was very slow, around 7 FPS for the forward pass alone.

Did you try to use Joint Detection and Tracking / Embeddings? https://github.com/ifzhang/FairMOT and https://paperswithcode.com/sota/multi-object-tracking-on-mot16 If you replace CenterNet in FairMOT with YOLOv4, it will be Top1.

I tested the FairMOT implementation; it's also great, but I haven't tried to jointly detect and track objects yet. Thanks for the suggestion, I'll investigate it.

maudzung avatar Aug 25 '20 01:08 maudzung