depthai

[BUG] YOLOv4 Postprocessing Issues

Open: Lachlan-Alsop-spark opened this issue 1 year ago (0 comments)

Describe the bug

YOLOv4 outputs noisy and inaccurate bounding boxes when run through the `dai.YoloDetectionNetwork` node.

Minimal Reproducible Example

The base app from https://docs.luxonis.com/software/depthai/examples/tiny_yolo/, using YOLOv4, specifically the yolo-v4-tiny-tf model.

Expected behavior

Accurate bounding boxes around the detected object.

Screenshots

[image]

Another user reported a similar issue on the Luxonis forum a while back: https://discuss.luxonis.com/d/744-crazy-yolov4-tiny-detections-from-depthai-python-examples

While they found a "fix" for the problem, the bounding boxes are still inaccurate.

[image]

Additional context

TL;DR: The OpenVINO YOLOv4 model requires an additional sigmoid on the x, y center outputs.
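To illustrate why a missing sigmoid on the centers produces jittery boxes, here is a minimal sketch assuming the usual YOLO convention that a box center is decoded as (cell index + offset) / grid size. `GRID_SIZE`, `CELL_X`, `decode_center_x` and the raw values are hypothetical numbers chosen for illustration, not outputs of the model.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


GRID_SIZE = 13  # hypothetical 13x13 output grid
CELL_X = 6      # hypothetical column of the cell that predicted the box


def decode_center_x(raw_tx: float, apply_sigmoid: bool) -> float:
    """Return the normalized x center (0..1 across the image) for one raw value."""
    offset = sigmoid(raw_tx) if apply_sigmoid else raw_tx
    return (CELL_X + offset) / GRID_SIZE


for raw in (-3.0, 0.0, 3.0):
    with_sig = decode_center_x(raw, apply_sigmoid=True)
    without = decode_center_x(raw, apply_sigmoid=False)
    print(f"raw={raw:+.1f}  with sigmoid={with_sig:.3f}  without={without:.3f}")
```

With the sigmoid, every center stays inside cell 6 (between 6/13 ≈ 0.462 and 7/13 ≈ 0.538). Without it, a raw value of ±3 moves the center three whole cells away, which matches the kind of noisy boxes described in this issue.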

After reading a lot of documentation about these models, I found the OpenVINO Open Model Zoo specifications for them: https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tiny-tf and https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tiny-tf

The only difference I could find is that, for v4, the confidence and the x, y center outputs need a sigmoid applied to get accurate values. This is also the underlying reason you can give the v4 model a confidence threshold higher than one and still get outputs: presumably the threshold is being compared against raw, pre-sigmoid scores. My workaround for the threshold is `CONFIDENCE_THRESHOLD = 0.9` followed by `actual_threshold = inverse_sigmoid(CONFIDENCE_THRESHOLD) if YOLO_VERSION == "4" else CONFIDENCE_THRESHOLD`.
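A self-contained version of that threshold conversion is sketched below. The issue does not show how `sigmoid` and `inverse_sigmoid` are defined, so the standard logistic and logit functions are assumed here. The sketch also relies on the hypothesis above that the node compares raw v4 scores against the threshold. Note that this only fixes filtering by confidence; it does not fix the box coordinates.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: raw score (logit) -> probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def inverse_sigmoid(p: float) -> float:
    """Logit: the raw score whose sigmoid equals p (requires 0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


YOLO_VERSION = "4"
CONFIDENCE_THRESHOLD = 0.9

# Assumption: for v4 the scores reaching the threshold check are raw logits,
# so the probability threshold is moved into logit space; v3 is left as-is.
actual_threshold = (
    inverse_sigmoid(CONFIDENCE_THRESHOLD) if YOLO_VERSION == "4" else CONFIDENCE_THRESHOLD
)
print(f"{actual_threshold:.4f}")  # ln(9) -> 2.1972

# Sigmoid is monotonic, so thresholding logits at actual_threshold keeps the
# same detections as thresholding probabilities at CONFIDENCE_THRESHOLD.
raw_scores = [-1.0, 1.5, 2.5, 4.0]
kept_by_logit = [s for s in raw_scores if s > actual_threshold]
kept_by_prob = [s for s in raw_scores if sigmoid(s) > CONFIDENCE_THRESHOLD]
assert kept_by_logit == kept_by_prob
```

This also explains the "threshold above one" symptom: a logit-space threshold of about 2.2 corresponds to a probability of 0.9, so values greater than one are meaningful when raw scores are compared directly.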

This led me to recreate the YOLO postprocessing pipeline from https://github.com/Tianxiaomo/pytorch-YOLOv4/tree/master so I could add the sigmoid in the correct place. After porting that postprocessing from PyTorch to NumPy, I now have a functional YOLOv4 model.

The key line is `bxy = sigmoid(bxy) * scale_x_y - 0.5 * (scale_x_y - 1)`. Commenting and uncommenting this line toggles between the noisy outputs shown above and the expected outputs with smooth, precise bounding boxes.
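Below is a condensed NumPy sketch of decoding one YOLOv4 output head with that line in place. It loosely follows the decoding logic in the Tianxiaomo/pytorch-YOLOv4 repository linked above, but it is not the author's modified demo code. Several details are assumptions: the raw tensor has already been reshaped to `(H, W, anchors, 5 + num_classes)`, `scale_x_y = 1.05` (the value in darknet's yolov4-tiny config), and the anchors and 416x416 input size are the usual yolov4-tiny defaults. `decode_yolov4_head` is a hypothetical helper name, and NMS is not included.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def decode_yolov4_head(raw, anchors, input_size, scale_x_y=1.05):
    """Decode one head of raw logits into normalized boxes and class scores.

    raw:        (H, W, A, 5 + C) array laid out as (tx, ty, tw, th, obj, classes...).
    anchors:    A (w, h) anchor pairs in input-image pixels for this head.
    input_size: (input_w, input_h) of the network input in pixels.
    Returns boxes (H*W*A, 4) as (cx, cy, w, h) in 0..1 image units and
    scores (H*W*A, C).
    """
    h, w, num_anchors, _ = raw.shape
    input_w, input_h = input_size
    anchors = np.asarray(anchors, dtype=np.float32).reshape(1, 1, num_anchors, 2)

    # Cell indices with a trailing axis so they broadcast over anchors.
    grid_y, grid_x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid_x = grid_x[:, :, None]
    grid_y = grid_y[:, :, None]

    # The line from the issue: sigmoid on the center offsets, stretched by
    # scale_x_y so a center can reach slightly past its cell border.
    bxy = sigmoid(raw[..., 0:2]) * scale_x_y - 0.5 * (scale_x_y - 1)
    cx = (bxy[..., 0] + grid_x) / w
    cy = (bxy[..., 1] + grid_y) / h

    # Width and height: exp of the raw value times the anchor size.
    bw = np.exp(raw[..., 2]) * anchors[..., 0] / input_w
    bh = np.exp(raw[..., 3]) * anchors[..., 1] / input_h

    # Objectness and class scores are logits as well.
    scores = sigmoid(raw[..., 4:5]) * sigmoid(raw[..., 5:])

    boxes = np.stack([cx, cy, bw, bh], axis=-1).reshape(-1, 4)
    return boxes, scores.reshape(-1, scores.shape[-1])


# Usage with random stand-in data (not real model output).
rng = np.random.default_rng(0)
raw = rng.normal(size=(13, 13, 3, 85)).astype(np.float32)
boxes, scores = decode_yolov4_head(
    raw, anchors=[(81, 82), (135, 169), (344, 319)], input_size=(416, 416)
)
print(boxes.shape, scores.shape)  # (507, 4) (507, 80)
```

Dropping the `sigmoid` call on `raw[..., 0:2]` reproduces the failure mode from this issue: centers are no longer tied to their grid cell, so boxes wander.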

I tried to find where the postprocessing for these models happens in your GitHub repos but couldn't find it. This leads me to believe the logic lives in some kind of proprietary codebase; otherwise I would have opened a PR.

I am relatively new to open source and filing issues, so please let me know if I can provide more info. I can also upload my modified YOLO demo code if that would help.

Lachlan-Alsop-spark, Oct 21 '24 00:10