depthai
[BUG] YOLOv4 Postprocessing Issues
Describe the bug YOLOv4 outputs noisy and inaccurate bounding boxes when run through the dai.YoloDetectionNetwork node.
Minimal Reproducible Example The base app in https://docs.luxonis.com/software/depthai/examples/tiny_yolo/, using YOLOv4, specifically yolo-v4-tiny-tf
Expected behavior Accurate bounding boxes around the detected object
Screenshots
Another user also experienced similar issues on the forum a while back. https://discuss.luxonis.com/d/744-crazy-yolov4-tiny-detections-from-depthai-python-examples
While they found a "fix" for the problem, the system still suffers from inaccurate bounding boxes.
Additional context TLDR: The OpenVINO v4 model requires an additional sigmoid call on the x,y center output.
After looking through lots of documentation about the models, I stumbled across the OpenVINO specifications for them: https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tiny-tf and https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tiny-tf
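The only difference I could find is that, for v4, the confidence and the x,y centers require a sigmoid to get accurate values.
This is also the underlying reason that the v4 model still returns detections when you pass it a confidence threshold higher than one: the threshold appears to be compared against the raw, pre-sigmoid score. As a workaround, the threshold can be mapped into that raw space: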
CONFIDENCE_THRESHOLD = 0.9
actual_threshold = inverse_sigmoid(CONFIDENCE_THRESHOLD) if YOLO_VERSION == "4" else CONFIDENCE_THRESHOLD
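The snippet above relies on an `inverse_sigmoid` helper that is not shown. A minimal sketch of what it could look like, assuming it is the standard logit function (the function bodies and the `raw_threshold` name here are illustrative, not taken from the DepthAI codebase):

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def inverse_sigmoid(p: float) -> float:
    """Logit: the raw score whose sigmoid equals p. Only defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

CONFIDENCE_THRESHOLD = 0.9
# Threshold expressed in raw (pre-sigmoid) score space.
raw_threshold = inverse_sigmoid(CONFIDENCE_THRESHOLD)
```

With this definition a 0.9 probability threshold corresponds to a raw score above 2, which would be consistent with thresholds greater than one still producing detections.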
This led me to recreate the YOLO postprocessing pipeline from https://github.com/Tianxiaomo/pytorch-YOLOv4/tree/master so that I could add the sigmoid function in the correct place. After porting that postprocessing from torch to NumPy,
I had a functional YOLOv4 model. The line that makes the difference is:
bxy = sigmoid(bxy) * scale_x_y - 0.5 * (scale_x_y - 1)
Commenting and un-commenting this line toggles between the noisy outputs shown above and the expected outputs with smooth precise bounding boxes.
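For reference, here is a rough, self-contained sketch of the center decoding for a single grid cell, written in plain Python rather than NumPy. The function name `decode_center`, its parameters, the `apply_sigmoid` flag, and the default `scale_x_y=1.05` are assumptions for illustration; only the `bxy` formula itself comes from the line above.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_center(tx, ty, col, row, grid_w, grid_h,
                  scale_x_y=1.05, apply_sigmoid=True):
    """Turn raw x,y outputs for one grid cell into a normalized box center.

    tx, ty         raw network outputs for the center offset
    col, row       index of the grid cell that produced the prediction
    grid_w, grid_h grid size of this output layer
    scale_x_y      assumed value; check the model config for the real one
    """
    if apply_sigmoid:
        # Same formula as the bxy line above, applied per coordinate.
        bx = sigmoid(tx) * scale_x_y - 0.5 * (scale_x_y - 1)
        by = sigmoid(ty) * scale_x_y - 0.5 * (scale_x_y - 1)
    else:
        # Without the sigmoid the raw offset is unbounded, so the
        # center can land far outside its own cell.
        bx, by = tx, ty
    return (bx + col) / grid_w, (by + row) / grid_h

# A raw offset of 3.0 in cell (col=1, row=0) of a 2x2 grid.
with_sigmoid = decode_center(3.0, 3.0, 1, 0, 2, 2)
without_sigmoid = decode_center(3.0, 3.0, 1, 0, 2, 2, apply_sigmoid=False)
```

In this sketch `with_sigmoid` stays close to the cell that predicted it, while `without_sigmoid` falls outside the normalized image range, which matches the kind of noisy boxes described above.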
I tried to find where you do the postprocessing for these models in your GitHub repos, but I couldn't find anything, which leads me to believe the logic sits in some kind of proprietary codebase. Otherwise I would have created a PR.
I am relatively new to open source and issues, so please let me know if I can give you more info. I can also upload my modified YOLO demo code if that would help.