darknet
Necessary steps for improving missed detection
Hello all,
I'm working on a problem of detecting rows and columns using YOLOv4, and I'm facing an issue where some random rows and columns stay undetected.
Below are some sample images showing the problem:
In the first, second, and third images one row is missed, and in the fourth image two columns are not detected. I have reached a precision of 96 and a recall of 97, and I want both to reach 99 or above. Is there anything I can do in YOLOv4 that could help with this? I also tried increasing the dataset size, but I'm still not able to get there.
cfg file that I'm using is: yolov4-obj_class_tagging_2.txt
training command that I'm using: ./darknet detector train data/obj.data cfg/yolov4-obj_class_tagging_2.cfg yolov4.conv.137 -dont_show -map
Thanks in advance...:)
From looking through your config, it seems you're using the default anchors that ship with standard YOLO.
These anchors determine the initial bounding-box sizes that YOLO starts from in its detections. By changing them to values better aligned with your data, you can increase the IoU precision of your model. How much you gain depends on how far the defaults are from your data, but calculating them can't hurt.
You can calculate the anchors for a dataset using the following command: darknet.exe detector calc_anchors $datapath -num_of_clusters $num (usually 9) -width $modelWidth -height $modelHeight
This will produce an 'anchors.txt' file. You can copy/paste those numbers onto the 'anchors=' line in each [yolo] layer of your .cfg.
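If you'd like to see what calc_anchors is doing under the hood, here is a minimal Python sketch of the same idea: k-means clustering on the ground-truth box sizes. It assumes YOLO-format label files ("class cx cy w h", normalized) and a 416x416 network; it also uses plain Euclidean distance where darknet uses an IoU-based distance, so treat the numbers as approximate. The directory path and network size are placeholders for your setup.

```python
# Sketch of YOLO anchor calculation via k-means on ground-truth box sizes.
# Assumption: labels are YOLO-format text files ("class cx cy w h", normalized).
import glob

def load_box_sizes(label_dir, net_w=416, net_h=416):
    """Collect (w, h) of every ground-truth box, scaled to network input size."""
    sizes = []
    for path in glob.glob(f"{label_dir}/*.txt"):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 5:
                    _, _, _, w, h = map(float, parts)
                    sizes.append((w * net_w, h * net_h))
    return sizes

def kmeans_anchors(sizes, k=9, iters=100):
    """Plain k-means on (w, h) pairs. Darknet uses an IoU distance; this
    Euclidean version is a simplification but yields similar clusters."""
    pts = sorted(sizes)
    # Deterministic init: k evenly spaced points from the sorted sizes.
    centers = [pts[(i * len(pts)) // k + len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in sizes:
            i = min(range(k),
                    key=lambda c: (w - centers[c][0])**2 + (h - centers[c][1])**2)
            clusters[i].append((w, h))
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)  # ascending by width, as in anchors.txt
```

For your row/column problem the clusters should come out very wide-and-short and very tall-and-narrow, which is exactly why the default anchors (tuned on COCO objects) may be hurting you.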
Another thing that might help you with the detection of rows (though I'm less sure about this one) is simply using a Hough transform (https://en.wikipedia.org/wiki/Hough_transform). While useful in multiple computer vision scenarios, the reason I think this might help you specifically is that its original use was the detection of lines in an image (usually for the recognition of text). So you can use the transform to gauge whether or not your model is correct, or, much more interestingly, you can transform your dataset and use the transformed images to train the detection of the rows and columns.
By reducing the image to a few features, you'll likely gain those last couple of percentage points. You might find that a Hough transform or a change of anchors isn't the perfect solution; in that case, just remember that there are a ton of transforms and image manipulations out there, and you just need to find the right one for your dataset.
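The Hough-transform idea above can be sketched in plain NumPy, without OpenCV. The sketch below assumes you already have a binary edge image (nonzero = edge pixel, e.g. from Canny); a horizontal row separator then shows up as a vote peak near theta = 90 degrees, a vertical column near theta = 0. The function names are mine, not from any library.

```python
# Minimal Hough line transform: every edge pixel votes for all (rho, theta)
# lines passing through it; peaks in the accumulator are the image's lines.
# Assumption: 'edges' is a 2-D binary array (nonzero = edge pixel).
import numpy as np

def hough_lines(edges, n_thetas=180):
    """Return the vote accumulator plus the rho and theta axes."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))
    rhos = np.arange(-diag, diag + 1)         # signed distance from origin
    thetas = np.deg2rad(np.arange(n_thetas))  # 0..179 degrees
    acc = np.zeros((len(rhos), len(thetas)), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for t_idx, theta in enumerate(thetas):
        # rho = x*cos(theta) + y*sin(theta), vectorized over all edge pixels
        rho_vals = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        np.add.at(acc[:, t_idx], rho_vals + diag, 1)  # accumulate duplicate votes
    return acc, rhos, thetas

def strongest_line(edges):
    """Return (rho, theta_in_degrees) of the most-voted line."""
    acc, rhos, thetas = hough_lines(edges)
    r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
    return int(rhos[r_idx]), float(np.rad2deg(thetas[t_idx]))
```

You could compare the peaks against your model's predicted row/column boxes to flag likely misses, or feed the accumulator (or images reconstructed from it) to the network as suggested above.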