evaluate: explain/document metrics
If I understand correctly, the idea behind these metrics is taken from the "Rethinking Semantic Segmentation Evaluation" paper, but could you explain how I could obtain AP, TPs, FPs and FNs for an instance segmentation task?
Originally posted by @andreaceruti in https://github.com/cocodataset/cocoapi/issues/564#issuecomment-1064223428
Yes, that paper inspired the oversegmentation and undersegmentation measures – but only those two (not the others), and I took the liberty of deviating from the exact definition of Zhang et al. 2021:
https://github.com/OCR-D/ocrd_segment/blob/81923495648c346a84436fb7d08727d9c13eb88d/ocrd_segment/evaluate.py#L440-L444
So in my implementation these measures are merely raw ratios, i.e. the share of regions in GT and DT which have been oversegmented (or undersegmented, resp.).
My notion of a match is somewhat arbitrary, but IMO more adequate than averaging over different IoU thresholds for various confidence thresholds:
- A pair of a true and a predicted region is a true positive (TP), iff
- its IoU is ≥ 50% or
- its IoGT is ≥ 50% or
- its IoDT is ≥ 50%.
- A prediction which is not matched is a false positive (FP).
- A ground truth which is not matched is a false negative (FN).
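For illustration, the matching rule above can be sketched as follows (a minimal sketch with my own names and a pixel-set representation of the polygon masks, not the actual evaluate.py code):

```python
def is_match(gt_pixels, dt_pixels):
    """True iff a GT/DT pair counts as a true positive:
    IoU >= 0.5 or IoGT >= 0.5 or IoDT >= 0.5.
    Segments are sets of (x, y) pixel coordinates from the
    polygon-filled masks, not bounding boxes."""
    if not gt_pixels or not dt_pixels:
        return False
    inter = len(gt_pixels & dt_pixels)
    union = len(gt_pixels | dt_pixels)
    return (inter / union >= 0.5 or          # IoU
            inter / len(gt_pixels) >= 0.5 or # IoGT
            inter / len(dt_pixels) >= 0.5)   # IoDT
```

Note that the IoGT/IoDT criteria make a small detection fully contained in a large true region (or vice versa) count as a match even when its IoU is low.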
(All area values under consideration are numbers of pixels in the polygon-masked segments, not just bounding box sizes.)
So in all, you get the following metrics here:
- area measures
  - IoU: intersection over union, i.e. the share of the overlapping area of a match over the union of the true and the predicted region
  - IoGT: intersection over ground truth, i.e. the share of the overlapping area of a match over the total area of the true region
  - IoDT: intersection over detection, i.e. the share of the overlapping area of a match over the total area of the predicted region
  - pixel-recall: page-wise aggregate of intersection over GT including missed true regions (FN), i.e. the share of the overlapping areas over the total area of true regions in a page
  - pixel-precision: page-wise aggregate of intersection over DT including fake predicted regions (FP), i.e. the share of the overlapping areas over the total area of predicted regions in a page
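The page-wise pixel measures can be sketched like this (again my own naming and data layout, not the evaluate.py code; unmatched regions enter only through the denominators):

```python
def pixel_recall_precision(matches, gt_segments, dt_segments):
    """Page-wise pixel recall/precision.
    matches: list of (gt_index, dt_index) matched pairs;
    gt_segments/dt_segments: dicts mapping indices to pixel sets
    (including unmatched FN/FP segments)."""
    inter = sum(len(gt_segments[g] & dt_segments[d]) for g, d in matches)
    gt_total = sum(len(p) for p in gt_segments.values())  # incl. missed GT (FN)
    dt_total = sum(len(p) for p in dt_segments.values())  # incl. fake DT (FP)
    recall = inter / gt_total if gt_total else 0.0
    precision = inter / dt_total if dt_total else 0.0
    return recall, precision
```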
- segment measures
  - oversegmentation: share of true and predicted regions which have been oversegmented (i.e. where true regions match multiple detections) over all regions
  - undersegmentation: share of true and predicted regions which have been undersegmented (i.e. where predicted regions match multiple ground truths) over all regions
  - recall: ratio of matches (TP) over true regions, i.e. share of correctly predicted regions in total GT
  - precision: ratio of matches (TP) over detected regions, i.e. share of correctly predicted regions in total DT
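A rough sketch of how these segment-level measures fall out of the match pairs (my own simplified formulation, counting a region as over-/undersegmented on both sides of the ambiguous matches; the actual evaluate.py code differs in detail):

```python
from collections import Counter

def segment_scores(matches, num_gt, num_dt):
    """matches: list of (gt_index, dt_index) match pairs.
    A GT matching several DTs is oversegmented; a DT matching
    several GTs is undersegmented."""
    gt_count = Counter(g for g, _ in matches)
    dt_count = Counter(d for _, d in matches)
    over_gt = {g for g, n in gt_count.items() if n > 1}
    under_dt = {d for d, n in dt_count.items() if n > 1}
    # regions on the other side involved in those splits/merges
    over_dt = {d for g, d in matches if g in over_gt}
    under_gt = {g for g, d in matches if d in under_dt}
    total = num_gt + num_dt
    overseg = (len(over_gt) + len(over_dt)) / total if total else 0.0
    underseg = (len(under_gt) + len(under_dt)) / total if total else 0.0
    recall = len(gt_count) / num_gt if num_gt else 0.0    # matched GT share
    precision = len(dt_count) / num_dt if num_dt else 0.0  # matched DT share
    return overseg, underseg, recall, precision
```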
For each metric, there is a page-wise (or even segment-wise) and an aggregated measure; the latter always uses micro-averaging over all (matching pairs in all) pages.
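To make the micro-averaging concrete (a sketch in my own terms, shown here for IoU): numerators and denominators are summed over all matching pairs in all pages first, and divided only once at the end, instead of averaging the per-page ratios (macro-averaging):

```python
def micro_average_iou(pages):
    """pages: list of pages, each a list of (intersection, union)
    pixel counts for that page's matching pairs."""
    inter = sum(i for page in pages for i, u in page)
    union = sum(u for page in pages for i, u in page)
    return inter / union if union else 0.0
```

This weights every matching pair by its pixel area, so pages with many or large matches contribute more than nearly empty ones.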