
evaluate: explain/document metrics

Open bertsky opened this issue 3 years ago • 1 comment

If I understand correctly, the idea behind these metrics is taken from the "rethinking semantic segmentation evaluation" paper, but could you explain to me how I could obtain AP, TPs, FPs, and FNs for an instance segmentation task?

Originally posted by @andreaceruti in https://github.com/cocodataset/cocoapi/issues/564#issuecomment-1064223428

bertsky avatar Mar 10 '22 16:03 bertsky

Yes, that paper provided the idea for the oversegmentation and undersegmentation measures – but only for these two (not the others), and I took the liberty of deviating from the exact definition of Zhang et al. 2021: https://github.com/OCR-D/ocrd_segment/blob/81923495648c346a84436fb7d08727d9c13eb88d/ocrd_segment/evaluate.py#L440-L444

So in my implementation these measures are merely raw ratios, i.e. the share of regions in GT and DT which have been oversegmented (or undersegmented, resp.).
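
For illustration, that raw-ratio reading could look as follows (a minimal sketch, not the code behind the link above; the `matches` input and the denominator choice are assumptions of this sketch):

```python
from collections import Counter

def over_under_segmentation(matches, num_gt, num_dt):
    """Raw ratios: regions involved in one-to-many matches over all regions.

    matches -- list of (gt_id, dt_id) matched pairs (hypothetical input;
               the match criterion is defined further down)
    """
    gt_counts = Counter(g for g, _ in matches)  # detections matched per GT region
    dt_counts = Counter(d for _, d in matches)  # GT regions matched per detection
    # a GT region matching several detections is oversegmented, as are its pieces
    over = {('gt', g) for g, n in gt_counts.items() if n > 1}
    over |= {('dt', d) for g, d in matches if gt_counts[g] > 1}
    # a detection matching several GT regions undersegments them (and itself)
    under = {('dt', d) for d, n in dt_counts.items() if n > 1}
    under |= {('gt', g) for g, d in matches if dt_counts[d] > 1}
    total = max(1, num_gt + num_dt)  # denominator choice is an assumption here
    return len(over) / total, len(under) / total
```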

My notion of a match is somewhat arbitrary, but IMO more adequate than averaging over different IoU thresholds for various confidence thresholds (see the sketch after this list):

  • A pair of a true and a predicted region is a true positive (TP) iff
    • its IoU is ≥ 50% or
    • its IoGT is ≥ 50% or
    • its IoDT is ≥ 50%.
  • A prediction which is not matched is a false positive (FP).
  • A ground truth which is not matched is a false negative (FN).
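
In code, applying that criterion could look like this (a minimal sketch with hypothetical names such as `match_regions`, not the actual evaluate.py implementation; it assumes the pairwise pixel counts are already available, cf. the next sketch):

```python
import numpy as np

def match_regions(inter, gt_areas, dt_areas, thresh=0.5):
    """Classify region pairs as TP, and leftovers as FP/FN.

    inter[i, j] -- overlapping pixels of GT region i and detection j
    gt_areas[i] -- pixel area of GT region i
    dt_areas[j] -- pixel area of detection j
    (hypothetical inputs; one way to compute them is sketched below)
    """
    union = gt_areas[:, None] + dt_areas[None, :] - inter
    iou = inter / union                 # intersection over union
    iogt = inter / gt_areas[:, None]    # intersection over ground truth
    iodt = inter / dt_areas[None, :]    # intersection over detection
    matched = (iou >= thresh) | (iogt >= thresh) | (iodt >= thresh)
    tp = list(zip(*np.nonzero(matched)))        # matched (GT, DT) index pairs
    fn = np.flatnonzero(~matched.any(axis=1))   # unmatched GT -> false negatives
    fp = np.flatnonzero(~matched.any(axis=0))   # unmatched DT -> false positives
    return tp, fp, fn
```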

(All area values under consideration are numbers of pixels in the polygon-masked segments, not just bounding box sizes.)
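
One way to obtain such pixel counts is to rasterize the polygons into boolean masks, e.g. with scikit-image (again just an illustration; `gt_coords`, `dt_coords` and `page_shape` are placeholders, and the actual implementation may use different mask tooling):

```python
import numpy as np
from skimage.draw import polygon

def polygon_mask(coords, shape):
    """Rasterize a polygon given as (x, y) points into a boolean pixel mask."""
    xs, ys = zip(*coords)
    mask = np.zeros(shape, dtype=bool)
    rr, cc = polygon(ys, xs, shape=shape)  # skimage expects rows (y) first
    mask[rr, cc] = True
    return mask

gt_mask = polygon_mask(gt_coords, page_shape)  # placeholders for real data
dt_mask = polygon_mask(dt_coords, page_shape)
inter = np.count_nonzero(gt_mask & dt_mask)    # overlapping pixels
iou = inter / np.count_nonzero(gt_mask | dt_mask)
iogt = inter / np.count_nonzero(gt_mask)
iodt = inter / np.count_nonzero(dt_mask)
```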

So all in all, you get the following metrics here (the page-wise computation is sketched in code after the list):

  • area measures
    • IoU: intersection over union, i.e. the share of the overlapping area of a match over the union of the true and the predicted region
    • IoGT: intersection over ground truth, i.e. the share of the overlapping area of a match over the total area of the true region
    • IoDT: intersection over detection, i.e. the share of the overlapping area of a match over the total area of the predicted region
    • pixel-recall: page-wise aggregate of intersection over GT including missed true regions (FN), i.e. the share of the overlapping areas over the total area of true regions in a page
    • pixel-precision: page-wise aggregate of intersection over DT including fake predicted regions (FP), i.e. the share of the overlapping areas over the total area of predicted regions in a page
  • segment measures
    • oversegmentation: share of true and predicted regions which have been oversegmented (i.e. where true regions match multiple detections) over all regions
    • undersegmentation: share of true and predicted regions which have been undersegmented (i.e. where predicted regions match multiple ground truths) over all regions
    • recall: ratio of matches (TP) over true regions, i.e. share of correctly predicted regions in total GT
    • precision: ratio of matches (TP) over detected regions, i.e. share of correctly predicted regions in total DT
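
Continuing the sketch, the page-wise measures could be assembled like so (hypothetical code reusing the `match_regions()` output from above; the real implementation may aggregate differently):

```python
def page_measures(tp, inter, gt_areas, dt_areas):
    """Page-wise segment and pixel measures from the match_regions() output above.

    Denominators cover *all* regions/areas, so FNs and FPs are accounted for.
    """
    recall = len({g for g, _ in tp}) / len(gt_areas)      # matched share of GT
    precision = len({d for _, d in tp}) / len(dt_areas)   # matched share of DT
    # pixel measures: matched overlap over total true/predicted area
    # (a naive sum; overlapping matches could double-count in this sketch)
    overlap = sum(inter[g, d] for g, d in tp)
    pixel_recall = overlap / gt_areas.sum()
    pixel_precision = overlap / dt_areas.sum()
    return recall, precision, pixel_recall, pixel_precision
```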

For each metric, there is a page-wise (or even segment-wise) and an aggregated measure; the latter always uses micro-averaging over all (matching pairs in all) pages.
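
In other words, the aggregate pools the raw counts (or pixel areas) over all pages and divides once, so pages with more regions weigh proportionally more, whereas a macro-average would take the mean of per-page ratios. A toy comparison with hypothetical per-page counts:

```python
# per-page TP and GT counts (hypothetical lists, one entry per page)
tps = [8, 90]
gts = [10, 100]
micro_recall = sum(tps) / sum(gts)                              # 98/110 ≈ 0.891
macro_recall = sum(t / g for t, g in zip(tps, gts)) / len(gts)  # (0.8+0.9)/2 = 0.85
```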

bertsky avatar Mar 10 '22 18:03 bertsky