Beginner Question about Evaluation Metrics
I have a very beginner question about what the eval metrics actually mean. I can't quite seem to find anything in the docs (if I missed it, I'm sorry! I looked for quite a while before asking, I promise).
Currently, I am gathering these metrics on validation and test: [aAcc, mAcc, mIoU, mDice, mFscore, mPrecision, mRecall]. I can use Google to figure out what each of these means for a single image vs. its ground truth... but what do they mean in the context of the val/test evaluation as a whole?
If we take aAcc as an example (calculated here): does it take the sum of all class intersections with the GT, divided by the sum of the area of the GT?
I think my comprehension stops here. I have googled and tried to understand these few lines, but for some reason something is not clicking:
intersect = pred_label[pred_label == label]  # I know this one! (the predicted class ids at pixels where pred == GT)
area_intersect = torch.histc(
    intersect.float(), bins=(num_classes), min=0,
    max=num_classes - 1).cpu()  # per-class count of correctly predicted pixels
area_pred_label = torch.histc(
    pred_label.float(), bins=(num_classes), min=0,
    max=num_classes - 1).cpu()  # per-class count of predicted pixels
area_label = torch.histc(
    label.float(), bins=(num_classes), min=0,
    max=num_classes - 1).cpu()  # per-class count of ground-truth pixels
area_union = area_pred_label + area_label - area_intersect  # This one is easy too!
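In case it helps show where I'm at, here is the tiny standalone toy I've been poking at to see what those histc calls actually count (the tensors and numbers below are made up by me, not taken from the repo):

import torch

# Made-up toy: 2 classes, a "prediction" and a "ground truth" of 8 pixels each.
num_classes = 2
pred_label = torch.tensor([0, 0, 1, 1, 1, 0, 1, 1])
label = torch.tensor([0, 0, 1, 1, 0, 0, 0, 1])

# Same recipe as the snippet above.
intersect = pred_label[pred_label == label]
area_intersect = torch.histc(intersect.float(), bins=num_classes, min=0, max=num_classes - 1)
area_pred_label = torch.histc(pred_label.float(), bins=num_classes, min=0, max=num_classes - 1)
area_label = torch.histc(label.float(), bins=num_classes, min=0, max=num_classes - 1)
area_union = area_pred_label + area_label - area_intersect

print(area_intersect / area_union)              # per-class IoU -> tensor([0.6000, 0.6000])
print(area_intersect / area_label)              # per-class Acc -> tensor([0.6000, 1.0000])
print(area_intersect.sum() / area_label.sum())  # overall pixel accuracy -> tensor(0.7500)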
Final question: to get the mIoU from the individual IoUs, is it just the average of the IoU over every image in the test/val set? Or do we first take the average per class against the GT, and then average over the samples?
I appreciate your time and help, thank you!
Taking mAcc and aAcc as examples: when the class distribution is extremely unbalanced, there is a significant difference between the two. mAcc simply computes the Acc of each class and averages them, while aAcc treats all classes as one and computes a single overall Acc.
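To make the difference concrete, here is a tiny made-up example (all numbers invented for illustration) with one dominant class and one rare class:

import torch

# Invented per-class pixel counts for a very unbalanced 2-class dataset:
# class 0 (e.g. background) dominates, class 1 is rare.
area_intersect = torch.tensor([9000., 10.])   # correctly classified pixels per class
area_label = torch.tensor([10000., 100.])     # ground-truth pixels per class

acc_per_class = area_intersect / area_label        # tensor([0.9000, 0.1000])
mAcc = acc_per_class.mean()                        # 0.50 -> every class weighs the same
aAcc = area_intersect.sum() / area_label.sum()     # ~0.89 -> dominated by the big class
print(mAcc.item(), aAcc.item())

So a model that largely ignores the rare class can still report a high aAcc, while mAcc exposes the problem.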
Thank you for your kind reply! With this help, I was able to google more and get a better understanding.