Computation of IoU
While trying to replicate your results, I noticed an odd detail about the calculation of the mIoU. It concerns formula (2) in the paper. The outer sum, as stated in the text, iterates over i=2..n in order to average the IoU over all classes except background (which corresponds to i=1). My issue is with the inner sum, which also iterates over j=2..n. This does not seem correct to me, as it effectively disregards all pixels that are marked as background by either the ground truth or the model. In the extreme case, a class whose ground truth and prediction overlap in only a single pixel would be awarded an IoU of 1.0, as long as all the remaining pixels are background.
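To make the effect concrete, here is a minimal sketch (my own illustration, not the paper's code) that computes per-class IoU from a confusion matrix two ways: the standard definition, and a variant that restricts the sums to non-background indices as in formula (2). Indexing and the `drop_background_terms` flag are assumptions for this example; I use class 0 as background for convenience.

```python
import numpy as np

def iou_per_class(conf, drop_background_terms=False):
    """Per-class IoU from an (n x n) confusion matrix conf, where
    conf[g, p] counts pixels with ground-truth class g predicted as
    class p, and class 0 is background (illustrative indexing).

    drop_background_terms=True mimics the inner sum over j=2..n:
    pixels that are background in either the ground truth or the
    prediction no longer contribute to the union."""
    n = conf.shape[0]
    ious = np.zeros(n)
    for i in range(n):
        tp = conf[i, i]
        if drop_background_terms:
            fp = conf[1:, i].sum() - tp  # ignores background-GT false positives
            fn = conf[i, 1:].sum() - tp  # ignores background-pred false negatives
        else:
            fp = conf[:, i].sum() - tp   # all pixels predicted as i but not GT i
            fn = conf[i, :].sum() - tp   # all pixels of GT i predicted otherwise
        union = tp + fp + fn
        ious[i] = tp / union if union > 0 else 0.0
    return ious

# Extreme case from the text: class 1 overlaps in exactly one pixel,
# the model additionally predicts class 1 on 99 background pixels,
# and everything else is correctly labelled background.
conf = np.zeros((2, 2), dtype=np.int64)
conf[1, 1] = 1      # one true-positive pixel of class 1
conf[0, 1] = 99     # 99 background pixels mispredicted as class 1
conf[0, 0] = 900    # background correctly predicted

print(iou_per_class(conf)[1])                              # 0.01
print(iou_per_class(conf, drop_background_terms=True)[1])  # 1.0
```

With the background terms dropped, the 99 false positives vanish from the union and the class is scored as a perfect match.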
To measure the error incurred, I redid the mIoU calculation using the model weights saved in the repository, and obtained a value of 0.290 instead of the 0.649 claimed in the paper.
Is there a specific reason for this way of calculating the IoU and/or a reference you could point me towards?
You can refer to this repo for the metric code:
https://github.com/yuxiangsun/RTFNet/blob/master/util/util.py