yolo_v1_pytorch
Is there a problem with the coordinate normalization before computing IoU?
In the voc.py module (l. 120), you normalize coordinates as follows:
```python
xy, wh, label = boxes_xy[b], boxes_wh[b], int(labels[b])
ij = (xy / cell_size).ceil() - 1.0
i, j = int(ij[0]), int(ij[1])  # y & x index which represents its location on the grid.
x0y0 = ij * cell_size  # x & y of the cell left-top corner.
xy_normalized = (xy - x0y0) / cell_size  # x & y of the box on the cell, normalized from 0.0 to 1.0.
```
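For readers following along, here is a minimal pure-Python sketch of the same encoding for a single box center. It assumes S = 7 and a center already normalized to [0, 1] by the image size; encode_center is a hypothetical helper, not a function from voc.py, and it uses math.ceil on plain floats instead of tensors.

```python
import math


def encode_center(xy, S=7):
    """Hypothetical helper mirroring the voc.py snippet for one box center.

    xy: (x, y) box center, normalized to [0, 1] by the image width/height.
    Returns the grid indices, the cell's top-left corner, and the center's
    offset inside that cell, normalized to [0, 1] by the cell size.
    """
    cell_size = 1.0 / S
    # ceil(...) - 1 gives the zero-based index of the cell containing the center
    ij = [math.ceil(c / cell_size) - 1.0 for c in xy]
    i, j = int(ij[0]), int(ij[1])
    # Top-left corner of that cell, in image-normalized coordinates
    x0y0 = [k * cell_size for k in ij]
    # Offset of the center inside the cell, rescaled so one cell spans 0.0 to 1.0
    xy_normalized = [(c - c0) / cell_size for c, c0 in zip(xy, x0y0)]
    return (i, j), x0y0, xy_normalized


(i, j), x0y0, xy_normalized = encode_center((0.5, 0.3))
print((i, j), xy_normalized)
```

For a center at (0.5, 0.3), this gives grid indices (3, 2) and a cell-relative offset of about (0.5, 0.1).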
The predicted coordinates will converge to this normalized form. So why do you rescale pred_xyxy this way in the loss.py module (l. 114)?
```python
pred_xyxy = Variable(torch.FloatTensor(pred.size()))  # [B, 5=len([x1, y1, x2, y2, conf])]
# Because (center_x,center_y)=pred[:, :2] and (w,h)=pred[:,2:4] are normalized for cell-size and image-size respectively,
# rescale (center_x,center_y) for the image-size to compute IoU correctly.
pred_xyxy[:, :2] = pred[:, :2]/float(S) - 0.5 * pred[:, 2:4]
pred_xyxy[:, 2:4] = pred[:, :2]/float(S) + 0.5 * pred[:, 2:4]
```
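The loss.py rescaling can be sketched for a single box as follows. cell_box_to_xyxy is a hypothetical name, and a plain tuple stands in for one row of the [B, 5] tensor. The sketch shows that dividing the cell-relative center by S converts it to image-normalized units, but without adding the cell's top-left corner.

```python
def cell_box_to_xyxy(box, S=7):
    """Hypothetical helper mirroring the loss.py rescaling for one box.

    box: (cx, cy, w, h) where (cx, cy) is normalized by the cell size and
    (w, h) by the image size. Returns corners (x1, y1, x2, y2) in
    image-normalized units, measured from the cell's top-left corner
    (the corner itself is not added back).
    """
    cx, cy, w, h = box
    # Cell-relative center -> image-normalized units: multiply by cell_size = 1/S
    x, y = cx / float(S), cy / float(S)
    # Width and height are already image-normalized, so use them directly
    return (x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h)


corners = cell_box_to_xyxy((0.5, 0.5, 0.2, 0.4))
print(corners)
```

The box keeps its image-normalized width and height; only its center is expressed relative to the cell corner.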
If I understand this block correctly, you divide by S, the number of grid cells per side (here 7, I suppose), a value that is already normalized in another way (relative to the grid cell). Couldn't that be an issue?
I totally agree with @E-delweiss's observation. I don't understand why we have to divide by S=7 again. I computed the IoU with this rescaling and got good results, but I still don't understand why we rescale in this specific way.
@E-delweiss @valentin-fngr
After reading the code, I think the implementation is already correct.
In voc.py, xy_normalized is obtained by dividing by cell_size, where cell_size = 1/S and 1 is the normalized width or height of the image:
```python
xy_normalized = (xy - x0y0) / cell_size  # x & y of the box on the cell, normalized from 0.0 to 1.0.
xy_normalized = (xy - x0y0) * S  # equivalent, since cell_size = 1/S
```
So, to de-normalize center_x and center_y back to the normalized image scale (measured from the top-left corner of the responsible cell), we multiply by cell_size, which is 1/S:
```python
xy = xy_normalized * cell_size
xy = xy_normalized / S  # equivalent, since cell_size = 1/S
```
Correct me if I'm wrong.
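Below is a quick numerical check of this argument. It rests on two assumptions not shown in the thread: the target boxes in loss.py are rescaled the same way as the predictions, and the prediction and target belong to the same responsible cell, so they share the same corner x0y0. The round trip shows that adding x0y0 back after dividing by S recovers the original center. The IoU comparison shows that leaving x0y0 out of both boxes shifts them by the same amount, which leaves IoU unchanged. iou and to_xyxy are illustrative helpers, not functions from the repository, and the box values are made up.

```python
import math


def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def to_xyxy(cx, cy, w, h, S, offset=(0.0, 0.0)):
    """Cell-relative center + image-relative size -> corners; offset = cell corner."""
    x = cx / S + offset[0]
    y = cy / S + offset[1]
    return (x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h)


S = 7
cell_size = 1.0 / S

# Round trip for one center: encode as in voc.py, then decode with / S + x0y0
xy = (0.5, 0.3)
ij = [math.ceil(c / cell_size) - 1 for c in xy]
x0y0 = tuple(k * cell_size for k in ij)
xy_normalized = [(c - c0) * S for c, c0 in zip(xy, x0y0)]
xy_back = [n / S + c0 for n, c0 in zip(xy_normalized, x0y0)]

# Hypothetical prediction and target assigned to the same cell: (cx, cy, w, h)
pred = (0.55, 0.15, 0.20, 0.30)
target = (0.50, 0.10, 0.25, 0.30)

# Full de-normalization (corner added back) vs. loss.py-style (no corner)
iou_full = iou(to_xyxy(*pred, S, x0y0), to_xyxy(*target, S, x0y0))
iou_no_offset = iou(to_xyxy(*pred, S), to_xyxy(*target, S))
print(xy_back, iou_full, iou_no_offset)
```

Both IoU values agree up to floating-point rounding. If the two boxes came from different cells, dropping the corner would no longer be safe. That case should not arise if, as in YOLOv1, IoU is only computed between a cell's predictions and the target assigned to that same cell.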