SparseR-CNN
About the loss format
Hi, thanks for your great work! I have a question: the paper says the boxes are in (cx, cy, w, h) format when computing the loss, but in your code they seem to be in (x1, y1, x2, y2), e.g.:
# The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.
assert 'pred_boxes' in outputs
idx = self._get_src_permutation_idx(indices)
src_boxes = outputs['pred_boxes'][idx]
target_boxes = torch.cat([t['boxes_xyxy'][i] for t, (_, i) in zip(targets, indices)], dim=0)
losses = {}
loss_giou = 1 - torch.diag(box_ops.generalized_box_iou(src_boxes, target_boxes))
losses['loss_giou'] = loss_giou.sum() / num_boxes
image_size = torch.cat([v["image_size_xyxy_tgt"] for v in targets])
src_boxes_ = src_boxes / image_size
target_boxes_ = target_boxes / image_size
loss_bbox = F.l1_loss(src_boxes_, target_boxes_, reduction='none')
losses['loss_bbox'] = loss_bbox.sum() / num_boxes
return losses
Hi~ Actually (cx, cy, w, h) and (x1, y1, x2, y2) can be converted back and forth with box_ops.box_cxcywh_to_xyxy and box_ops.box_xyxy_to_cxcywh.
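For reference, the two helpers do roughly the following (a minimal sketch; the exact implementation lives in the repo's box_ops module and may differ slightly):

import torch

def box_cxcywh_to_xyxy(x):
    # split the last dim into center coordinates and sizes
    cx, cy, w, h = x.unbind(-1)
    return torch.stack([cx - 0.5 * w, cy - 0.5 * h,
                        cx + 0.5 * w, cy + 0.5 * h], dim=-1)

def box_xyxy_to_cxcywh(x):
    # split the last dim into corner coordinates
    x1, y1, x2, y2 = x.unbind(-1)
    return torch.stack([(x1 + x2) / 2, (y1 + y2) / 2,
                        x2 - x1, y2 - y1], dim=-1)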
Em~ thank you. I mean, should I use the (x1, y1, x2, y2) format to compute the L1 loss, as in your code, instead of (cx, cy, w, h)?
yep~
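For anyone else reading this thread, here is a tiny toy version of the L1 term from the quoted loss_boxes code, with made-up box values and an assumed 800x600 image (the tensors are illustrative, not taken from the repo):

import torch
import torch.nn.functional as F

# absolute (x1, y1, x2, y2) boxes for one image (values are made up)
src_boxes = torch.tensor([[100., 150., 300., 400.]])
target_boxes = torch.tensor([[110., 140., 290., 410.]])
# image size repeated in xyxy order, like image_size_xyxy_tgt
image_size = torch.tensor([[800., 600., 800., 600.]])

# normalize absolute xyxy boxes to [0, 1] before the L1 term, as in loss_boxes
loss_bbox = F.l1_loss(src_boxes / image_size, target_boxes / image_size, reduction='none')
print(loss_bbox.sum())  # the real criterion divides by num_boxes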
thanks