
Predict box shape directly instead of offsets?

Open stevebottos opened this issue 3 years ago • 0 comments

More of a question than an issue, really. I was curious: if I'm understanding correctly, the network predicts offsets for each anchor box, which in turn describe a bounding box. This requires lots of conversions (cxcy to xy, encoding, decoding). Would it not be possible to simply train the network to output [xmin, ymin, xmax, ymax] directly instead of [offset-x, offset-y, width, height]? If not, what are the issues with this?

In the same vein, is the encoding and decoding of the bounding boxes only necessary because we need to go from offsets to the bounding boxes they describe?
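For reference, the conversions in question can be sketched roughly as below. This is a minimal sketch of SSD-style offset encoding/decoding, assuming priors and boxes are in fractional center-size (cx, cy, w, h) coordinates; the variance scaling factors used in some implementations are omitted, and the function names are illustrative rather than the tutorial's exact API.

```python
import torch

def xy_to_cxcy(xy):
    # (xmin, ymin, xmax, ymax) -> (cx, cy, w, h)
    return torch.cat([(xy[:, :2] + xy[:, 2:]) / 2,   # centers
                      xy[:, 2:] - xy[:, :2]], dim=1)  # widths, heights

def encode(cxcy, priors_cxcy):
    # Box relative to its prior: center offsets scaled by prior size,
    # sizes as log-ratios (so the regression target is roughly unit-scale).
    return torch.cat([(cxcy[:, :2] - priors_cxcy[:, :2]) / priors_cxcy[:, 2:],
                      torch.log(cxcy[:, 2:] / priors_cxcy[:, 2:])], dim=1)

def decode(gcxgcy, priors_cxcy):
    # Inverse of encode(): recover absolute (cx, cy, w, h) from offsets.
    return torch.cat([gcxgcy[:, :2] * priors_cxcy[:, 2:] + priors_cxcy[:, :2],
                      torch.exp(gcxgcy[:, 2:]) * priors_cxcy[:, 2:]], dim=1)
```

The encode/decode pair is an exact round trip, so no information is lost; the point of the offset parameterization is that each anchor only has to regress small, similarly-scaled corrections rather than absolute coordinates.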

stevebottos avatar Jul 24 '21 01:07 stevebottos