PyTorch-YOLOv3

What does the custom bbox/annotation format look like? (Pixel coordinates or aspect ratio?)

Open mylife126 opened this issue 4 years ago • 3 comments

Hi, thanks for sharing this great repo. I am trying to create my own dataset with my own annotations. However, I do not know whether the bbox coordinates have to be ratios or the real pixel coordinates of the object. As I read the paper, the bbox info contains {bx, by, bh, bw}, where bx and by are bounded within 0~1 and represent where the center lies relative to its assigned grid cell.

Could you please let me know whether I should annotate my bboxes in this ratio fashion or just use their real pixel coordinates?

Thanks!!

mylife126, Jul 16 '19 00:07

If you have the real pixel coordinates, just divide the x_center and width by the width of your image, and the y_center and height by the height of your image, to normalize.
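A minimal sketch of that normalization, for anyone starting from corner-style pixel boxes. The function names are hypothetical (they are not part of the repo), and the output order `class x_center y_center width height` is an assumption based on the five-value label quoted later in this thread.

```python
def corners_to_center(x_min, y_min, x_max, y_max):
    """Convert a corner-style pixel box to (x_center, y_center, width, height)."""
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)


def normalize_bbox(x_center, y_center, box_w, box_h, img_w, img_h):
    """Divide x values by the image width and y values by the image height."""
    return (x_center / img_w, y_center / img_h, box_w / img_w, box_h / img_h)


# Example: a 160x120 px box centred in a 640x480 px image.
center_box = corners_to_center(240, 180, 400, 300)
normalized = normalize_bbox(*center_box, img_w=640, img_h=480)

# Assumed label line layout: class id followed by the four normalized values.
class_id = 45
label_line = " ".join([str(class_id)] + [f"{v:.6f}" for v in normalized])
print(label_line)  # 45 0.500000 0.500000 0.250000 0.250000
```

All four values end up in the range 0 to 1, so the same label stays valid if the image is resized.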

hydeta, Jul 16 '19 17:07

> if you have the real-pixel coordinates, just divide the x_center and width by the width of your image and the y_center and height by the height of your images to normalize.

Thanks for answering. I just went through the code, and now I have one other question. In the Dataset, why do we need to reshape our label into the shape [50, 5] (suppose my original label is just [45 0.479492 0.688771 0.955609 0.595500])?
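One plausible reading, not confirmed anywhere in this thread: images contain different numbers of objects, and padding every label array to a fixed [50, 5] shape lets them be stacked into a single batch tensor. The sketch below illustrates that idea with NumPy. `pad_labels` and `MAX_OBJECTS` are hypothetical names, and the second label row is invented purely for illustration.

```python
import numpy as np

MAX_OBJECTS = 50  # assumed cap, taken from the [50, 5] shape in the question


def pad_labels(labels, max_objects=MAX_OBJECTS):
    """Zero-pad (or truncate) an (n, 5) label array to (max_objects, 5)."""
    labels = np.asarray(labels, dtype=np.float32).reshape(-1, 5)
    padded = np.zeros((max_objects, 5), dtype=np.float32)
    n = min(len(labels), max_objects)
    padded[:n] = labels[:n]
    return padded


# One image with a single object, another with two objects.
one = pad_labels([[45, 0.479492, 0.688771, 0.955609, 0.595500]])
two = pad_labels([[45, 0.479492, 0.688771, 0.955609, 0.595500],
                  [0, 0.250000, 0.250000, 0.100000, 0.200000]])

# Equal shapes mean the two label arrays can be stacked into one batch.
batch = np.stack([one, two])
print(batch.shape)  # (2, 50, 5)
```

Under this reading, the all-zero rows are just filler that the loss computation would need to skip.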

I am also having a hard time understanding how this [50, 5] label matches up with the network's output. From the paper, the output could be 13 * 13 * 85, which, as I understand it, means that we have 13 grids and 85 predictions.
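A rough sketch of how the numbers relate, under the usual YOLO conventions rather than anything stated in this thread: a 13 * 13 output has 169 grid cells, and with the 80 COCO classes each predicted box carries 4 box values + 1 objectness score + 80 class scores = 85 values. A normalized label is matched to the cell that contains its center. The function name below is hypothetical, and the real training code also has to choose an anchor, which this sketch omits.

```python
GRID_SIZE = 13                        # coarsest grid, as in the 13 * 13 * 85 example
NUM_CLASSES = 80                      # assumption: COCO classes
VALUES_PER_BOX = 4 + 1 + NUM_CLASSES  # box (4) + objectness (1) + classes (80) = 85


def responsible_cell(x_center, y_center, grid_size=GRID_SIZE):
    """Return (col, row, x_offset, y_offset) for a normalized box center."""
    gx = x_center * grid_size  # center in grid units
    gy = y_center * grid_size
    col = min(int(gx), grid_size - 1)  # clamp in case a center is exactly 1.0
    row = min(int(gy), grid_size - 1)
    return col, row, gx - col, gy - row


# The label quoted in the question: [45 0.479492 0.688771 0.955609 0.595500]
col, row, x_off, y_off = responsible_cell(0.479492, 0.688771)
print(col, row)  # 6 8
```

The offsets inside the cell (here roughly 0.23 and 0.95) correspond to the bx, by values bounded within 0~1 that the first post mentions.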

Thanks in advance!

mylife126, Jul 16 '19 20:07

Is this issue still relevant?

Flova, Feb 02 '21 14:02