Machine-Learning-Collection

YOLO ground truth width and height are not relative to image size but to S

Open oonisim opened this issue 2 years ago • 0 comments

Code

dataset.py calculates width_cell and height_cell, which are then stored in the label_matrix tensor.

"""
...
Then to find the width relative to the cell is simply:
width_pixels/cell_pixels, simplification leads to the
formulas below.
"""
width_cell, height_cell = (
    width * self.S,
    height * self.S,
)
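
For reference, the simplification the docstring mentions can be written out as follows. This is my reading of it, assuming a square image of side W pixels divided into an S x S grid; the symbol W does not appear in dataset.py and is introduced here only for illustration.

```latex
\[
\text{width\_cell}
  = \frac{\text{width\_pixels}}{\text{cell\_pixels}}
  = \frac{\text{width} \cdot W}{W / S}
  = \text{width} \cdot S
\]
```

The same steps apply to height_cell, and W cancels out, which is why only S appears in the code.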

Question

Please help me understand why width_cell and height_cell are expressed in units of cells, that is, relative to S instead of the image size.

In my understanding, width and height come from the YOLO Darknet annotation, where both are relative to the image size and take values between 0 and 1. Suppose width=0.7 and S=7; then width_cell will be 4.9 cells.
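
To make the numbers concrete, here is a minimal sketch of the scaling, assuming S = 7 (the grid size used in the YOLO v1 paper). The helper names to_cell_units and to_image_units are hypothetical and not part of the repository; the first mirrors the multiplication in dataset.py, the second simply undoes it.

```python
import math

S = 7  # assumed grid size; the YOLO v1 paper uses S = 7


def to_cell_units(width: float, height: float, s: int = S) -> tuple[float, float]:
    """Scale an image-relative box size (0..1) to grid-cell units.

    Hypothetical helper that mirrors `width * self.S` in dataset.py.
    """
    return width * s, height * s


def to_image_units(width_cell: float, height_cell: float, s: int = S) -> tuple[float, float]:
    """Inverse of to_cell_units: convert cell units back to image-relative size."""
    return width_cell / s, height_cell / s


# A Darknet-style label whose box spans 70% of the image width and 20% of its height.
width_cell, height_cell = to_cell_units(0.7, 0.2)
print(f"{width_cell:.1f} cells wide, {height_cell:.1f} cells tall")

# Dividing by S recovers the image-relative values from the annotation.
width_img, height_img = to_image_units(width_cell, height_cell)
print(f"{width_img:.1f} of image width, {height_img:.1f} of image height")
```

The two conventions differ only by the constant factor S, so either one can be recovered from the other; my question is which one the training target is meant to use.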

If width_cell and height_cell are used as the ground truth for YOLO v1 training, I would expect them to be relative to the image size, as stated in the YOLO v1 paper:

Each bounding box consists of 5 predictions: x, y, w, h, and confidence. The (x, y) coordinates represent the center of the box relative to the bounds of the grid cell. The width and height are predicted relative to the whole image.

oonisim, Feb 26 '23 07:02