deep_sort icon indicating copy to clipboard operation
deep_sort copied to clipboard

Fix bbox bottom right corner calculation

Open st235 opened this issue 1 year ago • 5 comments

The boxes are given in (top left x, top left y, width, height) format. In order to get bottom right corner we need to add width and height to x and y correspondingly and subtract 1.

On the line 62 in tools/generate_detections.py you treat image rect correctly by subtracting 1 from the image box, which is originally (0, 0, w, h) and after subtracting (0, 0, w - 1, h - 1).

bbox[2:] = np.minimum(np.asarray(image.shape[:2][::-1]) - 1, bbox[2:])

however, you don't do it in other places across the repo, for example, on line 57 in the same file

# convert to top left, bottom right
bbox[2:] += bbox[:2]

By not doing so, you get a pixel that is diagonally to the bottom right corner. This minor inconsistency is really confusing.

There are 2 ways of how to resolve it:

  1. Fix rect calculation across the codebase (subtract 1 from other rects when converted to (left, top, right, bottom))
  2. Make image rect consistent to other rects and do not subtract 1 for the image right bottom corner

I believe the first approach is more reasonable to take as it also fixes how iou is calculated. Below is an example of how one can figure out the right bottom corner of a rect:

Screenshot 2023-06-30 at 11 24 44

This is not a critical issue and should not dramatically affect metrics though would be really nice to fix.

st235 avatar Jun 30 '23 10:06 st235

@nwojke can you, please, take a look?

st235 avatar Jun 30 '23 10:06 st235

@nwojke any luck you can take a look?

st235 avatar Jul 12 '23 22:07 st235

@nwojke can you, please, take a look?

st235 avatar Aug 07 '23 21:08 st235

As far as I know W = x2 - x1 and H = y2 -y1.

According to that x2(right) = x1(Let) + W not "x2(right) = x1(Let) + W -1".

Maybe you are changing your bboxes in other parts of your implementation.

SajjadPSavoji avatar Nov 14 '23 20:11 SajjadPSavoji

Hello @SajjadPSavoji

Thank you for the provided feedback. I really appreciate it.

As far as I know W = x2 - x1 and H = y2 -y1. According to that x2(right) = x1(Let) + W not "x2(right) = x1(Let) + W -1". Maybe you are changing your bboxes in other parts of your implementation.

Check out the places I was highlighting in the original comment. I agree, the problem seems a bit tricky.

There are a few inconsistencies around this place.

The original comment on the line 32 says

    bbox : array_like
        The bounding box in format (x, y, width, height).

I.e. if you have a one pixel box, let's say (0, 0, 1, 1), then it seems it is occupying the only pixel at the position 0,0.

Both ways of handling this box would be correct: either consider the right and bottom corners of it in inclusive manner or exclusive one.

Though, the lines 57 and 62 seem inconsistent to each other.

The viewport on line 62 is considered originally as (0, 0, w, h). Right?

    # clip at image boundaries
    bbox[:2] = np.maximum(0, bbox[:2])
    bbox[2:] = np.minimum(np.asarray(image.shape[:2][::-1]) - 1, bbox[2:])

And the box is clipped by (0, 0, w - 1, h - 1).

At line 57 we don't do the same thing.

# convert to top left, bottom right
bbox[2:] += bbox[:2]
bbox = bbox.astype(np.int)

Imagine we have a bounding box (0, 0, viewport_width, viewport_height). Consider those are the width and height of the bounding box (from the pydoc) we trim them at line 62, and the box becomes (0, 0, viewport_width - 1, viewport_height - 1). At the same time, there is nothing wrong with the box having the same width and height as the viewport as this box will be the box with the size of the whole image and it seems it should not be clipped in that case.

As I said before, the inconsistency seems rather minor. However, to fix the issue, we can do a different change, for example, instead of changing boxes handling we can change the viewport handling in here.

Something similar to this

    bbox[2:] = np.minimum(np.asarray(image.shape[:2][::-1]), bbox[2:])

Hopefully, the explanation seems reasonable. What do you think?

P.S.: regardless of the way to fix the issue, I think without receiving the initial maintainers feedback there is nothing we can do about it.

st235 avatar Nov 14 '23 21:11 st235