camtrap-dp

Add bounding box

Open peterdesmet opened this issue 3 years ago • 1 comments

See discussion in #203. Best solution was to add bounding box as a property of the observation. What isn't defined is the expected format.

peterdesmet avatar Aug 19 '22 14:08 peterdesmet

I just want to confirm the need for the bounding box info in the observations.csv table. I guess the format is rather secondary, as long as it is defined, because you can easily transform the coordinates.

ddachs avatar Sep 28 '22 09:09 ddachs

Discussed with @kbubnicki

  • Name term boundingBox (most recognizable)
  • Insert right after mediaID (to zoom in further)
  • Only use it for media-observations table (not event-observations)
  • Definition to be provided
  • Recommended format to be provided

peterdesmet avatar Jan 26 '23 13:01 peterdesmet

I recommend using the YOLO format. That way the coordinates are independent of the image size (which can vary).
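A minimal sketch of why normalized coordinates are independent of image size. The function name `to_yolo` and the pixel values are hypothetical, chosen only for illustration; the output order follows the YOLO-style `[x_center, y_center, width, height]` layout mentioned in this thread.

```python
def to_yolo(x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert a pixel box (top-left corner plus size) to normalized
    [x_center, y_center, width, height], each a fraction of the image."""
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return [x_center, y_center, box_w / img_w, box_h / img_h]


# The same relative box on an image and on a copy scaled up by 2
# (hypothetical pixel values).
small = to_yolo(100, 50, 200, 100, img_w=800, img_h=400)
large = to_yolo(200, 100, 400, 200, img_w=1600, img_h=800)

print(small)  # [0.25, 0.25, 0.25, 0.25]
print(large)  # [0.25, 0.25, 0.25, 0.25]
```

Both calls give the same values, so a stored box stays valid if the media file is resized.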

ddachs avatar Jan 26 '23 13:01 ddachs

@kbubnicki for Agouti, we would like the bounding box field to also support the [x, y] position of animals. I guess that should be possible in YOLO format ([x_center, y_center, width, height]) by storing it as [x, y, 0, 0]?

peterdesmet avatar Feb 06 '23 09:02 peterdesmet

@danstowell in reply to https://github.com/tdwg/camtrap-dp/pull/314#issuecomment-1561610637, if you want to classify a media file containing 3 sparrows with bounding boxes, you would have the following 3 observations:

| observationID | mediaID | scientificName | start | end | boundingBox |
| --- | --- | --- | --- | --- | --- |
| obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x1, y1, width1, height1] |
| obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x2, y2, width2, height2] |
| obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x3, y3, width3, height3] |

peterdesmet avatar May 24 '23 17:05 peterdesmet

Alternatively, we could store the bounding box data in 4 separate columns, thus enforcing exactly one bounding box per observation row:

| observationID | mediaID | scientificName | start | end | bboxX | bboxY | bboxWidth | bboxHeight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x1 | y1 | width1 | height1 |
| obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x2 | y2 | width2 | height2 |
| obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x3 | y3 | width3 | height3 |

@danstowell I remember your comment about storing structured data within a CSV cell. What do you think?
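To make the trade-off between the two layouts concrete, here is a rough sketch of how a consumer might read each one with Python's standard library. The sample values and the helper names `read_compound` and `read_split` are hypothetical, and the compound cell is assumed to be a JSON-style array; none of this is prescribed by the proposal.

```python
import csv
import io
import json

# Option 1: one compound column. The cell must be quoted in the CSV
# because it contains commas, and it needs a second parsing step.
compound_csv = (
    "observationID,mediaID,boundingBox\n"
    'obs1,med1,"[0.1, 0.2, 0.3, 0.4]"\n'
)

# Option 2: four plain numeric columns.
split_csv = (
    "observationID,mediaID,bboxX,bboxY,bboxWidth,bboxHeight\n"
    "obs1,med1,0.1,0.2,0.3,0.4\n"
)


def read_compound(text):
    """Parse the structured cell (assumed here to be a JSON array)."""
    rows = csv.DictReader(io.StringIO(text))
    return [json.loads(row["boundingBox"]) for row in rows]


def read_split(text):
    """Read each coordinate from its own column."""
    fields = ("bboxX", "bboxY", "bboxWidth", "bboxHeight")
    rows = csv.DictReader(io.StringIO(text))
    return [[float(row[name]) for name in fields] for row in rows]


compound_boxes = read_compound(compound_csv)
split_boxes = read_split(split_csv)
print(compound_boxes == split_boxes)  # True
```

Both routes recover the same numbers; the difference is that the compound cell needs quoting and a second parser, while the split columns are ordinary numeric fields.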

kbubnicki avatar May 25 '23 07:05 kbubnicki

The format would be:

[
    {
        "name": "bboxX",
        "description": "The relative X coordinate of a bounding box center, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxY",
        "description": "The relative Y coordinate of a bounding box center, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxWidth",
        "description": "The relative width of a bounding box, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxHeight",
        "description": "The relative height of a bounding box, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    }
]

This is the YOLO format (also suggested by @ddachs). The advantage of this format (i.e. coordinates of the center instead of e.g. the upper-left corner) is that the bboxX and bboxY columns can be used to store the relative position of an animal in an image (e.g. estimated using image-calibration methods for distance sampling applications) without defining an entire bounding box. In that case bboxWidth and bboxHeight are simply zero.
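As a small illustration of the point-as-zero-sized-box idea, the sketch below models one row's four fields. The class `CenterBox` and the helpers `point` and `is_point` are hypothetical names, not part of any proposed schema.

```python
from typing import NamedTuple


class CenterBox(NamedTuple):
    """The four proposed fields, all fractions of the image size."""
    bboxX: float       # x coordinate of the box center
    bboxY: float       # y coordinate of the box center
    bboxWidth: float
    bboxHeight: float


def point(x, y):
    """A position without extent: center coordinates with zero size."""
    return CenterBox(x, y, 0.0, 0.0)


def is_point(box):
    """True when the row stores only a position, not a full box."""
    return box.bboxWidth == 0 and box.bboxHeight == 0


animal_position = point(0.42, 0.77)
full_box = CenterBox(0.5, 0.5, 0.2, 0.3)

print(is_point(animal_position))  # True
print(is_point(full_box))         # False
```

With a center-based convention, bboxX and bboxY mean the same thing in both cases, so a consumer interested only in position can ignore the size columns.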

kbubnicki avatar May 25 '23 10:05 kbubnicki

I like that approach.

peterdesmet avatar May 25 '23 11:05 peterdesmet

Yes, this is indeed a bit clearer. I wasn't planning to comment on that aspect though, because I don't know which of those two options (i.e. single compound column, or separated into columns) will be easier for your target users to produce/consume. If it matches YOLO format then that's an argument in support of it.

Within AudioVisual Core we specified something similar except it was a top-left corner. I rather wish the centrepoint had been an option we considered, since it has some handy properties. (I note also that in AC, zero-sized rectangles are explicitly disallowed, though zero-sized circles are to be used instead! So that's compatible.)

danstowell avatar May 25 '23 11:05 danstowell

Thanks @danstowell! Given that AudioVisual Core adopted top-left corner we might consider that too ... so we can reference the terms?

bboxX -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/xFrac
bboxY -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/yFrac

@danstowell @kbubnicki Or would you advise against that?

Note: the advantage of splitting into columns is that validation is easier to write (e.g. x should be between 0 and 1).
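A sketch of what such per-column validation could look like, assuming rows arrive as dictionaries of strings (as `csv.DictReader` would produce) and that every field is optional with a 0 to 1 range, as in the field definitions earlier in this thread. The function name `validate_row` is hypothetical.

```python
FIELDS = ("bboxX", "bboxY", "bboxWidth", "bboxHeight")


def validate_row(row):
    """Return a list of error messages for one observation row."""
    errors = []
    for name in FIELDS:
        raw = row.get(name, "")
        if raw == "":
            continue  # the fields are not required
        try:
            value = float(raw)
        except ValueError:
            errors.append(f"{name}: not a number ({raw!r})")
            continue
        if not 0 <= value <= 1:
            errors.append(f"{name}: {value} is outside [0, 1]")
    return errors


ok = validate_row(
    {"bboxX": "0.5", "bboxY": "0.5", "bboxWidth": "0.2", "bboxHeight": "0.1"}
)
bad = validate_row(
    {"bboxX": "1.3", "bboxY": "abc", "bboxWidth": "", "bboxHeight": "0.1"}
)

print(ok)   # []
print(bad)  # ["bboxX: 1.3 is outside [0, 1]", "bboxY: not a number ('abc')"]
```

In a Data Package, the same checks would normally come from the `minimum` and `maximum` constraints in the table schema rather than from hand-written code; the sketch only shows that each column can be checked on its own.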

peterdesmet avatar May 25 '23 16:05 peterdesmet

@danstowell @baskaufs I'd like to know how we should reference the AC terms and how important the AC Notes are.

For example, our bboxWidth follows the definition of http://rs.tdwg.org/ac/terms/widthFrac exactly:

The width of the bounding rectangle, expressed as a decimal fraction of the width of the media item.

But we might allow 0 widths, which contradicts the notes of http://rs.tdwg.org/ac/terms/widthFrac:

Zero-sized bounding rectangles are not allowed. To designate a point, use the radius option with a zero value.

Is our bboxWidth then still an exact match, or is it broader (because we allow more)?

peterdesmet avatar May 26 '23 09:05 peterdesmet

Update based on #323

  • We have now adopted the top-left corner rather than the center. This aligns with the MegaDetector format and AC
  • We don't allow 0 values anymore
  • AC terms are broader than Camtrap DP terms, because the bounding boxes should encompass observed individuals, not just any object.
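For data already recorded in the center-based form discussed earlier in this thread, the conversion to a top-left convention is a half-width and half-height shift. This is a minimal sketch; the function name `center_to_top_left` is hypothetical, and rejecting zero-sized boxes is an assumption based on the second bullet above.

```python
def center_to_top_left(x_center, y_center, width, height):
    """Convert normalized [x_center, y_center, width, height] to
    normalized (x_top_left, y_top_left, width, height)."""
    if width <= 0 or height <= 0:
        # Assumption: zero-sized boxes are rejected, per the update above.
        raise ValueError("zero-sized bounding boxes are not allowed")
    return (x_center - width / 2, y_center - height / 2, width, height)


converted = center_to_top_left(0.5, 0.5, 0.5, 0.25)
print(converted)  # (0.25, 0.375, 0.5, 0.25)
```

Width and height are unchanged by the conversion; only the reference point moves.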

peterdesmet avatar May 26 '23 12:05 peterdesmet

@peterdesmet Cool. Prior to adopting the AC terms, we looked at a number of systems for defining bounding boxes. Most (nearly all?) had 0,0 as the upper left corner. So following that convention simplifies the conversion to other systems.

baskaufs avatar May 26 '23 20:05 baskaufs