camtrap-dp Add bounding box

See the discussion in #203. The best solution was to add the bounding box as a property of the observation. What isn't defined is the expected format.

Aug 19 '22 14:08 peterdesmet

I just want to confirm the need for the bounding box info in the observations.csv table. I guess the format is rather secondary, as long as it is defined, because you can easily transform the coordinates.

Sep 28 '22 09:09 ddachs

Discussed with @kbubnicki

Name the term boundingBox (most recognizable)
Insert it right after mediaID (to zoom in further)
Use it only in the media-observations table (not event-observations)
Definition to be provided
Recommended format to be provided

Jan 26 '23 13:01 peterdesmet

I recommend using the YOLO format. This way the coordinates are independent of the image size (which can vary).
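To illustrate why normalized YOLO coordinates don't depend on image size, here is a minimal Python sketch that converts a pixel box (top-left corner plus width and height, assumed here as the detector output) into YOLO-style `[x_center, y_center, width, height]` values. The function name `pixel_box_to_yolo` is hypothetical.

```python
def pixel_box_to_yolo(x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert a pixel box (top-left corner, width, height) to YOLO format.

    Hypothetical helper: returns (x_center, y_center, width, height),
    each normalized by the image width or height.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return (x_center, y_center, box_w / img_w, box_h / img_h)


# The same animal in a 1920x1080 image and in a 960x540 downscaled copy
# gives identical normalized coordinates.
print(pixel_box_to_yolo(480, 270, 192, 216, 1920, 1080))
print(pixel_box_to_yolo(240, 135, 96, 108, 960, 540))
```

Both calls return the same tuple, so resizing the media file does not invalidate stored boxes.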

Jan 26 '23 13:01 ddachs

@kbubnicki for Agouti, we would like the bounding box field to also support the [x,y] position of animals. I guess that should be possible in YOLO format ([x_center, y_center, width, height]) by recording it as x, y, 0, 0?

Feb 06 '23 09:02 peterdesmet

@danstowell in reply to https://github.com/tdwg/camtrap-dp/pull/314#issuecomment-1561610637, if you want to classify a media file containing 3 sparrows with bounding boxes, you would have the following 3 observations:

| observationID | mediaID | scientificName | start | end | boundingBox |
|---|---|---|---|---|---|
| obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | `[x1, y1, width1, height1]` |
| obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | `[x2, y2, width2, height2]` |
| obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | `[x3, y3, width3, height3]` |
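With a single compound column, every consumer has to parse the cell before using the values. A minimal sketch, assuming the cell is serialized as a JSON-style array of four numbers (the thread does not fix the serialization); `parse_bounding_box` is a hypothetical name.

```python
import json


def parse_bounding_box(cell):
    """Parse a compound boundingBox cell such as "[0.3, 0.35, 0.1, 0.2]".

    Assumes the cell holds a JSON array of four numbers; an empty cell
    means no bounding box was recorded and yields None.
    """
    if cell is None or cell.strip() == "":
        return None
    values = json.loads(cell)
    if not isinstance(values, list) or len(values) != 4:
        raise ValueError(f"expected 4 values, got {cell!r}")
    return tuple(float(v) for v in values)


print(parse_bounding_box("[0.3, 0.35, 0.1, 0.2]"))  # (0.3, 0.35, 0.1, 0.2)
print(parse_bounding_box(""))  # None
```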

May 24 '23 17:05 peterdesmet

Alternatively, we could store bounding box data in 4 separate columns, thus enforcing exactly one bounding box per observation row:

| observationID | mediaID | scientificName | start | end | bboxX | bboxY | bboxWidth | bboxHeight |
|---|---|---|---|---|---|---|---|---|
| obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x1 | y1 | width1 | height1 |
| obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x2 | y2 | width2 | height2 |
| obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x3 | y3 | width3 | height3 |

@danstowell I remember your comment about storing structured data within a CSV cell. What do you think?
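For comparison, with four separate columns a standard CSV reader yields the coordinates directly, with no parsing inside cells. A minimal sketch using Python's `csv` module on an in-memory excerpt; the coordinate values are invented for illustration, and treating an all-empty set of bbox cells as "no bounding box" is an assumption.

```python
import csv
import io

# Illustrative excerpt; coordinate values are invented for the example.
OBSERVATIONS = """observationID,mediaID,bboxX,bboxY,bboxWidth,bboxHeight
obs1,med1,0.1,0.2,0.05,0.04
obs2,med1,0.6,0.3,0.07,0.05
obs3,med1,,,,
"""

BBOX_COLUMNS = ("bboxX", "bboxY", "bboxWidth", "bboxHeight")


def read_boxes(text):
    """Map observationID to an (x, y, width, height) tuple, or None if empty.

    A partially filled box raises ValueError, since float("") fails.
    """
    boxes = {}
    for row in csv.DictReader(io.StringIO(text)):
        cells = [row[col] for col in BBOX_COLUMNS]
        if all(cell == "" for cell in cells):
            boxes[row["observationID"]] = None
        else:
            boxes[row["observationID"]] = tuple(float(c) for c in cells)
    return boxes


print(read_boxes(OBSERVATIONS))
```

Each row can hold at most one box, which is the constraint the separate-column layout is meant to enforce.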

May 25 '23 07:05 kbubnicki

The format would be:

```json
[
    {
        "name": "bboxX",
        "description": "The relative X coordinate of a bounding box center, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxY",
        "description": "The relative Y coordinate of a bounding box center, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxWidth",
        "description": "The relative width of a bounding box, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxHeight",
        "description": "The relative height of a bounding box, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    }
]
```

It is the YOLO format (also suggested by @ddachs). The advantage of this format (i.e. coordinates of the center instead of e.g. the upper-left corner) is that the bboxX and bboxY columns can store the relative position of an animal in an image (e.g. estimated using image-calibration methods for distance sampling applications) without defining an entire bounding box. In that case, bboxWidth and bboxHeight are simply set to zero.
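Under the center-based proposal in this comment, a point is just a box with zero width and height. A minimal sketch of that convention; the `BBox` type and its `point`/`is_point` helpers are hypothetical names for illustration.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Center-based box as proposed above; all values normalized to [0, 1]."""
    x: float       # bboxX: center X, relative to image width
    y: float       # bboxY: center Y, relative to image height
    width: float   # bboxWidth: relative to image width
    height: float  # bboxHeight: relative to image height

    @classmethod
    def point(cls, x, y):
        """Record only an animal's relative position (zero-sized box)."""
        return cls(x, y, 0.0, 0.0)

    def is_point(self):
        return self.width == 0 and self.height == 0


animal = BBox.point(0.42, 0.77)
print(animal)             # BBox(x=0.42, y=0.77, width=0.0, height=0.0)
print(animal.is_point())  # True
```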

May 25 '23 10:05 kbubnicki

I like that approach.

May 25 '23 11:05 peterdesmet

Yes, this is indeed a bit clearer. I wasn't planning to comment on that aspect though, because I don't know which of those two options (i.e. single compound column, or separated into columns) will be easier for your target users to produce/consume. If it matches YOLO format then that's an argument in support of it.

Within AudioVisual Core we specified something similar, except that it used the top-left corner. I rather wish the centrepoint had been an option we considered, since it has some handy properties. (I note also that in AC, zero-sized rectangles are explicitly disallowed, though zero-sized circles are to be used instead! So that's compatible.)

May 25 '23 11:05 danstowell

Thanks @danstowell! Given that AudioVisual Core adopted the top-left corner, we might consider that too, so we can reference the terms?

bboxX -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/xFrac
bboxY -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/yFrac

@danstowell @kbubnicki Or would you advise against that?

Note: the advantage of splitting into columns is that validation is easier to write (e.g. x should be between 0 and 1).
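With separate columns, each value is a plain number that can be checked against the per-field constraints from the field definitions above (not required, minimum 0, maximum 1). A minimal hand-rolled sketch; `validate_row` and the `CONSTRAINTS` dict are hypothetical, and a real pipeline would more likely rely on a schema validator.

```python
# Constraints mirrored from the field definitions above.
CONSTRAINTS = {
    "bboxX": {"required": False, "minimum": 0, "maximum": 1},
    "bboxY": {"required": False, "minimum": 0, "maximum": 1},
    "bboxWidth": {"required": False, "minimum": 0, "maximum": 1},
    "bboxHeight": {"required": False, "minimum": 0, "maximum": 1},
}


def validate_row(row):
    """Return a list of error messages for the bbox columns of one CSV row."""
    errors = []
    for field, rules in CONSTRAINTS.items():
        raw = row.get(field, "")
        if raw == "":
            if rules["required"]:
                errors.append(f"{field}: missing")
            continue
        try:
            value = float(raw)
        except ValueError:
            errors.append(f"{field}: not a number ({raw!r})")
            continue
        # Also catches NaN, since every comparison with NaN is False.
        if not rules["minimum"] <= value <= rules["maximum"]:
            errors.append(
                f"{field}: {value} outside [{rules['minimum']}, {rules['maximum']}]"
            )
    return errors


print(validate_row({"bboxX": "0.5", "bboxY": "1.2", "bboxWidth": "0.1", "bboxHeight": ""}))
# ['bboxY: 1.2 outside [0, 1]']
```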

May 25 '23 16:05 peterdesmet

@danstowell @baskaufs I'd like to know how we should reference the AC terms and how important the AC Notes are.

For example, our bboxWidth follows the definition of http://rs.tdwg.org/ac/terms/widthFrac exactly:

The width of the bounding rectangle, expressed as a decimal fraction of the width of the media item.

But we might allow 0 widths, which contradicts the notes of http://rs.tdwg.org/ac/terms/widthFrac:

Zero-sized bounding rectangles are not allowed. To designate a point, use the radius option with a zero value.

Is our bboxWidth then still an exact match, or is it broader (because we allow more)?

May 26 '23 09:05 peterdesmet

Update based on #323

We have now adopted the top-left corner rather than the center. It aligns with the MegaDetector format and AC
We no longer allow 0 values
AC terms are broader than Camtrap DP terms, because the bounding boxes should encompass observed individuals, not just any object.
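Data already stored in the earlier center-based form can be shifted to the adopted top-left form by subtracting half the width and height. A minimal sketch; `center_to_top_left` is a hypothetical name, and rejecting zero sizes follows the "no 0 values" rule above.

```python
def center_to_top_left(x_center, y_center, width, height):
    """Convert a normalized center-based (YOLO-style) box to top-left form.

    Returns (bboxX, bboxY, bboxWidth, bboxHeight) where bboxX/bboxY are the
    top-left corner. Zero-sized boxes are rejected, since 0 values are no
    longer allowed.
    """
    if width <= 0 or height <= 0:
        raise ValueError("bounding box width and height must be > 0")
    x_min = x_center - width / 2
    y_min = y_center - height / 2
    # bboxX and bboxY keep the 0..1 constraints from the field definitions.
    if not (0 <= x_min <= 1 and 0 <= y_min <= 1):
        raise ValueError("top-left corner falls outside the image")
    return (x_min, y_min, width, height)


print(center_to_top_left(0.5, 0.5, 0.2, 0.4))  # (0.4, 0.3, 0.2, 0.4)
```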

May 26 '23 12:05 peterdesmet

@peterdesmet Cool. Prior to adopting the AC terms, we looked at a number of systems for defining bounding boxes. Most (nearly all?) had 0,0 as the upper-left corner. So following that convention simplifies the conversion to other systems.
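With the origin at the upper-left corner and y growing downwards, converting to pixel coordinates is mostly a matter of scaling. A minimal sketch that produces a corner-pair layout `(x_min, y_min, x_max, y_max)`; the function name `to_pixel_corners` and the rounding to whole pixels are assumptions.

```python
def to_pixel_corners(bbox_x, bbox_y, bbox_width, bbox_height, img_w, img_h):
    """Convert a normalized top-left box to pixel corner coordinates.

    Origin (0, 0) is the upper-left corner of the image and y grows
    downwards. Returns (x_min, y_min, x_max, y_max) rounded to whole pixels.
    """
    x_min = round(bbox_x * img_w)
    y_min = round(bbox_y * img_h)
    x_max = round((bbox_x + bbox_width) * img_w)
    y_max = round((bbox_y + bbox_height) * img_h)
    return (x_min, y_min, x_max, y_max)


print(to_pixel_corners(0.25, 0.5, 0.5, 0.25, 1920, 1080))  # (480, 540, 1440, 810)
```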

May 26 '23 20:05 baskaufs
