[Doc] BoundingBox coordinate unit and scale are unclear

Open oonisim opened this issue 1 year ago • 2 comments

The documentation for class textractor.entities.bbox.BoundingBox(x: float, y: float, width: float, height: float, spatial_object=None) says:

Represents the bounding box of an object in the format of a dataclass with (x, y, width, height). By default BoundingBox is set to work with denormalized co-ordinates: x: (0, docwidth), y: (0, docheight). Use the as_normalized_dict function to obtain BoundingBox with normalized co-ordinates: x: (0, 1), y: (0, 1)

Problem

The definitions of docwidth and docheight are not clear.

Clarification

Do pages in a Document object use x: (0, 1) and y: (0, 1) by default, or x: (0, width_in_pixels) and y: (0, height_in_pixels)? In other words, what do docwidth and docheight refer to in "By default BoundingBox is set to work with denormalized co-ordinates: x: (0, docwidth), y: (0, docheight)"?

With the code below, the values appear to be in (0, 1), but I cannot find where this is documented or guaranteed to remain so in the future.

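from textractor import Textractor
from textractor.data.constants import TextractFeatures

# extractor is a Textractor instance; FILEPATH points to the input document.
document = extractor.analyze_document(
    file_source=str(FILEPATH),
    features=[
        TextractFeatures.LAYOUT,
        TextractFeatures.FORMS,
        TextractFeatures.TABLES,
    ],
    save_image=True,  # Keep page images so document.images can be used for visualization.
)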

bbox = document.pages[0].words[0].bbox
print(bbox)
-----
x: 0.40578076243400574, y: 0.14519663155078888, width: 0.08256930857896805, height: 0.009907064028084278

If docheight were the page height in pixels, y would range over (0, 2339), but the values above show that it does not.

print(f"page height:{document.pages[0].height}, document page 0 image height:{document.images[0].height}")
-----
page height:1.0, document page 0 image height:2339
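
For anyone who needs pixel coordinates in the meantime, here is a minimal sketch that scales the printed values by the page image size. It is plain Python and does not call the textractor API. The helper name `to_pixels` and the image width of 1654 are hypothetical (only the height, 2339, appears in the output above).

```python
from dataclasses import dataclass


@dataclass
class PixelBox:
    x: float
    y: float
    width: float
    height: float


def to_pixels(x, y, width, height, image_width, image_height):
    """Scale a box given as ratios in (0, 1) to pixel units for an image of the given size."""
    return PixelBox(
        x=x * image_width,
        y=y * image_height,
        width=width * image_width,
        height=height * image_height,
    )


# Ratios copied from the printed bbox above.
# image_width=1654 is a placeholder; the issue only shows the image height.
box = to_pixels(
    0.40578076243400574,
    0.14519663155078888,
    0.08256930857896805,
    0.009907064028084278,
    image_width=1654,
    image_height=2339,
)
print(box)
```

In practice the two sizes would come from `document.images[0].width` and `document.images[0].height`, assuming the page image is a PIL-style object as the `height` attribute above suggests.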

AWS Textract Documentation

The AWS documentation of BoundingBox is clear that each value is a ratio of the page width or height.

  • Height – The height of the bounding box as a ratio of the overall document page height.
  • Left – The X coordinate of the top-left point of the bounding box as a ratio of the overall document page width.
  • Top – The Y coordinate of the top-left point of the bounding box as a ratio of the overall document page height.
  • Width – The width of the bounding box as a ratio of the overall document page width.

Each BoundingBox property has a value between 0 and 1. The value is a ratio of the overall image width (applies to Left and Width) or height (applies to Height and Top). For example, if the input image is 700 x 200 pixels, and the top-left coordinate of the bounding box is (350,50) pixels, the API returns a Left value of 0.5 (350/700) and a Top value of 0.25 (50/200).
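
The arithmetic in the AWS example can be checked with a short sketch. The function name `normalize_point` is made up for illustration and is not part of any AWS or textractor API.

```python
def normalize_point(left_px, top_px, image_width, image_height):
    """Convert a top-left pixel coordinate to (Left, Top) ratios of the image size."""
    return left_px / image_width, top_px / image_height


# The example from the AWS documentation: a 700 x 200 image
# with the top-left corner of the box at (350, 50) pixels.
left, top = normalize_point(350, 50, image_width=700, image_height=200)
print(left, top)
```

Multiplying the ratios back by the image size recovers the pixel coordinates, which is the conversion I would expect "denormalized" to mean in the textractor docstring.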

oonisim, Mar 01 '24 08:03