amazon-textract-textractor
[Doc] BoundingBox coordinate unit and scale are unclear
The BoundingBox documentation currently reads: "Represents the bounding box of an object in the format of a dataclass with (x, y, width, height). By default BoundingBox is set to work with denormalized co-ordinates: x: (0, docwidth), y: (0, docheight). Use the as_normalized_dict function to obtain BoundingBox with normalized co-ordinates: x: (0, 1), y: (0, 1)"
Problem
The definitions of docwidth and docheight are not clear.
Clarification
Do the pages in a Document object use x: (0, 1) and y: (0, 1) by default, or x: (0, width_in_pixels) and y: (0, height_in_pixels)? In other words, what do docwidth and docheight refer to in "By default BoundingBox is set to work with denormalized co-ordinates: x: (0, docwidth), y: (0, docheight)"?
With the code below, it appears to use (0, 1), but I am not sure where this is clearly documented or whether it is guaranteed to stay that way in the future.
document = extractor.analyze_document(
    file_source=str(FILEPATH),
    features=[
        TextractFeatures.LAYOUT,
        TextractFeatures.FORMS,
        TextractFeatures.TABLES,
    ],
    save_image=True,  # Needed to use the images property and to visualize the document instance.
)
bbox = document.pages[0].words[0].bbox
print(bbox)
-----
x: 0.40578076243400574, y: 0.14519663155078888, width: 0.08256930857896805, height: 0.009907064028084278
If docheight were measured in pixels, the page height should be 2339 and y should range over (0, 2339), but apparently that is not the case.
print(f"page height:{document.pages[0].height}, document page 0 image height:{document.images[0].height}")
-----
page height:1.0, document page 0 image height:2339
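Until the documentation settles this, a defensive check before scaling can at least catch a change in behaviour. The following is only a sketch: `looks_normalized` is a hypothetical helper, not part of amazon-textract-textractor, and it is a heuristic, since a very small box in pixel units could also pass. It takes plain floats, so it does not assume anything about the BoundingBox attribute names.

```python
def looks_normalized(x: float, y: float, width: float, height: float,
                     eps: float = 1e-6) -> bool:
    """Heuristic: True if the box fits inside the unit square (0, 1) x (0, 1).

    Hypothetical helper for illustration; not a library function.
    """
    values_in_range = all(-eps <= v <= 1.0 + eps for v in (x, y, width, height))
    # The right and bottom edges must also stay inside the page.
    edges_in_range = (x + width) <= 1.0 + eps and (y + height) <= 1.0 + eps
    return values_in_range and edges_in_range


# Values printed for document.pages[0].words[0].bbox above.
observed = (0.40578076243400574, 0.14519663155078888,
            0.08256930857896805, 0.009907064028084278)
print(looks_normalized(*observed))
```

A box expressed in pixels on the 2339-pixel-high page image, such as (350, 50, 10, 10), would fail this check.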
AWS Textract Documentation
The AWS documentation of BoundingBox clearly states that the unit/scale is a ratio of the page width/height.
- Height – The height of the bounding box as a ratio of the overall document page height.
- Left – The X coordinate of the top-left point of the bounding box as a ratio of the overall document page width.
- Top – The Y coordinate of the top-left point of the bounding box as a ratio of the overall document page height.
- Width – The width of the bounding box as a ratio of the overall document page width.
Each BoundingBox property has a value between 0 and 1. The value is a ratio of the overall image width (applies to Left and Width) or height (applies to Height and Top). For example, if the input image is 700 x 200 pixels, and the top-left coordinate of the bounding box is (350,50) pixels, the API returns a Left value of 0.5 (350/700) and a Top value of 0.25 (50/200).
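The arithmetic in the AWS example can be sketched as a small conversion from ratios to pixels. This is an illustration under stated assumptions: `to_pixels` is a hypothetical helper rather than a Textract or Textractor API, and the Width and Height ratios used below (0.1 and 0.5) are made up for the example, since the AWS text only gives Left and Top.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def to_pixels(left: float, top: float, width: float, height: float,
              image_width: int, image_height: int) -> PixelBox:
    """Scale ratio-based BoundingBox values to pixels.

    Left and Width are ratios of the image width; Top and Height are
    ratios of the image height. Hypothetical helper for illustration.
    """
    return PixelBox(
        left=round(left * image_width),
        top=round(top * image_height),
        width=round(width * image_width),
        height=round(height * image_height),
    )


# AWS example: a 700 x 200 image with Left = 0.5 and Top = 0.25.
# Width = 0.1 and Height = 0.5 are assumed values, not from the AWS text.
box = to_pixels(0.5, 0.25, 0.1, 0.5, image_width=700, image_height=200)
print(box.left, box.top)  # 350 50, matching the (350, 50) in the AWS example
```

In the code from this issue, the image dimensions could presumably come from document.images[0].width and document.images[0].height, although only the height property is shown above.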