amazon-textract-textractor Add Page to DocumentEntity

Add Page to DocumentEntity

Open Belval opened this issue 2 years ago • 0 comments

Issue #, if available: #170

Description of changes: The original issue was that word and line bounding boxes were shifted in some cases when the page width or page height was not 1 (100%). The visualizer treated the coordinates of document entities such as Word/Line as relative to the page width/height, but the Textract API actually returns bounding boxes relative to the image size, not the page size.
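To illustrate the shift, here is a minimal sketch rather than the visualizer's actual code. The `BoundingBox` dataclass and both helper functions are hypothetical names introduced for this example, and the "page-relative" variant is only an assumed form of the faulty computation.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Normalized box: all values are fractions in [0, 1] (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def to_pixels_image_relative(bbox, image_width, image_height):
    """Scale a normalized box by the image size (the behavior described as correct)."""
    return (
        bbox.x * image_width,
        bbox.y * image_height,
        bbox.width * image_width,
        bbox.height * image_height,
    )


def to_pixels_page_relative(bbox, page_bbox, image_width, image_height):
    """Assumed form of the bug: scale by the page's size instead of the image's."""
    page_width_px = page_bbox.width * image_width
    page_height_px = page_bbox.height * image_height
    return (
        bbox.x * page_width_px,
        bbox.y * page_height_px,
        bbox.width * page_width_px,
        bbox.height * page_height_px,
    )


# A page that covers only the left half of a 1000x800 image (page width != 1).
page = BoundingBox(x=0.0, y=0.0, width=0.5, height=1.0)
word = BoundingBox(x=0.5, y=0.25, width=0.25, height=0.125)

correct = to_pixels_image_relative(word, 1000, 800)
shifted = to_pixels_page_relative(word, page, 1000, 800)
```

When the page width and height are both 1, the two functions agree, which matches the report that the shift only appeared in some cases.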

This PR fixes the base issue and also reworks the DocumentEntity object to give it .page and .page_id properties, removing the code duplication across all DocumentEntities.
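The shape of that rework could look roughly like the sketch below. It is an assumption for illustration only: the constructor signatures and the simplified `Word` and `Line` classes are hypothetical and do not reproduce the library's real classes.

```python
class DocumentEntity:
    """Base class that owns the page bookkeeping once (simplified sketch)."""

    def __init__(self, entity_id):
        self.id = entity_id
        self._page = None      # page number, set when the entity is attached to a page
        self._page_id = None   # identifier of the page the entity belongs to

    @property
    def page(self):
        return self._page

    @page.setter
    def page(self, page_num):
        self._page = page_num

    @property
    def page_id(self):
        return self._page_id

    @page_id.setter
    def page_id(self, page_id):
        self._page_id = page_id


class Word(DocumentEntity):
    """Inherits .page and .page_id instead of redefining them."""

    def __init__(self, entity_id, text):
        super().__init__(entity_id)
        self.text = text


class Line(DocumentEntity):
    """Inherits .page and .page_id instead of redefining them."""

    def __init__(self, entity_id, words):
        super().__init__(entity_id)
        self.words = words


word = Word("word-1", "Hello")
word.page = 2
word.page_id = "page-2"
line = Line("line-1", [word])
```

Defining the properties once on the base class means every entity type exposes the same page accessors without repeating them in each subclass.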

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Mar 09 '23 16:03 Belval
