amazon-textract-textractor JPEG conversion in `analyze_document` significantly impacts table predictions

Open • Belval opened this issue 1 year ago • 1 comment

When obtaining predictions through `analyze_document`, the image is converted to JPEG (see https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/textractor.py#L845). The resulting compression is enough to degrade the table predictions.

We should check the input format and keep it, assuming it is supported by Textract, to avoid discrepancies between calling Textract through Textractor and calling it directly with boto3.
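A minimal sketch of what "check and keep the format" could look like, assuming the raw file bytes are available before any conversion. It is not Textractor's actual code: `sniff_format` and `needs_reencoding` are hypothetical helper names, and the check relies only on the well-known file signatures of the formats Textract accepts (PNG, JPEG, TIFF, PDF).

```python
from typing import Optional

# File signatures (magic bytes) for the formats Textract accepts.
# The helper names below are hypothetical, not part of Textractor.
_SIGNATURES = {
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "JPEG": (b"\xff\xd8\xff",),
    "TIFF": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian
    "PDF": (b"%PDF-",),
}


def sniff_format(data: bytes) -> Optional[str]:
    """Return the detected format name, or None if the signature is unknown."""
    for name, prefixes in _SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def needs_reencoding(data: bytes) -> bool:
    """True only when the bytes are not already in a supported format."""
    return sniff_format(data) is None
```

With a check like this, bytes that are already in a supported format could be passed unchanged (for example as `Document={"Bytes": data}` in a boto3 `analyze_document` call), and only unsupported inputs would be re-encoded, ideally to a lossless format such as PNG.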

Mar 21 '24 22:03 Belval
