amazon-textract-textractor

JPEG conversion in `analyze_document` significantly impacts table predictions

Open • Belval opened this issue 1 year ago • 1 comment

When obtaining predictions through `analyze_document`, the image is converted to JPEG (see https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/textractor.py#L845). JPEG is lossy, and the compression is enough to degrade the table predictions.

We should check the input format and keep it, assuming it is supported by Textract, to avoid discrepancies between calling Textract through Textractor and calling Textract directly with boto3.
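
A minimal sketch of the idea, not the Textractor implementation: sniff the format from the file's leading bytes and, if it is one Textract accepts, send the bytes unchanged instead of re-encoding them. The helper names `sniff_format` and `analyze_tables` are hypothetical, and the accepted-format list (PNG, JPEG, PDF, TIFF) is an assumption to verify against the Textract documentation. `client` is assumed to be a boto3 Textract client.

```python
from typing import Optional

# Leading-byte signatures for formats assumed to be accepted by Textract.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"%PDF-": "PDF",
    b"II*\x00": "TIFF",  # little-endian TIFF
    b"MM\x00*": "TIFF",  # big-endian TIFF
}


def sniff_format(data: bytes) -> Optional[str]:
    """Return the detected format name, or None if it is not recognized."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def analyze_tables(client, data: bytes):
    """Send the original bytes as-is when the format is already accepted.

    Hypothetical helper: it never re-encodes, so no compression artifacts
    are introduced between the input file and the Textract call.
    """
    if sniff_format(data) is None:
        # Unrecognized input would need an explicit (ideally lossless)
        # conversion step, which this sketch leaves to the caller.
        raise ValueError("Unrecognized format; convert explicitly before calling")
    return client.analyze_document(
        Document={"Bytes": data},
        FeatureTypes=["TABLES"],
    )
```

With a real client this would be called as `analyze_tables(boto3.client("textract"), open(path, "rb").read())`, so the bytes Textract receives are identical to those a direct boto3 call would send.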

Belval • Mar 21 '24 22:03