amazon-textract-textractor
JPEG conversion in `analyze_document` significantly impacts table predictions
When obtaining predictions through `analyze_document`, the input image is converted to JPEG before it is sent to Textract (https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/textractor.py#L845). The lossy JPEG compression is enough to noticeably degrade the table predictions.
We should check the input image's format and keep it whenever Textract supports it. This would avoid discrepancies between calling Textract through Textractor and calling Textract directly with boto3 on the same file.
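A minimal sketch of the proposed check, not Textractor's actual code. The names `choose_upload_format`, `TEXTRACT_IMAGE_FORMATS` and `LOSSLESS_FALLBACK` are hypothetical, and the supported-format set is a conservative assumption (JPEG and PNG only) that should be checked against the Textract documentation. The idea: keep the source format when Textract accepts it, and otherwise fall back to a lossless format rather than always re-encoding to JPEG.

```python
from typing import Optional

# Assumption: conservative subset of image formats that Textract accepts
# as raw bytes. Verify against the Textract docs before relying on it.
TEXTRACT_IMAGE_FORMATS = {"JPEG", "PNG"}

# Hypothetical fallback: PNG is lossless, so re-encoding to it does not
# introduce compression artifacts the way JPEG does.
LOSSLESS_FALLBACK = "PNG"


def choose_upload_format(source_format: Optional[str]) -> str:
    """Pick the encoding to use when serializing an image for Textract.

    Keeps the original format when it is in TEXTRACT_IMAGE_FORMATS, so a
    PNG input is not pushed through lossy JPEG compression; unknown or
    missing formats fall back to the lossless LOSSLESS_FALLBACK.
    """
    if source_format is not None and source_format.upper() in TEXTRACT_IMAGE_FORMATS:
        return source_format.upper()
    return LOSSLESS_FALLBACK
```

With Pillow, `Image.format` holds the source file's format (and is `None` for images created in memory), so the helper could be used as `image.save(buffer, format=choose_upload_format(image.format))`. Note that re-saving a JPEG still recompresses it; sending the original file bytes unchanged when the format is supported would avoid that extra pass entirely.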