amazon-textract-textractor
Need an option to save output in UTF-8 encoding to avoid saving as Windows-1252 encoding
It looks like the only way to capture the output of amazon-textract is to redirect it into a file, for example:
amazon-textract --input-document "s3://somebucket/2022-04-16-0010.jpg" --pretty-print LINES > 2022-04-16-0010.txt
Unfortunately, this is a problem on Windows because the default encoding there is Windows-1252, not UTF-8. Other tools used to analyze the output often require UTF-8.
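In the meantime, one possible workaround is to launch the command from a small Python wrapper instead of using shell redirection. This is only a sketch: it assumes amazon-textract runs under Python and can be launched directly from PATH, so that the standard PYTHONIOENCODING environment variable controls how its stdout is encoded. The helper name run_and_save_utf8 is made up for this example and is not part of the tool.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_and_save_utf8(command, output_path):
    """Run a Python-based CLI and save its stdout to a file as UTF-8 bytes.

    Hypothetical helper for illustration; not part of amazon-textract.
    """
    env = dict(os.environ)
    # Ask the child Python process to encode stdout as UTF-8 instead of
    # the platform default (Windows-1252 on many Windows setups).
    env["PYTHONIOENCODING"] = "utf-8"
    result = subprocess.run(command, env=env, stdout=subprocess.PIPE, check=True)
    # Write the captured bytes unchanged so no second encoding step is applied.
    Path(output_path).write_bytes(result.stdout)
    return result.stdout


# Example call, reusing the command from this issue (assumes the CLI is on PATH):
# run_and_save_utf8(
#     ["amazon-textract", "--input-document",
#      "s3://somebucket/2022-04-16-0010.jpg", "--pretty-print", "LINES"],
#     "2022-04-16-0010.txt",
# )
```

This only works around the redirect; it does not replace a built-in output option.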
Something like this would be very useful:
amazon-textract --input-document "s3://somebucket/2022-04-16-0010.jpg" --pretty-print LINES --output-document 2022-04-16-0010.txt
where the default output is UTF-8.
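To make the request concrete, here is a rough sketch of how such an option could behave, using argparse. It is not the tool's actual code: the double-dash spelling --output-document and the function names parse_args and write_output are assumptions for illustration only.

```python
import argparse
import sys


def parse_args(argv):
    """Parse only the proposed (hypothetical) output option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--output-document",
        default=None,
        help="write results to this file as UTF-8 instead of stdout",
    )
    return parser.parse_args(argv)


def write_output(text, output_document=None):
    """Write text to stdout, or to a file that is always UTF-8 encoded."""
    if output_document is None:
        sys.stdout.write(text)
        return
    # An explicit encoding avoids the platform default (e.g. Windows-1252);
    # newline="\n" keeps line endings identical across platforms.
    with open(output_document, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)
```

With this shape, leaving the option out keeps the current stdout behavior, so existing redirects would continue to work.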
Makes sense. Thx! We'll add a separate output option.