amazon-textract-textractor Need an option to save output in UTF-8 encoding to avoid saving as Windows-1252 encoding

Open lihuib opened this issue 2 years ago • 1 comment

It looks like the only way to capture the output of amazon-textract is to redirect it into a file, for example:

amazon-textract --input-document "s3://somebucket/2022-04-16-0010.jpg" --pretty-print LINES > 2022-04-16-0010.txt

Unfortunately, this is a problem on Windows because the default encoding there is Windows-1252, not UTF-8. Other tools used to analyze the output often require UTF-8.

Something like this would be very useful:

amazon-textract --input-document "s3://somebucket/2022-04-16-0010.jpg" --pretty-print LINES --output-document 2022-04-16-0010.txt

where the default output is UTF-8.
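Until such an option exists, one possible workaround is to run the command from a small Python wrapper instead of using shell redirection. The sketch below assumes that amazon-textract runs under a Python interpreter that honors the PYTHONIOENCODING environment variable; the helper name run_and_save_utf8 is made up for illustration and is not part of the tool.

```python
import os
import subprocess
from pathlib import Path


def run_and_save_utf8(command, output_path):
    """Run `command` and save its stdout to `output_path` as UTF-8.

    Hypothetical helper, not part of amazon-textract. It assumes the child
    process is a Python program, so PYTHONIOENCODING controls the encoding
    of its standard output.
    """
    env = dict(os.environ)
    # Ask the child Python interpreter to encode stdout as UTF-8
    # instead of the platform default (Windows-1252 on many Windows setups).
    env["PYTHONIOENCODING"] = "utf-8"

    result = subprocess.run(command, env=env, stdout=subprocess.PIPE, check=True)

    # Decoding validates that the captured bytes really are UTF-8;
    # a UnicodeDecodeError here means the assumption above does not hold.
    text = result.stdout.decode("utf-8")

    # Write the already-encoded bytes unchanged to avoid newline translation.
    Path(output_path).write_bytes(result.stdout)
    return text


# Example call with the command from this issue (needs AWS credentials):
# run_and_save_utf8(
#     ["amazon-textract", "--input-document",
#      "s3://somebucket/2022-04-16-0010.jpg", "--pretty-print", "LINES"],
#     "2022-04-16-0010.txt",
# )
```

Setting PYTHONIOENCODING=utf-8 in the console before running the original redirect command should have a similar effect under the same assumption, but the wrapper also fails loudly if the output turns out not to be valid UTF-8.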

Apr 18 '22 18:04 lihuib

Makes sense. Thx! We'll add a separate output option.

Jul 12 '22 02:07 schadem
