amazon-textract-textractor
Analyze documents with Amazon Textract and generate output in multiple formats.
The page number is overwritten if you pass it to the function within the for loop. Also, the page number is not considered as a search criterion. [Source Code Snipped from...
It looks like the only way to capture the output of amazon-textract is to redirect it into a file, such as: amazon-textract --input-document "s3://somebucket/2022-04-16-0010.jpg" --pretty-print LINES > 2022-04-16-0010.txt Unfortunately, this...
PDF example: python3 textractor.py --documents s3://mybucket/mydoc.pdf --forms Result: 62692bb61ab53-pdf-page-1-forms.csv How can I order this way,
We currently seem to hard-code a limit of 1 max pages of S3 objects when calling `S3Helper.getFileNames()` to list the objects in an S3 folder input - even...
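For the S3 listing limit described above, a common approach is to follow every result page rather than stopping after the first. The sketch below is not the repository's `S3Helper.getFileNames()` implementation; it is an illustration that assumes a boto3 S3 client and uses its `list_objects_v2` paginator. The function name `iter_s3_keys` is hypothetical.

```python
from typing import Any, Iterator


def iter_s3_keys(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under a prefix, following all result pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   keys = list(iter_s3_keys(boto3.client("s3"), "mybucket", "some/prefix/"))
```

Because the function is a generator, large prefixes are listed lazily and no page limit has to be chosen up front.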
When applying textractor to a local folder or S3 prefix with an inner folder structure, it would be really useful if output files were also mapped to the same folder...
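The requested mapping could look roughly like the sketch below, which derives an output path that keeps the input's sub-folder layout. This is only an illustration of the idea, not how textractor currently names its files; the function name `mirrored_output_path` and the `suffix` parameter are hypothetical.

```python
from pathlib import PurePosixPath


def mirrored_output_path(input_root: str, input_file: str,
                         output_root: str, suffix: str) -> str:
    """Map an input file to an output path that keeps its sub-folder layout."""
    # Path of the file relative to the input root, e.g. "2022/april/scan.pdf".
    relative = PurePosixPath(input_file).relative_to(PurePosixPath(input_root))
    # Keep the sub-folders and replace the file name with "<stem><suffix>".
    return str(PurePosixPath(output_root) / relative.parent / (relative.stem + suffix))
```

For example, `mirrored_output_path("docs", "docs/2022/april/scan.pdf", "out", "-forms.csv")` yields `out/2022/april/scan-forms.csv`. POSIX-style paths are used so the same logic applies to S3 key prefixes.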
When we use --translate, we get the translation for each page, but the consolidated JSON response (-response.json) is not translated. How to generate the translation in the final JSON as...
If a file name contains a space, it is not processed. The loop continues with "IN PROGRESS" indefinitely. I have tested this in two environments and see the same behavior.
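Independently of the root cause, a polling loop can be bounded so that a job stuck in progress fails loudly instead of looping forever. The sketch below is a generic illustration, not the tool's own loop. It assumes the caller supplies a `get_status` callable that returns the job status string, and that a pending job reports `IN_PROGRESS`; the name `wait_for_job` is hypothetical.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str], max_attempts: int = 60,
                 delay_seconds: float = 5.0,
                 pending_status: str = "IN_PROGRESS") -> str:
    """Poll until the status is no longer pending, or give up after max_attempts."""
    for _ in range(max_attempts):
        status = get_status()
        if status != pending_status:
            return status
        time.sleep(delay_seconds)
    raise TimeoutError(f"job still {pending_status} after {max_attempts} attempts")
```

The timeout surfaces the stuck job to the caller, who can then log the offending file name and continue with the next document.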