amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.

Results 129 amazon-textract-textractor issues

The page number is overwritten if you pass it to the function within the for loop. Plus the page number is not considered as search criteria. [Source Code Snipped from...

It looks like the only way to capture the output of amazon-textract is to redirect it into a file, such as: `amazon-textract --input-document "s3://somebucket/2022-04-16-0010.jpg" --pretty-print LINES > 2022-04-16-0010.txt`. Unfortunately, this...
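For readers who want the text in a variable rather than a file, here is a minimal sketch that runs a command from Python and returns whatever it writes to standard output. It assumes the CLI prints its results to stdout, as the shell redirect in the report implies. The helper name `capture_stdout` is hypothetical and not part of any Textract package, and the demo uses a stand-in command so that it runs without AWS credentials.

```python
import subprocess
import sys


def capture_stdout(command):
    """Run a command (given as a list of arguments) and return its stdout as text.

    Hypothetical helper for illustration; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # fail loudly on a non-zero exit code
    )
    return completed.stdout


if __name__ == "__main__":
    # Stand-in command; in practice the argument list from the report could be
    # passed here instead (assumption: that CLI writes its results to stdout).
    text = capture_stdout([sys.executable, "-c", "print('example line')"])
    print(text.strip())
```

Passing the arguments as a list avoids shell quoting problems with bucket paths or file names.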

PDF ![Screenshot from 2022-05-05 15-38-43](https://user-images.githubusercontent.com/8030118/166937006-b560ace3-071d-4b15-81e7-555e32aba8ce.png) Example: `python3 textractor.py --documents s3://mybucket/mydoc.pdf --forms` Result: `62692bb61ab53-pdf-page-1-forms.csv` ![Screenshot from 2022-05-05 16-00-35](https://user-images.githubusercontent.com/8030118/166939976-90322dc4-d90a-479d-bb52-d5e19f82c8da.png) How can I order it this way? ![Captura...

After running `python -m pip install amazon-textract-helper`, a file named "amazon-textract" is created at `%LOCALAPPDATA%\Programs\Python\Python38\Scripts`. Note that it is named "amazon-textract", not "amazon-textract.py", so Windows 10 doesn't know how to execute it...

I am planning to use Comprehend Medical in production in a new biomedical research product we are working on. I used Textractor to process an 1143 page pdf of a...

Thanks for the nice utility! However, my working directory is now an absolute mess 😂 It would be really helpful if something like an `--output` CLI option was available where...

[In textractor.py](https://github.com/aws-samples/amazon-textract-textractor/blob/ea5019475bb71b2adb1ad880f8d48b0f2b4e932f/src/textractor.py#L65), we currently seem to hard-code a limit of 1 page of S3 objects when calling `S3Helper.getFileNames()` to list the objects in an S3 folder input - even...
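As a rough illustration of listing every object rather than a single page, the sketch below follows continuation tokens until S3 reports that the listing is complete. It is not the repository's `S3Helper` code. The function name `list_all_keys` is hypothetical, and `s3_client` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`) or any object exposing a compatible `list_objects_v2` method.

```python
def list_all_keys(s3_client, bucket, prefix=""):
    """Return every object key under a prefix, following pagination.

    Hypothetical helper for illustration. `s3_client` is assumed to behave
    like a boto3 S3 client: list_objects_v2 returns a dict with optional
    "Contents", plus "IsTruncated" and "NextContinuationToken" when more
    pages remain.
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no objects.
        keys.extend(item["Key"] for item in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        # Ask for the next page on the following iteration.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

With boto3 installed, `s3_client.get_paginator("list_objects_v2")` offers the same behavior without a hand-written loop.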

When applying textractor to a local folder or S3 prefix with an inner folder structure, it would be really useful if output files were also mapped to the same folder...
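The mapping requested here could look something like the following sketch, which computes an output path that mirrors an input file's position under its root folder. This is an illustration of the idea only, not existing Textractor behavior. The function name `mirrored_output_path` and the `suffix` parameter are hypothetical, and the suffix in the usage line is only an example.

```python
from pathlib import PurePosixPath


def mirrored_output_path(input_root, input_file, output_root, suffix):
    """Map an input file to an output path with the same sub-folder layout.

    Hypothetical helper: keeps the file's path relative to `input_root`,
    re-roots it under `output_root`, and replaces the extension with `suffix`.
    Raises ValueError if `input_file` is not located under `input_root`.
    """
    relative = PurePosixPath(input_file).relative_to(PurePosixPath(input_root))
    return PurePosixPath(output_root) / relative.parent / (relative.stem + suffix)


if __name__ == "__main__":
    print(mirrored_output_path("docs", "docs/2022/scans/form.pdf", "out", "-forms.csv"))
```

`PurePosixPath` is used so the same logic applies to S3 keys, which always use forward slashes.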

When we use --translate, we get the translation for each page, but the consolidated JSON response (-response.json) is not translated. How can we generate the translation in the final JSON as...

enhancement

If a file name contains a space, the file is not processed. The loop continues with "IN PROGRESS" indefinitely. I have tested this in two environments, with the same behavior.