amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.

Results 129 amazon-textract-textractor issues

There is a well-known issue in Python where using `nargs="+"` interferes with the parsing of positional arguments. Suggest in the documentation either 1) the optional arguments are...

documentation
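The `nargs="+"` behavior described in this report can be reproduced with a small `argparse` sketch. The flag `--features` and the positional `input_file` are hypothetical names chosen for illustration and are not taken from the project's CLI. The sketch shows the greedy option swallowing the positional argument, and two common workarounds.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: the flag and argument names are illustrative only.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--features", nargs="+", default=[])
    parser.add_argument("input_file")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse reports the error on stderr and exits; treat that as failure.
        return None


# The greedy option consumes "doc.pdf" as a feature, so input_file is missing.
swallowed = try_parse(["--features", "TABLES", "FORMS", "doc.pdf"])

# Workaround 1: pass the positional argument before the option.
positional_first = try_parse(["doc.pdf", "--features", "TABLES", "FORMS"])

# Workaround 2: end the option's value list with "--".
with_separator = try_parse(["--features", "TABLES", "FORMS", "--", "doc.pdf"])
```

Either workaround keeps the parser definition unchanged, which is why documenting the argument order may be enough.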

I was trying to locate text containing a hashtag, but I could not until I commented out the call to `make_alphanum_and_lower_for_non_numbers()` in `select_text()`. ![image](https://github.com/aws-samples/amazon-textract-textractor/assets/68277405/1d09c093-4c37-4991-b3b5-34c1cda8cead)

Currently, a user who does not have `pdf2image` installed will see an exception when calling `analyze_document` with a PDF. This is problematic because Textract does support single-page PDFs as input.
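One way to read this request is as an optional-dependency fallback: convert the PDF to images when `pdf2image` is available, otherwise pass the raw bytes through. The sketch below illustrates that pattern only. The helpers `find_pdf_converter` and `prepare_pdf_input` are hypothetical and do not reflect how the library is actually structured.

```python
import importlib.util
from pathlib import Path


def find_pdf_converter():
    """Return pdf2image.convert_from_path if pdf2image is installed, else None."""
    if importlib.util.find_spec("pdf2image") is None:
        return None
    from pdf2image import convert_from_path
    return convert_from_path


def prepare_pdf_input(path, converter=None):
    """Hypothetical helper: decide how a PDF is handed to the API call."""
    if converter is not None:
        # Rasterize the pages when a converter is available.
        return {"kind": "images", "payload": converter(path)}
    # Without a converter, fall back to the raw file bytes instead of raising.
    return {"kind": "bytes", "payload": Path(path).read_bytes()}
```

A caller would use `prepare_pdf_input(path, find_pdf_converter())`, so a missing `pdf2image` changes the input form rather than causing an exception.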

The current workaround for PDFs is the following:
```python
from pdf2image import convert_from_path
from textractor.entities.document import Document

# Loading the JSON response
document = Document.open("output.json")
# Loading the images and...
```

Calling the function with a file path yields a misleading error:
```
file_name = './Invoice_INV300351.pdf'
response = call_textract(input_document=file_name, boto3_textract_client=client)
```
error:
```
Traceback (most recent call last): File...
```

Per [this example](https://aws-samples.github.io/amazon-textract-textractor/notebooks/parsing_an_existing_response.html) reference:
> There are two ways to parse an existing JSON. The simplest one, reminiscent of PIL.Image.open() is Document.open() which takes either a path or file-like object...

This is very similar to #195. Since that issue has been closed, I am creating a new one: In #195, `KeyError: 'Geometry'` was addressed by creating a condition that...

Upgrade the GitHub Actions to their latest versions to remove the warnings: build The following actions uses node12 which is deprecated and will be forced to run on...

Getting an error while initializing Textractor. I am passing the region_name parameter, yet I am still getting a NoRegionError from boto3. I identified the cause of this issue, in **textractor.py**,...

I tried this line: `document = extractor.get_result(job_id=job_id, TextractAPI.ANALYZE)` and I get: `[ERROR] Runtime.UserCodeSyntaxError: Syntax error in module 'lambda_function': positional argument follows keyword argument` Also tried: `document = extractor.get_result(job_id=job_id,...
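The syntax error quoted above comes from Python's call grammar, not from the library: once one argument is passed by keyword, every argument after it must also be passed by keyword. The sketch below uses a stand-in `get_result` function whose second parameter name, `api`, is an assumption made for illustration, and uses plain strings in place of the real enum.

```python
def get_result(job_id, api):
    # Stand-in for illustration only; not the library's method.
    return (job_id, api)


# Compiling the problematic call shape shows the error is raised at parse
# time, before any function is looked up or run.
bad_call = 'get_result(job_id="abc", "ANALYZE")'
try:
    compile(bad_call, "<example>", "eval")
    error = None
except SyntaxError as exc:
    error = exc

# Two valid shapes: all positional, or keywords for every argument.
result_positional = get_result("abc", "ANALYZE")
result_keyword = get_result(job_id="abc", api="ANALYZE")
```

Passing both arguments positionally avoids needing to know the second parameter's name at all.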