amazon-textract-textractor
Analyze documents with Amazon Textract and generate output in multiple formats.
There is a well-known issue in Python where using `nargs='+'` breaks the parsing of positional arguments. Suggest in the documentation either 1) the optional arguments are...
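The behaviour described above can be reproduced with the standard-library `argparse` module. This is a minimal sketch, not the project's actual CLI: the `--features` option, the `input_file` positional, and the `try_parse` helper are hypothetical names chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: one greedy optional and one required positional.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--features", nargs="+", default=[])
    parser.add_argument("input_file")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse reports usage errors by raising SystemExit.
        return None


# The greedy option swallows the trailing positional, so parsing fails.
failed = try_parse(["--features", "TABLES", "FORMS", "doc.png"])

# Putting the positional first leaves nothing ambiguous.
ok = try_parse(["doc.png", "--features", "TABLES", "FORMS"])
```

With this layout, passing the positional argument before the `nargs='+'` option is one way to avoid the problem.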
I was trying to locate text containing a hashtag, but I was unable to until I commented out the call to `make_alphanum_and_lower_for_non_numbers()` in `select_text()`.
Currently a user that does not have `pdf2image` installed will see an exception when calling `analyze_document` with a PDF. This is problematic because Textract does support single-page PDF as input.
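One common way to handle an optional dependency like this is to check for it before use and fall back when it is missing. The sketch below only illustrates that pattern; it is not how the library is implemented, and the helper names `optional_module_available` and `load_pdf_pages` are hypothetical.

```python
import importlib.util


def optional_module_available(name: str) -> bool:
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def load_pdf_pages(path: str):
    """Hypothetical helper: rasterize a PDF if pdf2image is installed.

    Returns a list of page images, or None when pdf2image is missing so the
    caller can decide to send the raw PDF bytes instead.
    """
    if not optional_module_available("pdf2image"):
        return None
    # Imported lazily so that users without pdf2image never hit ImportError.
    from pdf2image import convert_from_path

    return convert_from_path(path)
```

A caller could treat a `None` result as a signal to skip image loading rather than raising an exception.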
The current workaround for PDFs is the following:
```python
from pdf2image import convert_from_path
from textractor.entities.document import Document

# Loading the JSON response
document = Document.open("output.json")

# Loading the images and...
```
Calling the function with a file path will yield a misleading error:
```
file_name = './Invoice_INV300351.pdf'
response = call_textract(input_document=file_name, boto3_textract_client=client)
```
error:
```
Traceback (most recent call last): File...
```
Per [this example](https://aws-samples.github.io/amazon-textract-textractor/notebooks/parsing_an_existing_response.html) reference: > There are two ways to parse an existing JSON. The simplest one, reminiscent of PIL.Image.open() is Document.open() which takes either a path or file-like object...
This is very similar to #195. Since that issue has been closed, I am creating a new one: In #195, `KeyError: 'Geometry'` was addressed by creating a condition that...
Upgrade the GitHub Actions to their latest versions in order to remove the warnings: build The following actions uses node12 which is deprecated and will be forced to run on...
Getting an error while initializing Textractor. I am passing the region_name parameter, yet I am still getting a NoRegionError from boto3. I identified the cause of this issue, in **textractor.py**,...
I tried this line: `document = extractor.get_result(job_id=job_id, TextractAPI.ANALYZE)` and I get: `[ERROR] Runtime.UserCodeSyntaxError: Syntax error in module 'lambda_function': positional argument follows keyword argument ` Also tried: ` document = extractor.get_result(job_id=job_id,...
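The syntax error quoted above comes from a general Python rule: in a call, positional arguments must come before keyword arguments. The sketch below demonstrates the rule with a stand-in function; this `get_result` is a hypothetical two-parameter stub, not the library's method, and the string `"ANALYZE"` stands in for the enum value.

```python
def get_result(job_id, api):
    # Hypothetical stand-in with two parameters; it just echoes its inputs.
    return (job_id, api)


def syntax_error_message(source: str):
    """Compile an expression and return the SyntaxError message, or None if valid."""
    try:
        compile(source, "<example>", "eval")
        return None
    except SyntaxError as exc:
        return exc.msg


# Keyword argument followed by a positional one: rejected at compile time.
bad = syntax_error_message("get_result(job_id='abc', 'ANALYZE')")

# Either pass both positionally, or name both arguments.
positional = get_result("abc", "ANALYZE")
keyword = get_result(job_id="abc", api="ANALYZE")
```

Because the error is raised when the module is compiled, it appears before any of the function's code runs.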