amazon-textract-textractor issues

documentation for command line should be updated for nargs + positional arguments

There is a well-known issue in Python where using `nargs='+'` interferes with the handling of positional arguments. Suggest in the documentation either 1) the optional arguments are...
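The behavior described above can be reproduced with a minimal `argparse` sketch. The parser below is hypothetical (the `--features` option and `file_path` positional are illustrative names, not the textractor command line itself); it only shows why argument order matters when an option uses `nargs="+"`.

```python
import argparse


def build_parser():
    # Hypothetical CLI for illustration, not the textractor command line.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--features", nargs="+", default=[])
    parser.add_argument("file_path")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# The greedy option swallows the file path, so the positional is reported missing.
broken = try_parse(["--features", "TABLES", "FORMS", "doc.pdf"])

# Workaround 1: put the positional argument before the nargs="+" option.
positional_first = try_parse(["doc.pdf", "--features", "TABLES", "FORMS"])

# Workaround 2: end the option's value list with "--".
with_separator = try_parse(["--features", "TABLES", "FORMS", "--", "doc.pdf"])
```

Either ordering convention could be what the documentation recommends; the sketch does not assume which one the maintainers prefer.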

multifactorid

documentation

parameterize make_alphanum_and_lower_for_non_numbers() in find_phrase_on_page()

I was trying to locate text containing a hashtag, but I was unable to until I commented out the call to `make_alphanum_and_lower_for_non_numbers()` in `select_text()`. ![image](https://github.com/aws-samples/amazon-textract-textractor/assets/68277405/1d09c093-4c37-4991-b3b5-34c1cda8cead)
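One way the requested parameterization might look is sketched below. Everything here is hypothetical: `normalize_token` is a simplified stand-in and not the library's `make_alphanum_and_lower_for_non_numbers()`, and `find_phrase` is not the library's `find_phrase_on_page()`. The point is only that a flag lets the caller opt out of normalization when punctuation such as `#` matters.

```python
import re


def normalize_token(token):
    # Hypothetical stand-in: drop non-alphanumerics and lowercase.
    # This is NOT the library's make_alphanum_and_lower_for_non_numbers().
    return re.sub(r"[^0-9a-zA-Z]", "", token).lower()


def find_phrase(words, phrase, normalize=True):
    """Return the start indices where `phrase` occurs in the list `words`."""
    prep = normalize_token if normalize else (lambda t: t)
    target = [prep(t) for t in phrase.split()]
    tokens = [prep(w) for w in words]
    n = len(target)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target]


words = ["Total", "due", "#due", "today"]
loose = find_phrase(words, "#due")                   # normalized comparison
exact = find_phrase(words, "#due", normalize=False)  # literal comparison
```

With normalization on, `#due` and `due` are indistinguishable; with it off, only the literal `#due` token matches.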

grantrosse

Downgrade exception to warning when passing PDF to analyze_document

Currently, a user who does not have `pdf2image` installed will see an exception when calling `analyze_document` with a PDF. This is problematic because Textract does support single-page PDFs as input.
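A minimal sketch of the proposed behavior, assuming the dependency is only needed for optional work such as rendering page images: a missing module produces a warning and a `None` return instead of an exception. The helper name `optional_import` is hypothetical and is not part of the library.

```python
import importlib
import warnings


def optional_import(module_name):
    """Import a module if available; warn and return None otherwise."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Downgraded from an exception: the caller can continue without it.
        warnings.warn(
            f"{module_name} is not installed; features that depend on it are skipped.",
            stacklevel=2,
        )
        return None


# Hypothetical usage: continue without page images when pdf2image is absent.
# pdf2image = optional_import("pdf2image")
# images = pdf2image.convert_from_path(path) if pdf2image else []
```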

Belval

Enhancement: Allow json parser to also set the images by passing the original document

The current workaround for PDF is the following:
```python
from pdf2image import convert_from_path
from textractor.entities.document import Document

# Loading the JSON response
document = Document.open("output.json")

# Loading the images and...
```

ThomasDelteil

textractcaller - Allow local file path input for PDF

Calling the function with a file path yields a misleading error:
```
file_name = './Invoice_INV300351.pdf'
response = call_textract(input_document=file_name, boto3_textract_client=client)
```
Error:
```
Traceback (most recent call last): File...
```

grantrosse

Loading Existing JSON Files from S3

5

Per [this example](https://aws-samples.github.io/amazon-textract-textractor/notebooks/parsing_an_existing_response.html) reference:
> There are two ways to parse an existing JSON. The simplest one, reminiscent of PIL.Image.open() is Document.open() which takes either a path or file-like object...
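Since the quoted documentation mentions file-like objects, one possible approach is to read the S3 object into memory and wrap it. The sketch below assumes an S3 client whose `get_object` returns a dict with a readable `"Body"`, as boto3's does; `open_s3_json` and `FakeS3Client` are hypothetical names, and the fake client exists only so the example runs without AWS credentials. Whether `Document.open()` accepts the resulting `io.BytesIO` as-is is an assumption based on the quote above.

```python
import io
import json


def fetch_s3_bytes(s3_client, bucket, key):
    # get_object returns a dict whose "Body" is a readable stream.
    return s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()


def open_s3_json(s3_client, bucket, key):
    """Return (file_like, parsed_dict) for a JSON object stored in S3."""
    raw = fetch_s3_bytes(s3_client, bucket, key)
    return io.BytesIO(raw), json.loads(raw)


class FakeS3Client:
    """In-memory stand-in for an S3 client, used only for this demo."""

    def __init__(self, objects):
        self._objects = objects

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


client = FakeS3Client({("my-bucket", "output.json"): b'{"Blocks": []}'})
file_like, response = open_s3_json(client, "my-bucket", "output.json")
```

In real use, `client` would be created with `boto3.client("s3")`, and `file_like` would be the object handed to the parser.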

ccrosland

KeyError: 'Geometry' raised when an empty cell found in `response_parser.py`

1

This is very similar to #195. Since that issue has been closed, I am creating a new one. In #195, `KeyError: 'Geometry'` was addressed by creating a condition that...
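A defensive lookup is one way to avoid this class of error. The sketch below is not the code in `response_parser.py`; `bounding_box` is a hypothetical helper, and the block dictionaries are hand-written examples shaped like Textract `CELL` blocks.

```python
def bounding_box(block):
    """Return the block's BoundingBox dict, or None when Geometry is absent."""
    geometry = block.get("Geometry")
    if not geometry:
        return None
    return geometry.get("BoundingBox")


# Hand-written example blocks (illustrative values only).
filled_cell = {
    "BlockType": "CELL",
    "Geometry": {
        "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.3, "Height": 0.05}
    },
}
empty_cell = {"BlockType": "CELL"}

filled_box = bounding_box(filled_cell)
empty_box = bounding_box(empty_cell)  # no KeyError; the caller decides what to do
```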

mhfarahani

upgrade GitHub Action versions

Upgrade the GitHub Actions to their latest versions in order to remove the warnings: build The following actions uses node12 which is deprecated and will be forced to run on...

tb102122

Issue with creating Textract client: NoRegionError thrown

I am getting an error while initializing Textractor. I am passing the `region_name` parameter, yet I still get a `NoRegionError` from boto3. I identified the cause of this issue, in **textractor.py**,...

alanmohan

how to use get_result?

2

I tried this line: `document = extractor.get_result(job_id=job_id, TextractAPI.ANALYZE)` and I get: `[ERROR] Runtime.UserCodeSyntaxError: Syntax error in module 'lambda_function': positional argument follows keyword argument`. Also tried: `document = extractor.get_result(job_id=job_id,...
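The reported error is a general Python rule rather than something specific to the library: once a keyword argument appears in a call, every argument after it must also be passed by keyword. The sketch below uses a hypothetical stand-in function; the second parameter name `api` is an assumption and should be checked against the actual `get_result` signature.

```python
def get_result(job_id, api):
    # Hypothetical stand-in; the parameter name `api` is an assumption.
    return (job_id, api)


def is_syntax_error(source):
    """Report whether `source` fails to compile as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False


# Same shape as the failing call: a positional argument after a keyword one.
BAD_CALL = "get_result(job_id='abc', 'ANALYZE')"
bad_call_rejected = is_syntax_error(BAD_CALL)

# Either pass everything by keyword, or keep positional arguments first.
all_keywords = get_result(job_id="abc", api="ANALYZE")
all_positional = get_result("abc", "ANALYZE")
```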

bvbg1
