amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.

Results 129 amazon-textract-textractor issues

Recent issues such as #121 #122 #123 showed that our current test suite is inadequate and several edge cases are missing. This issue aims to outline a plan for implementing...

enhancement

See screenshot of parsing the screenshot of the readme. ![Screen Shot 2022-10-31 at 5 57 15 PM](https://user-images.githubusercontent.com/3716307/199136025-1c54d102-262b-4847-a8e6-4897c971a694.png) I believe this block `[TBlock(geometry=TGeometry(bounding_box=TBoundingBox(width=1.0, height=0.912468671798706, left=0.0, top=0.030051277950406075)` is ignored

bug

Hi all, Based on my understanding, Textract provides an axis-aligned BoundingBox object and a Polygon object which is composed of more specific points (https://docs.aws.amazon.com/textract/latest/dg/text-location.html). It seems that Textractor only provides...

enhancement
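To make the BoundingBox versus Polygon distinction in the issue above concrete, here is a minimal sketch, assuming the Textract response geometry shape (a `Polygon` as a list of `{"X", "Y"}` points and a `BoundingBox` with `Left`, `Top`, `Width`, `Height`, all as ratios of the page size). The helper name `polygon_to_bounding_box` and the sample points are hypothetical and are not part of Textractor.

```python
from typing import Dict, List


def polygon_to_bounding_box(polygon: List[Dict[str, float]]) -> Dict[str, float]:
    """Return the axis-aligned box that encloses every polygon point.

    Hypothetical helper for illustration; it is not a Textractor API.
    """
    xs = [point["X"] for point in polygon]
    ys = [point["Y"] for point in polygon]
    left, top = min(xs), min(ys)
    return {
        "Left": left,
        "Top": top,
        "Width": max(xs) - left,
        "Height": max(ys) - top,
    }


# Made-up polygon for a slightly rotated piece of text.
polygon = [
    {"X": 0.25, "Y": 0.125},
    {"X": 0.75, "Y": 0.25},
    {"X": 0.625, "Y": 0.5},
    {"X": 0.125, "Y": 0.375},
]

bbox = polygon_to_bounding_box(polygon)
print(bbox)
```

The enclosing box is larger than the rotated quadrilateral, which is why the polygon carries more specific location information than the bounding box alone.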

[59766-textract-table.json](https://github.com/aws-samples/amazon-textract-textractor/files/15004467/59766-textract-table.json) In the Textract output file, cell ID 3f98227c-2981-4cd5-b23c-bee82e96bb54 references three words, but the code below returns null words in that cell. `document = Document.open("c:\\temp\\59766-textract-table.json")` #query for the line id that...

bug
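One way to confirm independently which words a cell references is to follow the `CHILD` relationships in the raw Textract JSON. The following is a rough sketch, assuming the standard block layout (`Id`, `BlockType`, `Relationships`, `Text`); the function name `words_in_cell` and the sample blocks are hypothetical, not taken from the attached file or from Textractor.

```python
from typing import Any, Dict, List


def words_in_cell(blocks: List[Dict[str, Any]], cell_id: str) -> List[str]:
    """Collect the text of WORD blocks referenced by a CELL block.

    Hypothetical helper for illustration; it is not a Textractor API.
    """
    by_id = {block["Id"]: block for block in blocks}
    cell = by_id[cell_id]
    words: List[str] = []
    for relationship in cell.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = by_id.get(child_id)
            # Skip dangling IDs and non-word children such as selection elements.
            if child is not None and child["BlockType"] == "WORD":
                words.append(child["Text"])
    return words


# Made-up blocks standing in for the "Blocks" list of a Textract response.
blocks = [
    {
        "Id": "cell-1",
        "BlockType": "CELL",
        "Relationships": [{"Type": "CHILD", "Ids": ["w-1", "w-2", "w-3"]}],
    },
    {"Id": "w-1", "BlockType": "WORD", "Text": "Total"},
    {"Id": "w-2", "BlockType": "WORD", "Text": "amount"},
    {"Id": "w-3", "BlockType": "WORD", "Text": "due"},
]

print(words_in_cell(blocks, "cell-1"))
```

If a check like this finds words in the raw JSON while the parsed document reports none for the same cell, the discrepancy is likely in the parsing step rather than in the Textract output.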

*Issue #, if available:* N/A *Description of changes:* Updating the return type and function doc for `start_document_text_detection`. Language is copied from `start_document_analysis`. By submitting this pull request, I confirm that...

Attached is the part of the PDF that I am trying to extract. I am doing the extraction using: `textract_json = call_textract(input_document="s3:url", features=[Textract_Features.LAYOUT, Textract_Features.TABLES])` `layout = get_text_from_layout_json(textract_json=data)` The output I am getting...

question

Good morning, what solution do I use with Textractor to extract the cell data from the attached image and render the cell rows correctly in Excel? Is there a rows...

question

Hi team, I was surprised to find today that the below does not work in the default Python notebook kernel of a [SageMaker Studio JupyterLab space](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl.html), when the notebook's IAM...

bug

I did previously raise similar reqs #19 and #20, but they got stale & closed due to inactivity... Today had first chance in a while to come back to Textractor...

enhancement

### Discussed in https://github.com/aws-samples/amazon-textract-textractor/discussions/350 Originally posted by **samwhealon** April 11, 2024 I have been playing around with this library and the original textract-response-parser. I found that TRP doesn't support returning...