amazon-textract-textractor issues

Implement smoke tests

Recent issues such as #121, #122, and #123 showed that our current test suite is inadequate and misses several edge cases. This issue aims to outline a plan for implementing...

Belval

enhancement
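
One cheap smoke test is to check that a saved Textract response is internally consistent: every Id listed in a block's `Relationships` should resolve to a block in the same response. This is sketched here as an assumption, not as the plan this issue describes. `find_dangling_ids` is a hypothetical helper, not part of Textractor, and it reads the raw Textract JSON rather than Textractor objects.

```python
import json
from typing import Any


def find_dangling_ids(response: dict[str, Any]) -> list[str]:
    """Return relationship Ids that do not match any block in the response.

    Hypothetical smoke-test helper; it operates on raw Textract JSON.
    """
    blocks = response.get("Blocks", [])
    # Index every block Id present in the response.
    known_ids = {block["Id"] for block in blocks}
    dangling = []
    for block in blocks:
        for relationship in block.get("Relationships", []):
            for related_id in relationship.get("Ids", []):
                if related_id not in known_ids:
                    dangling.append(related_id)
    return dangling


if __name__ == "__main__":
    # Tiny hand-written response; a real suite would load saved JSON fixtures.
    sample = json.loads(
        '{"Blocks": ['
        '{"Id": "page-1", "BlockType": "PAGE",'
        ' "Relationships": [{"Type": "CHILD", "Ids": ["line-1", "line-2"]}]},'
        '{"Id": "line-1", "BlockType": "LINE", "Text": "Hello"}'
        ']}'
    )
    print(find_dangling_ids(sample))  # ['line-2']
```

In a pytest suite, the same check could be parametrized over every fixture file. A malformed or truncated response would then fail fast, before any higher-level parsing runs.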

Visualization is not taking into account the Geometry block

The screenshot below shows the result of parsing the screenshot from the README.

![Screen Shot 2022-10-31 at 5 57 15 PM](https://user-images.githubusercontent.com/3716307/199136025-1c54d102-262b-4847-a8e6-4897c971a694.png)

I believe this block `[TBlock(geometry=TGeometry(bounding_box=TBoundingBox(width=1.0, height=0.912468671798706, left=0.0, top=0.030051277950406075)` is ignored.

ThomasDelteil

bug
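
This sketch may help when checking the overlay. Textract reports `BoundingBox` values as ratios of the page width and height, so drawing a box means scaling them by the image size. The code does that conversion on the raw JSON keys (`Left`, `Top`, `Width`, `Height`). `to_pixel_box` is a hypothetical helper, and it does not explain why the visualizer skips the block reported above.

```python
def to_pixel_box(bbox: dict, image_width: int, image_height: int) -> tuple[int, int, int, int]:
    """Convert a normalized Textract BoundingBox to (x0, y0, x1, y1) pixel corners."""
    left = bbox["Left"] * image_width
    top = bbox["Top"] * image_height
    right = left + bbox["Width"] * image_width
    bottom = top + bbox["Height"] * image_height
    # Round to whole pixels for drawing libraries that expect integers.
    return round(left), round(top), round(right), round(bottom)


if __name__ == "__main__":
    # Values copied from the TBlock in the report; the 1000x800 image size is made up.
    bbox = {"Width": 1.0, "Height": 0.912468671798706, "Left": 0.0, "Top": 0.030051277950406075}
    print(to_pixel_box(bbox, 1000, 800))  # (0, 24, 1000, 754)
```

A box this large covers the full width and about 91% of the height. That makes it an easy case for checking whether the overlay draws it at all.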

Access Non-Axis-Aligned Bounding Boxes

2

Hi all, based on my understanding, Textract provides an axis-aligned BoundingBox object and a Polygon object composed of more precise points (https://docs.aws.amazon.com/textract/latest/dg/text-location.html). It seems that Textractor only provides...

zkalson

enhancement
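
The polygon can be read from the raw response. In Textract JSON, each block's `Geometry` carries a `Polygon`: a list of normalized `{"X": ..., "Y": ...}` points. The sketch below scales those points to pixels and estimates a word's skew from its first two points. Treating those two points as the top edge of the text is an assumption. `polygon_to_pixels` and `top_edge_angle` are hypothetical names.

```python
import math


def polygon_to_pixels(polygon: list[dict], page_width: float, page_height: float) -> list[tuple[float, float]]:
    """Scale normalized Textract Polygon points to pixel coordinates."""
    return [(point["X"] * page_width, point["Y"] * page_height) for point in polygon]


def top_edge_angle(points: list[tuple[float, float]]) -> float:
    """Angle in degrees of the segment points[0] -> points[1].

    Assumes the first two points trace the top edge of the text. Image y grows
    downward, so a positive angle means the text slopes down to the right.
    """
    (x0, y0), (x1, y1) = points[0], points[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


if __name__ == "__main__":
    # Made-up polygon for a word rotated by 45 degrees on a 1000x1000 page.
    polygon = [{"X": 0.1, "Y": 0.1}, {"X": 0.2, "Y": 0.2}, {"X": 0.15, "Y": 0.25}, {"X": 0.05, "Y": 0.15}]
    points = polygon_to_pixels(polygon, 1000, 1000)
    print(round(top_edge_angle(points), 2))  # 45.0
```

Pages are rarely square, so the sketch scales the points to pixels before calling `atan2`. Computing the angle on the normalized values would distort it.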

Table cell incorrectly does not pick up the cell text/words, while Page --> Line picks up the words as in the Textract output

1

[59766-textract-table.json](https://github.com/aws-samples/amazon-textract-textractor/files/15004467/59766-textract-table.json) In the Textract output file, cell Id 3f98227c-2981-4cd5-b23c-bee82e96bb54 references three words, but the code below returns no words for that cell: `document = Document.open("c:\\temp\\59766-textract-table.json")` #query for the line id that...

raidken

bug
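
To cross-check what Textractor returns against the file itself, the cell's words can be resolved directly from the raw JSON. A `CELL` block lists its contents in a `CHILD` relationship, whose `Ids` point to `WORD` blocks (and possibly `SELECTION_ELEMENT` blocks). The helper below is a hypothetical sketch for that comparison, not a Textractor API.

```python
def cell_words(response: dict, cell_id: str) -> list[str]:
    """Return the text of WORD blocks referenced by a CELL's CHILD relationship."""
    blocks = {block["Id"]: block for block in response["Blocks"]}
    words = []
    for relationship in blocks[cell_id].get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = blocks.get(child_id)
            # Skip missing Ids and non-word children such as selection elements.
            if child is not None and child["BlockType"] == "WORD":
                words.append(child["Text"])
    return words


if __name__ == "__main__":
    # Hand-written response for illustration only.
    response = {"Blocks": [
        {"Id": "cell-1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Net"},
        {"Id": "w2", "BlockType": "WORD", "Text": "amount"},
    ]}
    print(cell_words(response, "cell-1"))  # ['Net', 'amount']
```

To test the attached file, load it with `json.load` and call `cell_words` with the cell Id from the report. The result would show whether the three words are reachable through that cell's relationships in the JSON.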

Update function doc and return type

*Issue #, if available:* N/A

*Description of changes:* Updating the return type and function doc for `start_document_text_detection`. Language is copied from `start_document_analysis`.

By submitting this pull request, I confirm that...

andrewkowalik

Issue with extraction, get_text_from_layout_json function

1

Attached is the part of the PDF that I am trying to extract. I am doing the extraction using: `textract_json = call_textract(input_document="s3:url", features=[Textract_Features.LAYOUT, Textract_Features.TABLES])` `layout = get_text_from_layout_json(textract_json=data)` The output I am getting...

red-sky17

question
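
When the output of `get_text_from_layout_json` looks wrong, it can help to inspect what the `LAYOUT` feature actually returned. In the raw response, layout elements are blocks whose `BlockType` starts with `LAYOUT_` (for example `LAYOUT_TITLE` or `LAYOUT_TEXT`). Most of them point to their `LINE` blocks through a `CHILD` relationship. The sketch below lists each layout block with its line text in response order. Treating response order as reading order is an assumption, and `layout_sections` is a hypothetical helper.

```python
def layout_sections(response: dict) -> list[tuple[str, str]]:
    """List (BlockType, text) for each LAYOUT_* block, joining its child LINE text."""
    blocks = {block["Id"]: block for block in response["Blocks"]}
    sections = []
    for block in response["Blocks"]:
        if not block["BlockType"].startswith("LAYOUT_"):
            continue
        lines = []
        for relationship in block.get("Relationships", []):
            if relationship["Type"] != "CHILD":
                continue
            for child_id in relationship["Ids"]:
                child = blocks.get(child_id)
                # Only LINE children contribute text; containers such as lists may
                # reference other layout blocks instead (assumption).
                if child is not None and child["BlockType"] == "LINE":
                    lines.append(child["Text"])
        sections.append((block["BlockType"], " ".join(lines)))
    return sections


if __name__ == "__main__":
    # Hand-written response for illustration only.
    response = {"Blocks": [
        {"Id": "t", "BlockType": "LAYOUT_TITLE", "Relationships": [{"Type": "CHILD", "Ids": ["l1"]}]},
        {"Id": "p", "BlockType": "LAYOUT_TEXT", "Relationships": [{"Type": "CHILD", "Ids": ["l2", "l3"]}]},
        {"Id": "l1", "BlockType": "LINE", "Text": "Invoice"},
        {"Id": "l2", "BlockType": "LINE", "Text": "Thank you for"},
        {"Id": "l3", "BlockType": "LINE", "Text": "your business."},
    ]}
    for block_type, text in layout_sections(response):
        print(f"{block_type}: {text}")
```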

Cell content extraction error

2

Good morning. What solution should I use with Textractor to extract the cell data from the attached image and render the cell rows correctly in Excel? Is there a rows...

Larbo53

question
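
Here is one way to get cell rows into Excel, sketched on the raw Textract JSON rather than through Textractor's own table objects. Each `CELL` of a `TABLE` is placed into a grid by its 1-based `RowIndex` and `ColumnIndex`, and the grid is then written as CSV, which Excel opens directly. This ignores `RowSpan`/`ColumnSpan` and merged cells. `table_grid` and `grid_to_csv` are hypothetical names.

```python
import csv
import io


def _child_words(blocks: dict, block: dict) -> str:
    """Join the text of WORD blocks referenced by a block's CHILD relationship."""
    words = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == "CHILD":
            for child_id in relationship["Ids"]:
                child = blocks.get(child_id)
                if child is not None and child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def table_grid(response: dict, table_id: str) -> list[list[str]]:
    """Arrange a TABLE's CELL blocks into rows; spans and merged cells are ignored."""
    blocks = {block["Id"]: block for block in response["Blocks"]}
    cells = [
        blocks[child_id]
        for relationship in blocks[table_id].get("Relationships", [])
        if relationship["Type"] == "CHILD"
        for child_id in relationship["Ids"]
        if blocks.get(child_id, {}).get("BlockType") == "CELL"
    ]
    if not cells:
        return []
    n_rows = max(cell["RowIndex"] for cell in cells)
    n_cols = max(cell["ColumnIndex"] for cell in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        # Textract row and column indices are 1-based.
        grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = _child_words(blocks, cell)
    return grid


def grid_to_csv(grid: list[list[str]]) -> str:
    """Serialize rows as CSV text; csv.writer uses CRLF line endings by default."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(grid)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hand-written 2x2 table for illustration only.
    response = {"Blocks": [
        {"Id": "tbl", "BlockType": "TABLE", "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]}]},
        {"Id": "c11", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1, "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "c12", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2, "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"Id": "c21", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1, "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "c22", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},
        {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Bolts"},
    ]}
    print(grid_to_csv(table_grid(response, "tbl")), end="")
```

Saving the returned string to a `.csv` file gives a file Excel can open. Producing a native `.xlsx` file would need a third-party library such as openpyxl.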

Cryptic CLI error in SageMaker Studio (and probably other role-based environments?)

1

Hi team, I was surprised to find today that the below does not work in the default Python notebook kernel of a [SageMaker Studio JupyterLab space](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl.html), when the notebook's IAM...

athewsey

bug

[Feature Request] Simplified batch processing CLI

1

I previously raised similar requests, #19 and #20, but they went stale and were closed due to inactivity... Today I had my first chance in a while to come back to Textractor...

athewsey

enhancement

Python Support for Column Headers

### Discussed in https://github.com/aws-samples/amazon-textract-textractor/discussions/350

Originally posted by **samwhealon** April 11, 2024

I have been playing around with this library and the original textract-response-parser. I found that TRP doesn't support returning...

Belval
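
This sketch gives context on what a column-header API would surface. With the `TABLES` feature, Textract tags header cells by including `COLUMN_HEADER` in a `CELL` block's `EntityTypes`. The code collects those header texts from the raw JSON and assumes a single table in the response; otherwise, headers from different tables would be mixed. `column_headers` is a hypothetical helper, not an existing Textractor or TRP function.

```python
def column_headers(response: dict) -> list[str]:
    """Return header-cell text ordered by (RowIndex, ColumnIndex); assumes one table."""
    blocks = {block["Id"]: block for block in response["Blocks"]}
    headers = []
    for block in response["Blocks"]:
        if block["BlockType"] != "CELL" or "COLUMN_HEADER" not in block.get("EntityTypes", []):
            continue
        # Collect the WORD children of the header cell.
        words = [
            blocks[child_id]["Text"]
            for relationship in block.get("Relationships", [])
            if relationship["Type"] == "CHILD"
            for child_id in relationship["Ids"]
            if child_id in blocks and blocks[child_id]["BlockType"] == "WORD"
        ]
        headers.append((block["RowIndex"], block["ColumnIndex"], " ".join(words)))
    # Sort by row, then column, so multi-row headers stay in reading order.
    return [text for _, _, text in sorted(headers)]


if __name__ == "__main__":
    # Hand-written response for illustration only.
    response = {"Blocks": [
        {"Id": "h2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2, "EntityTypes": ["COLUMN_HEADER"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"Id": "h1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1, "EntityTypes": ["COLUMN_HEADER"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "b1", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Total"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Alice"},
    ]}
    print(column_headers(response))  # ['Name', 'Total']
```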
