amazon-textract-textractor
Analyze documents with Amazon Textract and generate output in multiple formats.
Recent issues such as #121 #122 #123 showed that our current test suite is inadequate and several edge cases are missing. This issue aims to outline a plan for implementing...
See the screenshot of parsing the README screenshot. I believe this block `[TBlock(geometry=TGeometry(bounding_box=TBoundingBox(width=1.0, height=0.912468671798706, left=0.0, top=0.030051277950406075)` is ignored.
Hi all, based on my understanding, Textract provides an axis-aligned BoundingBox object and a Polygon object, which is composed of more specific points (https://docs.aws.amazon.com/textract/latest/dg/text-location.html). It seems that Textractor only provides...
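To make the relationship between the two geometry objects concrete, here is a minimal sketch that derives an axis-aligned bounding box from a polygon. It works on plain dictionaries shaped like the `Geometry` section of a Textract response (`X`/`Y` points, `Left`/`Top`/`Width`/`Height` box). The helper name `polygon_to_bounding_box` and the sample coordinates are made up for illustration and are not part of Textractor.

```python
from typing import Dict, List


def polygon_to_bounding_box(polygon: List[Dict[str, float]]) -> Dict[str, float]:
    """Return the smallest axis-aligned box that contains every polygon point.

    Hypothetical helper: coordinates are assumed to be ratios of the page
    size, as in a Textract response.
    """
    xs = [point["X"] for point in polygon]
    ys = [point["Y"] for point in polygon]
    left, top = min(xs), min(ys)
    return {
        "Left": left,
        "Top": top,
        "Width": max(xs) - left,
        "Height": max(ys) - top,
    }


# Illustrative polygon for a rotated piece of text (values are invented).
polygon = [
    {"X": 0.25, "Y": 0.5},
    {"X": 0.75, "Y": 0.25},
    {"X": 1.0, "Y": 0.75},
    {"X": 0.5, "Y": 1.0},
]
box = polygon_to_bounding_box(polygon)
print(box)
```

The conversion only goes one way: the box can always be recomputed from the polygon, but the rotation information in the polygon cannot be recovered from the box.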
[59766-textract-table.json](https://github.com/aws-samples/amazon-textract-textractor/files/15004467/59766-textract-table.json) In the Textract output file, cell ID `3f98227c-2981-4cd5-b23c-bee82e96bb54` references three words, but the code below returns null words in that cell. `document = Document.open("c:\\temp\\59766-textract-table.json")` #query for the line id that...
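One way to cross-check a report like this is to resolve the cell's `CHILD` relationships directly against the raw response, independent of any parser. The sketch below assumes the standard Textract JSON layout (a `Blocks` list whose entries carry `Id`, `BlockType`, `Relationships`, and `Text`). The function name `child_words`, the block IDs, and the sample text are hypothetical and do not come from the attached file.

```python
from typing import List


def child_words(blocks: List[dict], cell_id: str) -> List[str]:
    """Collect the text of every WORD block referenced by a cell's CHILD links."""
    by_id = {block["Id"]: block for block in blocks}
    cell = by_id[cell_id]
    texts = []
    # "Relationships" can be missing (or None) for empty cells.
    for relationship in cell.get("Relationships") or []:
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = by_id.get(child_id)
            if child is not None and child["BlockType"] == "WORD":
                texts.append(child["Text"])
    return texts


# Invented sample blocks; a real file would be loaded with
# json.load(...)["Blocks"] instead.
blocks = [
    {
        "Id": "cell-1",
        "BlockType": "CELL",
        "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2", "w3"]}],
    },
    {"Id": "w1", "BlockType": "WORD", "Text": "Total"},
    {"Id": "w2", "BlockType": "WORD", "Text": "amount"},
    {"Id": "w3", "BlockType": "WORD", "Text": "due"},
    {"Id": "cell-2", "BlockType": "CELL"},
]
words = child_words(blocks, "cell-1")
print(words)
```

If this lookup returns the three words while the parsed document does not, the mismatch is in the parsing layer rather than in the Textract output.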
*Issue #, if available:* N/A *Description of changes:* Updating the return type and function doc for `start_document_text_detection`. Language is copied from `start_document_analysis`. By submitting this pull request, I confirm that...
Attached is the part of the PDF that I am trying to extract. I am doing the extraction using: `textract_json = call_textract(input_document="s3:url", features=[Textract_Features.LAYOUT, Textract_Features.TABLES])` `layout = get_text_from_layout_json(textract_json=data)` The output I am getting...
Good morning, what solution do I use with Textractor to extract the cell data from the attached image and render the cell rows correctly in Excel? Is there a rows...
Hi team, I was surprised to find today that the below does not work in the default Python notebook kernel of a [SageMaker Studio JupyterLab space](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl.html), when the notebook's IAM...
I did previously raise similar reqs #19 and #20, but they got stale & closed due to inactivity... Today had first chance in a while to come back to Textractor...
### Discussed in https://github.com/aws-samples/amazon-textract-textractor/discussions/350 Originally posted by **samwhealon** April 11, 2024 I have been playing around with this library and the original textract-response-parser. I found that TRP doesn't support returning...