amazon-textract-textractor
Analyze documents with Amazon Textract and generate output in multiple formats.
Table elements without a text element will cause pretty printing to fail. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under...
Suppose I have a document like this: ``` ``` where a table is located between two chunks of text, and I'd like to parse the document and save the parsed...
The trp code called by `get_lines_string()` expects the `cid` key to be in `blockMap`. If it's not, an exception is thrown in trp's `__init__.py`: ``` line 153, in __init__ if...
[#343 KeyError: 'Text' - on documents with tables](https://github.com/aws-samples/amazon-textract-textractor/issues/343) Description of changes: - Added key check By submitting this pull request, I confirm that you can use, modify, copy, and redistribute...
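The "key check" mentioned above can be illustrated with a minimal sketch. This is not the code from the pull request: the helper name `block_text` and the sample blocks are hypothetical, and the only assumption taken from Textract is that some blocks (for example `TABLE` blocks) carry no `Text` field.

```python
from typing import Dict, List


def block_text(block: Dict) -> str:
    """Return the block's text, or an empty string when the 'Text' key is absent."""
    # dict.get avoids the KeyError that block["Text"] raises for text-less blocks.
    return block.get("Text", "")


# Hypothetical sample data: the TABLE block has no "Text" key.
blocks: List[Dict] = [
    {"BlockType": "LINE", "Text": "Invoice"},
    {"BlockType": "TABLE"},
]

texts = [block_text(b) for b in blocks]
```

Whether an empty string is the right fallback depends on the caller; skipping text-less blocks entirely is an equally plausible choice.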
This line (and several others similar to it in the same file) https://github.com/aws-samples/amazon-textract-textractor/blob/7f16fa74a6ab2f5b1a322c4c5c915266361deecf/caller/textractcaller/t_call.py#L579 could potentially break S3 paths like ``` s3://bucket-name/path/to/s3://another/path/to/file.pdf ``` Yes, this is apparently valid, and S3 has...
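One way to avoid this class of problem is to strip only the leading scheme and split on the first `/` rather than replacing every occurrence of `s3://`. The sketch below is an illustration under that assumption, not the library's code; `split_s3_uri` is a hypothetical helper name.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI into (bucket, key), touching only the leading 's3://'."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Remove the scheme once, then split at the first slash only, so any later
    # 's3://' inside the object key is preserved verbatim.
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key


bucket, key = split_s3_uri("s3://bucket-name/path/to/s3://another/path/to/file.pdf")
```

With the example path from the issue, the bucket is `bucket-name` and the key keeps its embedded `s3://` segment.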
Hello, I have a fairly normal-looking document (for which I unfortunately cannot share the original file, as it's a proprietary document) that `textractprettyprinter.t_pretty_print.get_text_from_layout_json` fails to parse with `KeyError: 'Text'`. We've...
**Description** I'm encountering an InvalidParameterException: Request has invalid parameters error when attempting to use the startDocumentAnalysis method with AWS Textract in a Node.js application. The error occurs despite ensuring that...
Since the release of version 1.8.0, we are no longer able to use the `.to_markdown()` method. The workflow we use is as follows (mainly used for PDFs): - create json...
Typically, it's best practice for Python logging to use `logging.getLogger(__name__)`. However, the ResponseParser simply does `import logging` and then `logging.info(...)` - this results in the root logger being used, as...
**amazon-textract-textractor==1.7.9** `document.search_words(keyword="Tom Brady")` or `page.search_words(keyword="Frank")` doesn't work as expected. It returns a list of seemingly random letters or words that are not even close to the keywords. I tried adjusting `similarity_threshold`, to no avail.