amazon-textract-textractor issues

ensure cell block has text element

Table elements without a text element will cause pretty printing to fail. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under...

qeternity

Exporting text+tables while maintaining layout

1

Suppose I have a document like this: ``` ``` Where a table is located between two chunks of text, and I'd like to parse the document and save the parsed...

austinmw

KeyError in get_lines_string

The trp code called by `get_lines_string()` expects the `cid` key to be in `blockMap`. If it's not, an exception is thrown in trp `__init__.py`: ``` line 153, in __init__ if...

sbui-dev

GH issue #343: Added key check

1

[#343 KeyError: 'Text' - on documents with tables](https://github.com/aws-samples/amazon-textract-textractor/issues/343) Description of changes: - Added key check By submitting this pull request, I confirm that you can use, modify, copy, and redistribute...

dzmitry-kankalovich
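
The kind of key check this PR describes can be illustrated with a minimal sketch. This is not the actual patch: `block_text` is a hypothetical helper, and the block dictionaries are made-up stand-ins for Textract blocks, on the assumption that some blocks (for example table cells) carry no `Text` key.

```python
def block_text(block: dict) -> str:
    """Return the block's text, or an empty string when 'Text' is absent.

    Hypothetical helper for illustration; not taken from the library.
    """
    # dict.get avoids the KeyError that block["Text"] would raise.
    return block.get("Text", "")


# Made-up sample blocks: the second one has no 'Text' key.
blocks = [
    {"BlockType": "WORD", "Text": "Total"},
    {"BlockType": "CELL"},
]
texts = [block_text(b) for b in blocks]
```

Falling back to an empty string keeps downstream string joins working; whether an empty string or skipping the block is the better choice depends on how the caller lays out the output.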

S3 path parsing for textractcaller is not robust enough

This line (and several others similar to it in the same file) https://github.com/aws-samples/amazon-textract-textractor/blob/7f16fa74a6ab2f5b1a322c4c5c915266361deecf/caller/textractcaller/t_call.py#L579 could potentially break S3 paths like ``` s3://bucket-name/path/to/s3://another/path/to/file.pdf ``` Yes this is apparently valid, and S3 has...

anjanvb
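
One way to parse such a path without tripping over a second `s3://` inside the key is to strip the scheme only at the start and split only at the first slash. The sketch below makes no claim about what the linked line in `t_call.py` does; `split_s3_uri` is a hypothetical name used only for illustration.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI into (bucket, key). Hypothetical helper."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Remove the scheme once, from the front only, then split at the
    # first '/' so the key keeps any later 's3://' or '/' characters.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


bucket, key = split_s3_uri(
    "s3://bucket-name/path/to/s3://another/path/to/file.pdf"
)
```

For the example path from the issue, this yields the bucket `bucket-name` and the key `path/to/s3://another/path/to/file.pdf`.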

KeyError: 'Text' - on documents with tables

1

Hello, I have a fairly normal looking document (for which I unfortunately cannot share the original file, as it's a proprietary doc) that `textractprettyprinter.t_pretty_print.get_text_from_layout_json` fails to parse with `KeyError: 'Text'`. We've...

dzmitry-kankalovich

InvalidParameterException: Request has invalid parameters when using startDocumentAnalysis

**Description** I'm encountering an InvalidParameterException: Request has invalid parameters error when attempting to use the startDocumentAnalysis method with AWS Textract in a Node.js application. The error occurs despite ensuring that...

arunsingh28

issue regarding .to_markdown() method

4

Since the 1.8.0 release, we are no longer able to use the `.to_markdown()` method. The workflow we use is as follows (mainly used for PDFs): - create json...

red-sky17

Use module name for logger instead of Root Logger

7

Typically, it's best practice for Python logging to use `logging.getLogger(__name__)`. However, the ResponseParser simply does `import logging` and then `logging.info(...)` - this results in the root logger being used, as...

michaelshum321

enhancement
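
The pattern the issue asks for can be sketched as follows. `parse_response` is a hypothetical stand-in, not the library's ResponseParser; only the `logging` calls are real standard-library API.

```python
import logging

# Module-level logger named after the module, instead of the root logger.
logger = logging.getLogger(__name__)


def parse_response(response: dict) -> int:
    """Hypothetical helper: count blocks and log via the module logger."""
    blocks = response.get("Blocks", [])
    # Lazy %-style formatting; the message is built only if INFO is enabled.
    logger.info("parsed %d blocks", len(blocks))
    return len(blocks)
```

With a named logger, an application can raise or lower verbosity for this one module, for example `logging.getLogger("some.module").setLevel(logging.WARNING)`, without changing the root logger's configuration.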

Is search_words() broken?

2

**amazon-textract-textractor==1.7.9** `document.search_words(keyword="Tom Brady")` or `page.search_words(keyword="Frank")` doesn't work as expected. Returns a list of random letters or words not even close to keywords. Tried playing with the similarity_threshold to no avail.

ttruong-gilead
