amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.

Results 129 amazon-textract-textractor issues

Hello, this issue seems very similar to #136, but I just can't make it work: the word and line order inside table cells is not preserved when invoking the...

bug
enhancement

https://github.com/aws-samples/amazon-textract-textractor/blob/9df5d268dead3f42104cde2f766cb16be3f93d95/textractor/entities/expense_field.py#L184-L189 The above code throws an `AttributeError` when the `bbox` on `self.expenses[0]` is None. Since `spatial_object` can be None, would it make sense to add some protection here?
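One possible shape for the protection asked about above is a guard that returns `None` instead of dereferencing a missing `bbox`. This is only a sketch: `StubBBox` and `spatial_object_of` are hypothetical stand-ins for illustration, not the textractor classes or the code at the linked lines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubBBox:
    """Hypothetical stand-in for a bounding box; not the textractor class."""
    x: float
    y: float
    width: float
    height: float
    spatial_object: Optional[object] = None


def spatial_object_of(bbox: Optional[StubBBox]) -> Optional[object]:
    """Return bbox.spatial_object, or None when there is no bbox at all."""
    if bbox is None:
        # Without this check, `bbox.spatial_object` raises AttributeError.
        return None
    return bbox.spatial_object
```

Callers would then have to handle a `None` result explicitly, which moves the failure from an unexpected `AttributeError` to a check the caller controls.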

Hello, we are getting this error: `AttributeError: 'NoneType' object has no attribute 'spatial_object'` ![image](https://github.com/user-attachments/assets/62d277fb-f392-4b54-bed2-eedf82513de9) Here are the sample images: ![2410233880_1](https://github.com/user-attachments/assets/42bc9fef-f460-4aea-ad93-3f0554aca545) ![2410233852_2](https://github.com/user-attachments/assets/e0164c40-ec1d-4487-bed7-6ff9a51e64a9)

bug

Hi team, looking for some support on the Lambda layers GitHub Action. We use these layers in our app, but the current builds are red and previous artifacts have expired....

Entity types are occasionally `None`, causing layout linearization to fail. This may happen with multi-page documents. *Issue #, if available:* *Description of changes:* Check...

The invoice number was not detected. We assume this is because there is no space between the label and the value. This issue occurs in both the Amazon Textract console...

# Current Behavior

While trying to create markdown or text files from AWS Textract JSON output using the `get_text_from_layout_json` function, the contents of **_ALL_** the list items are duplicated in...

*Issue #, if available:* https://github.com/aws-samples/amazon-textract-textractor/issues/391

*Description of changes:*

### What?

Prevent duplication of list contents.

### How?

Exclude all `LAYOUT*` elements which are children of `LIST_LAYOUT` elements when returning layout...
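The filtering idea described in this pull request can be sketched over plain dictionaries shaped like Textract response blocks (`Id`, `BlockType`, and `Relationships` entries of type `CHILD`). This is not the code from the pull request. The helper names are hypothetical, and the default list type `LAYOUT_LIST` is an assumption based on the Textract API's naming; the description above writes it as `LIST_LAYOUT`, so the value is left as a parameter.

```python
from typing import Any, Dict, List, Set

Block = Dict[str, Any]


def child_ids(block: Block) -> List[str]:
    """Collect the Ids listed under a block's CHILD relationships."""
    ids: List[str] = []
    for rel in block.get("Relationships") or []:
        if rel.get("Type") == "CHILD":
            ids.extend(rel.get("Ids", []))
    return ids


def top_level_layout_blocks(blocks: List[Block],
                            list_type: str = "LAYOUT_LIST") -> List[Block]:
    """Keep LAYOUT* blocks, dropping those that are children of a list block."""
    nested: Set[str] = set()
    for block in blocks:
        if block.get("BlockType") == list_type:
            nested.update(child_ids(block))
    return [
        b for b in blocks
        if str(b.get("BlockType", "")).startswith("LAYOUT")
        and b.get("Id") not in nested
    ]
```

Because the list block itself is kept, its items are emitted once through the list rather than a second time as standalone layout blocks.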

pretty-printer

The BoundingBox docstring uses some LaTeX-style `\in`, but Python sees the `\i` as an invalid escape and issues a SyntaxWarning: ``` $ python -Werror -c 'import textractor' Traceback (most...
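The usual remedy for this kind of warning is to make the docstring a raw string (or to double the backslash). The sketch below compiles two small hypothetical sources, not the actual BoundingBox docstring, and records the warnings each one triggers. It assumes a Python version that warns on invalid escapes (a `DeprecationWarning` before 3.12, a `SyntaxWarning` from 3.12 on).

```python
import warnings

# Hypothetical source with a plain docstring containing a LaTeX-style "\in".
PLAIN_DOCSTRING = r'''
def f():
    """Return x \in [0, 1]."""
'''

# Same source, but the docstring is a raw string, so "\i" is not an escape.
RAW_DOCSTRING = r'''
def f():
    r"""Return x \in [0, 1]."""
'''


def escape_warnings(source: str) -> list:
    """Compile source and return the escape-related warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<docstring-demo>", "exec")
    return [
        w for w in caught
        if issubclass(w.category, (SyntaxWarning, DeprecationWarning))
    ]
```

Calling `escape_warnings(PLAIN_DOCSTRING)` should report the invalid escape, while `escape_warnings(RAW_DOCSTRING)` should come back empty.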

The attached input document contains text, then a table, followed by more text. We want the text file to match the input PDF file. ![input_page](https://github.com/user-attachments/assets/fe09d250-6547-4eff-bc8f-854f9316b28b) I tried...