amazon-textract-textractor issues

export table to json

6

I can see support for exporting to csv, Pandas DataFrame and xls here: https://aws-samples.github.io/amazon-textract-textractor/notebooks/table_data_to_various_formats.html Is there a way to export the table data to json? Or do I have to...

bvbg1
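
One possible workaround, sketched here under assumptions rather than taken from the library: since CSV export is already supported, the CSV text can be converted to JSON with the Python standard library alone. The helper name `csv_table_to_json` and the sample CSV are hypothetical, and the sketch assumes the first row of the table is a header row.

```python
import csv
import io
import json


def csv_table_to_json(csv_text: str) -> str:
    """Convert CSV text (first row = header) into a JSON array of row objects."""
    # DictReader uses the first row as keys for every following row.
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


# Hypothetical CSV, standing in for a table exported to CSV.
sample_csv = "Item,Quantity\nWidget,2\nGadget,5\n"
print(csv_table_to_json(sample_csv))
```

All cell values come out as strings, because CSV carries no type information; convert numeric columns afterwards if needed.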

Export of edited document object in API format added

1

*Issue #, if available:* [#165](https://github.com/aws-samples/amazon-textract-textractor/issues/165) *Description of changes:* Previously, customers had the ability to convert the API response JSON to Document object to edit its contents. Support has now been...

rasrivid

Add Page to DocumentEntity

*Issue #, if available:* #170 *Description of changes:* The original issue was that word and line bounding boxes were shifted in some cases when page width or page height !=...

Belval

start_document_analysis does not support List of Images

2

start_document_analysis in the documentation says it supports a list of PIL images, but in the source code https://github.com/aws-samples/amazon-textract-textractor/blob/e40f5b0378f9ee24d0a757de414505fb06a4471f/textractor/textractor.py#L488 it only accepts a string, a bytearray, or a PIL Image. How...

tarunn2799

enhancement

Improve AnalyzeExpense Support

Currently there is limited support for AnalyzeExpense in Textractor. We support sync and async API calls. However, we need to implement the following: - [x] Allow duplication of KV for...

ThomasDelteil

Re-export edited entities to JSON

2

As a prerequisite to the GeoFinder feature, we need the ability to export edited entities back into the Textract API response format. In the image below, the customer would be...

rasrivid

Fix gaps in current and new version of GeoFinder

1

Current Implementation: Task: - [ ] Get feature requirements from stakeholders

rasrivid

Modification to the Document entity from response is not captured when using .to_trp2()

1

The conversion to trp2 is based on the initial response. It does not capture any modifications made to the entities, such as OCR post-processing, correction, or deletion of...

ThomasDelteil

enhancement

Support integrity in text spacing with prettyprint

The image shows multi-column text for which Textract returns words with bounding box information. Aim: support export/pretty print that retains the spacing shown in the document, i.e. print digital text in...

rasrivid

Add pre-processing library to improve final results

It is often possible to improve the results of the final processing by performing adjustments on the input image. We want to provide a helper library such that it is easy...

ThomasDelteil
