amazon-textract-textractor Modification to the Document entity from response is not captured when using .to

Modification to the Document entity from response is not captured when using .to_trp2()

Open ThomasDelteil opened this issue 3 years ago • 1 comments

The conversion to trp2 is based on the initial response, so it does not capture any modifications made to the entities, such as OCR post-processing, corrections, or deletions. A proper converter needs to be implemented to make the library usable for in-place post-processing modifications.

This would allow workflows along the lines of:

document.pages[1].key_values = {key: value + '_edited' for key, value in document.pages[1].key_values.items()}
document.export("document.json")
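To make the gap concrete, here is a minimal, self-contained sketch of the difference between exporting from the cached response and exporting from the current entity state. Everything in it is an assumption for illustration: `ToyDocument`, `export_from_response`, and `export_from_entities` are hypothetical names, not part of the textractor API, and the dictionary-shaped `key_values` only mirrors the snippet above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a parsed document (not the textractor Document class)."""

    response: dict  # raw API response, left untouched after parsing
    key_values: dict = field(default_factory=dict)  # editable entities built from the response

    @classmethod
    def from_response(cls, response):
        # Copy the values so that editing entities never mutates the raw response.
        return cls(response=response, key_values=dict(response["key_values"]))

    def export_from_response(self):
        # Mirrors the behavior described in this issue: only the initial response is serialized.
        return json.dumps(self.response, sort_keys=True)

    def export_from_entities(self):
        # What a proper converter would do: serialize the current entity state.
        return json.dumps({"key_values": self.key_values}, sort_keys=True)


doc = ToyDocument.from_response({"key_values": {"Name": "Jane"}})
doc.key_values = {key: value + "_edited" for key, value in doc.key_values.items()}

stale = json.loads(doc.export_from_response())  # edit is lost
fresh = json.loads(doc.export_from_entities())  # edit is preserved
print(stale["key_values"], fresh["key_values"])
```

In this toy model the response-based export still reports the original value, while the entity-based export reflects the edit. A real converter would additionally have to rebuild block IDs, relationships, and geometry, which this sketch does not attempt.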

We also need to add utility functions such as:

[ ] Merging tables
[ ] Adding new keys
[ ] Adding new queries output

Nov 03 '22 00:11 ThomasDelteil
