amazon-textract-textractor

Modifications to a Document entity built from a response are not captured when using .to_trp2()

Open • ThomasDelteil opened this issue 3 years ago • 1 comment

The conversion to trp2 is based on the initial response, so it does not capture any modifications made to the entities afterwards, such as OCR post-processing, corrections, or deletions of entities. A proper converter needs to be implemented to make the library usable for post-processing with in-place modifications.
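To illustrate the difference between replaying the initial response and converting the current entities, here is a minimal sketch under stated assumptions. The `Word` and `Line` classes and the `entities_to_blocks` function are hypothetical stand-ins, not textractor's classes or API. The only grounded part is the shape of the output: Textract-style Block dicts with `BlockType`, `Id`, `Text`, `Confidence` and `Relationships`. Because the blocks are regenerated from whatever the entities contain right now, a corrected word or a deleted word shows up in the output.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


# Hypothetical, simplified entity model (NOT textractor's classes).
@dataclass
class Word:
    text: str
    confidence: float
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Line:
    words: list[Word]
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def text(self) -> str:
        # Derived from the current words, so edits propagate automatically.
        return " ".join(w.text for w in self.words)


def entities_to_blocks(lines: list[Line]) -> list[dict[str, Any]]:
    """Serialize the *current* entities into Textract-style Block dicts.

    A real converter would also need Geometry, Page blocks, tables, etc.;
    this sketch only covers LINE and WORD blocks.
    """
    blocks: list[dict[str, Any]] = []
    for line in lines:
        blocks.append({
            "BlockType": "LINE",
            "Id": line.id,
            "Text": line.text,
            # CHILD ids are rebuilt from the words that still exist.
            "Relationships": [{"Type": "CHILD", "Ids": [w.id for w in line.words]}],
        })
        for word in line.words:
            blocks.append({
                "BlockType": "WORD",
                "Id": word.id,
                "Text": word.text,
                "Confidence": word.confidence,
            })
    return blocks


if __name__ == "__main__":
    lines = [Line([Word("Invoce", 91.0), Word("#123", 99.0)])]
    lines[0].words[0].text = "Invoice"  # OCR correction
    del lines[0].words[1]               # deletion of an entity
    print([b["Text"] for b in entities_to_blocks(lines)])  # ['Invoice', 'Invoice']
```

The design choice is that nothing is copied from a cached response: every field is read from the live objects at conversion time.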

This would allow workflows along the lines of:

document.pages[1].key_values = {key: value + '_edited' for key, value in document.pages[1].key_values.items()}
document.export("document.json")
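A self-contained sketch of that proposed workflow follows. `EditableDocument`, `EditablePage` and their `export` method are hypothetical, and modeling `key_values` as a plain `dict[str, str]` is an assumption made so the dict comprehension above works; it is not a claim about textractor's actual types. The point is that `export` writes the edited in-memory state rather than the original response.

```python
import json
from dataclasses import dataclass, field


# Hypothetical editable model; `key_values` as dict[str, str] is an assumption.
@dataclass
class EditablePage:
    key_values: dict[str, str] = field(default_factory=dict)


@dataclass
class EditableDocument:
    pages: list[EditablePage] = field(default_factory=list)

    def export(self, path: str) -> None:
        """Write the current (possibly edited) state to JSON."""
        payload = {"pages": [{"key_values": p.key_values} for p in self.pages]}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2)
```

Usage mirrors the snippet in the issue: build a document, replace `doc.pages[1].key_values` with `{k: v + "_edited" for k, v in doc.pages[1].key_values.items()}`, then call `doc.export("document.json")`.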

We also need to add utility functions such as:

  • [ ] Merging tables
  • [ ] Adding new keys
  • [ ] Adding new query outputs
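The three utilities in the checklist could look roughly like the sketch below. Everything here is hypothetical: `Table`, `Page`, `merge_tables`, `add_key` and `add_query_result` are illustrative names on a simplified model, not textractor functions. The merge assumes the common case of a table continued on the next page, where the header row may be repeated.

```python
from dataclasses import dataclass, field


# Hypothetical simplified model (NOT textractor's classes).
@dataclass
class Table:
    rows: list[list[str]]


@dataclass
class Page:
    key_values: dict[str, str] = field(default_factory=dict)
    queries: dict[str, str] = field(default_factory=dict)  # query text -> answer


def merge_tables(first: Table, second: Table, repeated_header: bool = True) -> Table:
    """Append `second` to `first`, e.g. for a table split across pages.

    When `repeated_header` is True, the first row of `second` is dropped
    if it is identical to the header row of `first`.
    """
    if first.rows and second.rows and len(first.rows[0]) != len(second.rows[0]):
        raise ValueError("tables have different column counts")
    extra = second.rows
    if repeated_header and first.rows and extra and extra[0] == first.rows[0]:
        extra = extra[1:]
    return Table(rows=[*first.rows, *extra])


def add_key(page: Page, key: str, value: str, overwrite: bool = False) -> None:
    """Add a new key-value pair; refuse to replace an existing key by default."""
    if key in page.key_values and not overwrite:
        raise KeyError(f"key already exists: {key!r}")
    page.key_values[key] = value


def add_query_result(page: Page, query: str, answer: str) -> None:
    """Record (or replace) the answer for a query."""
    page.queries[query] = answer
```

Raising on an existing key by default is a deliberate choice in this sketch, so that a post-processing step cannot silently overwrite a value extracted by Textract.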

ThomasDelteil, Nov 03 '22 00:11