amazon-textract-textractor
Modifications made to Document entities after parsing the response are not captured when using .to_trp2()
The conversion to trp2 is based on the initial response, so it does not capture any modifications made to the entities, such as OCR post-processing, corrections, or deletions. A proper converter needs to be implemented to make the library usable for in-place post-processing modifications.
This would allow workflows along the lines of:
document.pages[1].key_values = {key: value + '_edited' for key, value in document.pages[1].key_values}
document.export("document.json")
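To make the gap concrete, here is a minimal, self-contained sketch of the difference between exporting from the initial response and exporting from the current entities. `ToyDocument`, `export_from_response`, and `export_from_entities` are hypothetical names invented for illustration, and the `Blocks` layout is heavily simplified. None of this is the Textractor API or the real Textract response schema.

```python
import copy
import json
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a parsed document; not the Textractor class."""

    response: dict  # raw response, kept exactly as received
    key_values: dict = field(default_factory=dict)  # editable parsed entities

    @classmethod
    def from_response(cls, response):
        # Parse entities from a deep copy so edits never touch the raw response.
        blocks = copy.deepcopy(response)["Blocks"]
        parsed = {block["Key"]: block["Value"] for block in blocks}
        return cls(response=response, key_values=parsed)

    def export_from_response(self):
        # Mirrors the behaviour described above: serializes the initial response,
        # so later edits to key_values are lost.
        return json.dumps(self.response, sort_keys=True)

    def export_from_entities(self):
        # Sketch of the requested converter: rebuilds output from current entities.
        blocks = [{"Key": k, "Value": v} for k, v in self.key_values.items()]
        return json.dumps({"Blocks": blocks}, sort_keys=True)


response = {"Blocks": [{"Key": "Name", "Value": "Jane"}]}
document = ToyDocument.from_response(response)

# In-place post-processing edit, in the spirit of the workflow above.
document.key_values = {k: v + "_edited" for k, v in document.key_values.items()}

stale = json.loads(document.export_from_response())
fresh = json.loads(document.export_from_entities())
```

In this sketch `stale` still carries the original value while `fresh` carries the edited one. A real converter would have to rebuild the full block graph (IDs, relationships, geometry) from the entities, which is the work this issue asks for.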
We also need to add utility functions such as:
- [ ] Merging tables
- [ ] Adding new keys
- [ ] Adding new queries output
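As a rough idea of what the first utility could look like, the sketch below merges two tables represented as plain lists of rows. `merge_tables` and its `drop_repeated_header` parameter are hypothetical, and the list-of-lists representation is an assumption made for illustration. It does not reflect how Textractor stores tables.

```python
def merge_tables(top, bottom, drop_repeated_header=True):
    """Vertically merge two tables given as lists of rows (hypothetical helper)."""
    if top and bottom and len(top[0]) != len(bottom[0]):
        raise ValueError("tables must have the same number of columns")
    rows = list(bottom)
    # A table continued on the next page often repeats its header row.
    if drop_repeated_header and top and rows and rows[0] == top[0]:
        rows = rows[1:]
    return list(top) + rows


page_1 = [["Item", "Qty"], ["Pen", "2"]]
page_2 = [["Item", "Qty"], ["Ink", "1"]]
merged = merge_tables(page_1, page_2)
```

Whatever shape the real utility takes, the merged table would still need to go through the entity-based converter so that it appears in the exported output.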