amazon-textract-textractor
Support integrity in text spacing with prettyprint
The image shows multi-column text, for which Textract returns words with bounding-box information.
Aim: Support an export/pretty-print option that retains the spacing shown in the document, i.e. prints the digital text in its multi-column layout.
Example:
Conversion of text to the following format:
1    First chapter    3
1.1  Section One      3
1.2  Section Two      3
1.3  Section Three    3
2    Last chapter     5
2.1  Section One      5
2.2  Section Two      5
2.3  Section Three    5
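One possible way to approach this, sketched outside the library: map each word's left edge onto a fixed-width character grid and pad with spaces. This is only an illustration, not the textractor API. The names `Word`, `group_into_lines` and `render_layout` are hypothetical, and the sketch assumes bounding boxes are normalized to the 0..1 range, as Textract reports them.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one detected word (not a textractor class)."""
    text: str
    left: float    # normalized x of the left edge, 0..1
    top: float     # normalized y of the top edge, 0..1
    height: float  # normalized height, 0..1


def group_into_lines(words, tolerance=0.5):
    """Group words whose top edge is within tolerance * height of the
    first word of the current line."""
    lines = []
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        if lines and abs(word.top - lines[-1][0].top) < tolerance * word.height:
            lines[-1].append(word)
        else:
            lines.append([word])
    return lines


def render_layout(words, columns=60):
    """Render words on a grid `columns` characters wide, keeping horizontal gaps."""
    rendered = []
    for line in group_into_lines(words):
        text = ""
        for word in sorted(line, key=lambda w: w.left):
            # Column suggested by the bounding box.
            target = round(word.left * columns)
            # Never overlap the previous word; keep at least one space.
            if text:
                target = max(target, len(text) + 1)
            text = text.ljust(target) + word.text
        rendered.append(text)
    return "\n".join(rendered)


if __name__ == "__main__":
    sample = [
        Word("1", 0.10, 0.10, 0.02),
        Word("First", 0.20, 0.10, 0.02),
        Word("chapter", 0.30, 0.101, 0.02),
        Word("3", 0.80, 0.10, 0.02),
        Word("1.1", 0.10, 0.14, 0.02),
        Word("Section", 0.20, 0.14, 0.02),
        Word("One", 0.33, 0.14, 0.02),
        Word("3", 0.80, 0.14, 0.02),
    ]
    print(render_layout(sample))
```

The grid width (`columns`) and the line-grouping tolerance are arbitrary choices here. A real implementation would probably derive them from the average character width and line height on the page.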