amazon-textract-textractor
getting text from detect_document_text
How can I get the text from detect_document_text in natural reading order (left to right, top to bottom) while keeping the line-break information?
Example image:
document.text output:
quick a brown fox
jumps over the lazy dog
word3
word4
word5 word7
word1 word2
word8word9 word10
document.lines output:
[quick a brown fox, jumps over the lazy dog, word3, word4, word5 word7, word1 word2, word8word9 word10]
document.words output:
[a, brown, the, fox, over, jumps, dog, quick, lazy, word7, word3, word4, word2, word5, word1, word8word9, word10]
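One possible workaround, sketched here under stated assumptions and not taken from the library itself, is to re-sort the words by their bounding boxes: group words whose vertical centers are close into the same row, then sort each row left to right. The dictionary keys `text`, `left`, `top`, and `height` below are hypothetical simplified names modeled on the normalized bounding-box values that Textract reports per word, and the `tolerance` parameter and the `reading_order_lines` helper are made up for illustration. You would have to map the actual word objects onto this shape yourself.

```python
from typing import Dict, List


def reading_order_lines(words: List[Dict], tolerance: float = 0.5) -> List[str]:
    """Group word boxes into rows by vertical center, then sort each row by x.

    Each word is a dict with hypothetical keys: text, left, top, height
    (coordinates normalized to 0..1, origin at the top-left of the page).
    """
    rows: List[Dict] = []
    # Visit words from top to bottom using the vertical center of each box.
    for word in sorted(words, key=lambda w: w["top"] + w["height"] / 2):
        center = word["top"] + word["height"] / 2
        # Same row if the center is within a fraction of the word height
        # of the center of the word that started the current row.
        if rows and abs(center - rows[-1]["center"]) <= tolerance * word["height"]:
            rows[-1]["words"].append(word)
        else:
            rows.append({"center": center, "words": [word]})
    # Inside each row, order words left to right and join them with spaces.
    return [
        " ".join(w["text"] for w in sorted(row["words"], key=lambda w: w["left"]))
        for row in rows
    ]


# Made-up sample boxes, deliberately given out of order.
sample = [
    {"text": "brown", "left": 0.3, "top": 0.100, "height": 0.02},
    {"text": "quick", "left": 0.1, "top": 0.101, "height": 0.02},
    {"text": "fox", "left": 0.5, "top": 0.099, "height": 0.02},
    {"text": "jumps", "left": 0.1, "top": 0.150, "height": 0.02},
    {"text": "over", "left": 0.3, "top": 0.151, "height": 0.02},
]
print("\n".join(reading_order_lines(sample)))
```

This simple grouping assumes a single-column page with little skew. Multi-column layouts or rotated scans would likely need column detection or a larger tolerance first.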