amazon-textract-textractor
Proper way of getting cell content?
The codebase has this line: https://github.com/aws-samples/amazon-textract-textractor/blob/28d6110b08a3584edc4c87022a41d12961b88688/textractor/entities/table.py#L640
to retrieve the cell content, even though cell.text already exists.
I tried using cell.text but noticed it's inaccurate: sometimes it returns an empty string when it shouldn't.
By contrast, cell.__repr__().split(">")[1][1:] gives more accurate results, although some characters are still missing.
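To make the workaround concrete, here is a minimal sketch of what that expression does to a string. The repr strings below are hypothetical stand-ins (the real cell repr format in textractor may differ), and the helper name is made up for illustration. It suggests one possible reason for the missing characters: anything after a second ">" is dropped, and exactly one leading character is always removed.

```python
def text_after_repr_prefix(repr_string: str) -> str:
    """Apply the same expression as the workaround to a plain string.

    split(">")[1] keeps only what lies between the first and second ">",
    and [1:] then drops exactly one leading character.
    """
    return repr_string.split(">")[1][1:]


# Hypothetical repr strings, NOT the actual textractor output format.
plain = "<Cell: (1,1)> Total"
with_gt = "<Cell: (1,2)> a > b"    # cell text that itself contains ">"
no_space = "<Cell: (1,3)>42"       # no separator after the prefix

print(repr(text_after_repr_prefix(plain)))     # 'Total'
print(repr(text_after_repr_prefix(with_gt)))   # 'a '  (the "> b" part is lost)
print(repr(text_after_repr_prefix(no_space)))  # '2'   (the first digit is lost)
```

So even if the repr contains the full text, this way of parsing it can lose characters on its own.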
What's going on?