amazon-textract-textractor Proper way of getting cell content?

Open ttruong-gilead opened this issue 1 year ago • 5 comments

The codebase uses this line to retrieve the cell content:

https://github.com/aws-samples/amazon-textract-textractor/blob/28d6110b08a3584edc4c87022a41d12961b88688/textractor/entities/table.py#L640

But there's already cell.text.

I tried using cell.text but noticed it's inaccurate: sometimes it gives an empty string when it shouldn't. By contrast, cell.__repr__().split(">")[1][1:] gives more accurate results, although some characters are still missing.

What's going on?
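
For reference, here is a minimal sketch of what the expression above does as plain string handling. StandInCell and its repr layout are assumptions made up for illustration, not textractor's actual class or output format, so this only shows the splitting logic: splitting on every ">" drops anything that follows a ">" inside the text itself. Whether that is related to the missing characters is not confirmed.

```python
class StandInCell:
    """Hypothetical stand-in for a table cell; NOT the textractor class."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # Assumed layout for illustration: "<metadata> text"
        return f"<Cell: (1,1), Span: (1, 1)> {self.text}"


def text_from_repr(cell):
    # The expression quoted in the question: split on every ">".
    return cell.__repr__().split(">")[1][1:]


def text_after_first_marker(cell):
    # Variant that splits only once, so a ">" inside the text survives.
    return repr(cell).split(">", 1)[1][1:]


plain = StandInCell("Total")
with_marker = StandInCell("x > 5")

print(repr(text_from_repr(plain)))
print(repr(text_from_repr(with_marker)))
print(repr(text_after_first_marker(with_marker)))
```

With this stand-in, the quoted expression returns the full text for "Total" but only "x " for "x > 5", while the split-once variant returns "x > 5".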

Mar 14 '24 04:03 ttruong-gilead
