Table cell incorrectly does not pick up the cell text/words; Page -> Line picks up the words as in the Textract output

Open raidken opened this issue 1 year ago • 1 comments

59766-textract-table.json

In the Textract output file, cell id 3f98227c-2981-4cd5-b23c-bee82e96bb54 references three words, but the code below returns null words for that cell.

from textractor.entities.document import Document

# raw string so that "\t" and "\5" in the Windows path are not read as escape sequences
document = Document.open(r"c:\temp\59766-textract-table.json")

# query for the line id that references the same three words
# for line in document.pages[6].lines:
line_list = list(filter(lambda line: line.id == "3f98227c-2981-4cd5-b23c-bee82e96bb54", document.pages[6].lines))
print(line_list[0].words)

This returns the three words [Operating, Segment, Information].

The cell in the Textract output references the same three words, but the cell's words or text incorrectly come back as null.

table_n = document.pages[6].tables[1]

# find the cell and output its words
for cell in table_n.table_cells:
    if cell.id == "c23b7b9e-7b90-42d4-ad94-41caa8931417":
        print(cell.words)

This returns null.
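To confirm what the raw Textract output says independently of textractor, the JSON can be inspected directly. The following is a minimal sketch, not part of the library: the helper names `load_blocks` and `child_word_texts` are hypothetical, and it assumes the file is a single Textract response with a top-level `Blocks` list in which each block carries `Id`, `BlockType`, and optional `Relationships` entries of type `CHILD`.

```python
import json


def load_blocks(path):
    """Load the Blocks list from a Textract response file.

    Assumption: the file is one response dict with a top-level "Blocks" key.
    """
    with open(path, encoding="utf-8") as f:
        response = json.load(f)
    return response["Blocks"]


def child_word_texts(blocks, block_id):
    """Return the Text of every WORD block listed as a CHILD of block_id."""
    by_id = {block["Id"]: block for block in blocks}
    texts = []
    # "Relationships" can be absent or null on blocks without children
    for relationship in by_id[block_id].get("Relationships") or []:
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = by_id.get(child_id)
            if child is not None and child["BlockType"] == "WORD":
                texts.append(child["Text"])
    return texts


# Example usage against the file from this issue (path as given above):
# blocks = load_blocks(r"c:\temp\59766-textract-table.json")
# print(child_word_texts(blocks, "c23b7b9e-7b90-42d4-ad94-41caa8931417"))
```

If this prints the three words for the cell id while `cell.words` is empty, the relationship is present in the Textract output and is being lost somewhere during parsing.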

Apr 17 '24 01:04 raidken

I am able to reproduce the issue. Could you provide the original document for that response? It would make it easier to troubleshoot.

May 06 '24 14:05 Belval
