amazon-textract-textractor
Table cell incorrectly does not pick up the cell text/words; Page -> Line picks up the words as in the Textract output
59766-textract-table.json

In the Textract output file, cell ID 3f98227c-2981-4cd5-b23c-bee82e96bb54 references three words, but the code below returns null for the words in that cell.
    from textractor.entities.document import Document

    # Raw string so that "\t" in the Windows path is not read as a tab
    document = Document.open(r"c:\temp\59766-textract-table.json")

    # Query for the line ID that references the same three words
    line_list = list(filter(lambda line: line.id == "3f98227c-2981-4cd5-b23c-bee82e96bb54", document.pages[6].lines))
    print(line_list[0].words)
This returns the three words [Operating, Segment, Information].

The cell in the Textract output references the same three words, but the cell's words or text incorrectly come back as null.
    table_n = document.pages[6].tables[1]

    # Find the cell and output its words
    for cell in table_n.table_cells:
        if cell.id == "c23b7b9e-7b90-42d4-ad94-41caa8931417":
            print(cell.words)
    # Returns null
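To confirm what the raw Textract output says independently of textractor, something like the following could be used. This is only a minimal sketch: it assumes the standard Textract response layout (a top-level `Blocks` list whose entries carry `Id`, `BlockType`, `Text`, and `Relationships` of type `CHILD`). The helper name `child_word_texts` and the IDs in the sample are hypothetical and are not taken from the attached file.

```python
import json


def child_word_texts(blocks, block_id):
    """Return the Text of every WORD block listed as a CHILD of block_id."""
    by_id = {block["Id"]: block for block in blocks}
    parent = by_id[block_id]
    texts = []
    # "Relationships" may be missing or null on blocks with no children
    for relationship in parent.get("Relationships") or []:
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = by_id.get(child_id)
            if child is not None and child["BlockType"] == "WORD":
                texts.append(child["Text"])
    return texts


# Hand-made sample shaped like a Textract response (hypothetical IDs)
SAMPLE_JSON = """
{"Blocks": [
  {"Id": "cell-1", "BlockType": "CELL",
   "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2", "w3"]}]},
  {"Id": "w1", "BlockType": "WORD", "Text": "Operating"},
  {"Id": "w2", "BlockType": "WORD", "Text": "Segment"},
  {"Id": "w3", "BlockType": "WORD", "Text": "Information"}
]}
"""

sample = json.loads(SAMPLE_JSON)
print(child_word_texts(sample["Blocks"], "cell-1"))
```

Against the real file, the same helper could be called with the blocks loaded via `json.load` and the cell ID from the report, assuming the file holds a single response with a top-level `Blocks` key. If the raw JSON lists the words under the cell but `cell.words` is still empty, that points at the parsing step rather than at the Textract output.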
I am able to reproduce the issue. Could you provide the original document for that response? It would make it easier to troubleshoot.