amazon-textract-response-parser
Issue #55: fix Table.rows_without_header function adding duplicate no…
https://github.com/aws-samples/amazon-textract-response-parser/issues/55
The function checks whether a row is a header, but the check and the append sit inside the loop over cells, so a non-header row is appended once per cell instead of once per row. The check should be moved one level out, into the loop over rows: https://github.com/aws-samples/amazon-textract-response-parser/blob/master/src-python/trp/__init__.py#L431
Original:
@property
def rows_without_header(self) -> List[Row]:
    non_header_rows: List[Row] = list()
    for row in self.rows:
        header = False
        for cell in row.cells:
            for entity_type in cell.entityTypes:
                if entity_type == ENTITY_TYPE_COLUMN_HEADER:
                    header = True
            if not header:
                non_header_rows.append(row)
    return non_header_rows
New:
@property
def rows_without_header(self) -> List[Row]:
    non_header_rows: List[Row] = list()
    for row in self.rows:
        header = False
        for cell in row.cells:
            for entity_type in cell.entityTypes:
                if entity_type == ENTITY_TYPE_COLUMN_HEADER:
                    header = True
        if not header:  # moved this left one tab
            non_header_rows.append(row)  # moved this left one tab
    return non_header_rows
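To make the difference concrete, here is a minimal standalone sketch that reproduces both versions outside the library. The `Cell` and `Row` dataclasses are hypothetical stand-ins for the real trp classes, the constant's string value is assumed, and the two functions are free-function copies of the property bodies above.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed value; the real constant is defined in the trp package.
ENTITY_TYPE_COLUMN_HEADER = "COLUMN_HEADER"


@dataclass
class Cell:
    """Hypothetical stand-in for trp's Cell, carrying only entityTypes."""
    entityTypes: List[str] = field(default_factory=list)


@dataclass
class Row:
    """Hypothetical stand-in for trp's Row, carrying only cells."""
    cells: List[Cell] = field(default_factory=list)


def rows_without_header_buggy(rows: List[Row]) -> List[Row]:
    non_header_rows: List[Row] = []
    for row in rows:
        header = False
        for cell in row.cells:
            for entity_type in cell.entityTypes:
                if entity_type == ENTITY_TYPE_COLUMN_HEADER:
                    header = True
            # Runs once per cell, so a data row is appended once per cell.
            if not header:
                non_header_rows.append(row)
    return non_header_rows


def rows_without_header_fixed(rows: List[Row]) -> List[Row]:
    non_header_rows: List[Row] = []
    for row in rows:
        header = False
        for cell in row.cells:
            for entity_type in cell.entityTypes:
                if entity_type == ENTITY_TYPE_COLUMN_HEADER:
                    header = True
        # Runs once per row, after all of its cells have been inspected.
        if not header:
            non_header_rows.append(row)
    return non_header_rows


# One header row and one data row, three cells each.
header_row = Row([Cell([ENTITY_TYPE_COLUMN_HEADER]) for _ in range(3)])
data_row = Row([Cell() for _ in range(3)])
rows = [header_row, data_row]

buggy = rows_without_header_buggy(rows)
fixed = rows_without_header_fixed(rows)
print(len(buggy))  # 3: the data row appears once per cell
print(len(fixed))  # 1: the data row appears once
```

With a three-cell data row, the original version returns that row three times, while the corrected version returns it once.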
@schadem @athewsey would it be possible to merge the PR in the near future?