amazon-textract-response-parser

Issue #55: fix Table.rows_without_header function adding duplicate no…

Open • cleung11 opened this issue 3 years ago • 1 comment

https://github.com/aws-samples/amazon-textract-response-parser/issues/55

The property decides whether a row is a header row, but the `if not header:` check and its `append` sit inside the per-cell `for` loop. As a result, every non-header row is appended once per cell (a three-cell row shows up three times). The check should be moved one level out, into the per-row `for` loop: https://github.com/aws-samples/amazon-textract-response-parser/blob/master/src-python/trp/init.py#L431

Original:

    @property
    def rows_without_header(self) -> List[Row]:
        non_header_rows: List[Row] = list()
        for row in self.rows:
            header = False
            for cell in row.cells:
                for entity_type in cell.entityTypes:
                    if entity_type == ENTITY_TYPE_COLUMN_HEADER:
                        header = True
                if not header:
                    non_header_rows.append(row)
        return non_header_rows
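To make the duplication concrete, here is a minimal reproduction sketch. It does not import trp: `Cell`, `Row`, and `rows_without_header_original` are hypothetical stand-ins that keep only the attributes the property reads, and `ENTITY_TYPE_COLUMN_HEADER` is defined locally as `"COLUMN_HEADER"` on the assumption that it mirrors Textract's cell entity type. The loop body is copied from the original property.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: local stand-in for trp's constant, mirroring Textract's "COLUMN_HEADER" entity type.
ENTITY_TYPE_COLUMN_HEADER = "COLUMN_HEADER"


@dataclass
class Cell:
    # Hypothetical minimal cell: only the entityTypes attribute the property reads.
    entityTypes: List[str] = field(default_factory=list)


@dataclass
class Row:
    # Hypothetical minimal row: just its list of cells.
    cells: List[Cell]


def rows_without_header_original(rows: List[Row]) -> List[Row]:
    """Same loop structure as the original property, written as a free function."""
    non_header_rows: List[Row] = list()
    for row in rows:
        header = False
        for cell in row.cells:
            for entity_type in cell.entityTypes:
                if entity_type == ENTITY_TYPE_COLUMN_HEADER:
                    header = True
            if not header:  # still inside the cell loop, so this runs once per cell
                non_header_rows.append(row)
    return non_header_rows


header_row = Row([Cell([ENTITY_TYPE_COLUMN_HEADER]) for _ in range(3)])
body_row = Row([Cell() for _ in range(3)])

result = rows_without_header_original([header_row, body_row])
print(len(result))  # 3: body_row is appended once for each of its three cells
```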

New:

    @property
    def rows_without_header(self) -> List[Row]:
        non_header_rows: List[Row] = list()
        for row in self.rows:
            header = False
            for cell in row.cells:
                for entity_type in cell.entityTypes:
                    if entity_type == ENTITY_TYPE_COLUMN_HEADER:
                        header = True
            if not header:  # dedented one level: now runs once per row
                non_header_rows.append(row)  # dedented one level: each row appended at most once
        return non_header_rows
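The fixed loop can also be expressed more compactly with `any()`. The sketch below is not the library's code: `Cell`, `Row`, `is_header_row`, and the free function `rows_without_header` are hypothetical stand-ins, and `ENTITY_TYPE_COLUMN_HEADER` is assumed to be `"COLUMN_HEADER"`. A row is treated as a header if any of its cells carries that entity type, which is the same condition the fixed flag loop computes.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: local stand-in for trp's constant, mirroring Textract's "COLUMN_HEADER" entity type.
ENTITY_TYPE_COLUMN_HEADER = "COLUMN_HEADER"


@dataclass
class Cell:
    # Hypothetical minimal cell: only the entityTypes attribute is needed.
    entityTypes: List[str] = field(default_factory=list)


@dataclass
class Row:
    # Hypothetical minimal row: just its list of cells.
    cells: List[Cell]


def is_header_row(row: Row) -> bool:
    # Hypothetical helper: True if any cell in the row is tagged as a column header.
    return any(ENTITY_TYPE_COLUMN_HEADER in cell.entityTypes for cell in row.cells)


def rows_without_header(rows: List[Row]) -> List[Row]:
    # The header decision is made once per row, so no row can be appended twice.
    return [row for row in rows if not is_header_row(row)]


header_row = Row([Cell([ENTITY_TYPE_COLUMN_HEADER]) for _ in range(3)])
body_row = Row([Cell() for _ in range(3)])
mixed_row = Row([Cell(), Cell([ENTITY_TYPE_COLUMN_HEADER])])

result = rows_without_header([header_row, body_row, mixed_row])
print(len(result))  # 1: only body_row survives
```

The `mixed_row` case shows a second effect of moving the check out of the cell loop: with the original placement, a row whose header cell is not its first cell would still be appended for each cell processed before the header flag was set. With the check at row level, such a row is excluded entirely.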

cleung11 • Feb 25 '22 21:02

@schadem @athewsey would it be possible to merge the PR in the near future?

tb102122 • Jun 15 '22 22:06