
Handling data errors

Open kasnerz opened this issue 3 years ago • 0 comments

Due to data errors, some tables are not a perfect M×N rectangle.

We should decide whether to repair these cases with heuristics and, if not, how to handle them during export.

See, e.g., example #20 in ToTTo, where a cell containing a dash with a column span of 2 is missing from the original raw data:

Table (cf. the row "Neftekhimik Nizhnekamsk"): [screenshot: screen-2022-12-02-16-16-37]

ToTTo (excerpt from the example)

[{'column_span': 1, 'is_header': False, 'row_span': 3, 'value': 'Neftekhimik Nizhnekamsk (loan)'},
  {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '2012–13'},
  {'column_span': 1, 'is_header': False, 'row_span': 2, 'value': 'Russian FNL'},
  {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '6'},
  {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
  {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
  {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
  {'column_span': 2, 'is_header': False, 'row_span': 1, 'value': ''},
  # here another cell of column_span 2 is missing
  {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '6'},
  {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'}],

Output HTML: [screenshot: screen-2022-12-02-16-19-34]
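
One possible heuristic, as a rough sketch only: compute the effective width of each row (own cells plus cells inherited through `row_span` from earlier rows) and pad short rows with empty cells. The helpers `cell`, `row_widths` and `pad_rows` are hypothetical names, not part of tabgenie, and the toy table is made up for illustration. It assumes cells use the same keys as the ToTTo excerpt above. Note that padding at the end of a row only restores the rectangle; when the missing cell sits in the middle of the row, as in the example above, the remaining cells would still be shifted.

```python
def cell(value, column_span=1, row_span=1, is_header=False):
    """Build a cell dict with the same keys as the ToTTo excerpt (helper for the demo)."""
    return {"column_span": column_span, "is_header": is_header,
            "row_span": row_span, "value": value}


def row_widths(table):
    """Return the effective width of each row, including cells carried over by row_span."""
    widths = []
    carry = []  # (rows_left, column_span) for earlier cells that still cover upcoming rows
    for row in table:
        inherited = sum(cs for _, cs in carry)
        own = sum(c["column_span"] for c in row)
        widths.append(inherited + own)
        # Drop carried cells that end at this row, keep the rest with one row less to go.
        carry = [(left - 1, cs) for left, cs in carry if left > 1]
        # Cells starting in this row with row_span > 1 cover the following rows too.
        carry += [(c["row_span"] - 1, c["column_span"]) for c in row if c["row_span"] > 1]
    return widths


def pad_rows(table, filler=""):
    """Append empty 1x1 cells to every row that is narrower than the widest row."""
    widths = row_widths(table)
    target = max(widths, default=0)
    return [list(row) + [cell(filler) for _ in range(target - width)]
            for row, width in zip(table, widths)]


# Toy table: the second row is one column short.
toy = [
    [cell("A", row_span=2), cell("b"), cell("c", column_span=2)],
    [cell("d"), cell("e")],
]
print(row_widths(toy))            # widths before padding
print(row_widths(pad_rows(toy)))  # widths after padding
```

If no repair is wanted, `row_widths` alone could be used at export time to flag or skip tables whose rows have differing widths.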

kasnerz avatar Dec 02 '22 15:12 kasnerz