tabgenie
Handling data errors
Due to data errors, some tables are not a perfect MxN rectangle.
We should decide whether these cases should be fixed with some heuristics and, if not, how to handle them during export.
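One possible heuristic, as a rough sketch only: lay the cells out on a grid according to their spans, then pad rows that come out shorter than the widest row with empty cells. The sketch assumes a table is a list of rows, each row a list of cell dicts with `value`, `column_span` and `row_span` keys (as in ToTTo). The names `expand_to_grid` and `pad_rows` are hypothetical and not part of tabgenie.

```python
def expand_to_grid(rows):
    """Lay span-based cells out on a grid and return a list of rows of values.

    Positions covered by a spanning cell repeat that cell's value.
    Positions that no cell covers are left as None.
    """
    grid = {}  # (row index, column index) -> value
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            # Skip positions already taken by a cell spanning down from above.
            while (r, c) in grid:
                c += 1
            col_span = cell.get("column_span", 1)
            # Clip row spans so they never reach past the last row.
            row_span = min(cell.get("row_span", 1), len(rows) - r)
            for dr in range(row_span):
                for dc in range(col_span):
                    grid[(r + dr, c + dc)] = cell["value"]
            c += col_span

    out = []
    for r in range(len(rows)):
        cols = [cc for (rr, cc) in grid if rr == r]
        width = max(cols, default=-1) + 1
        out.append([grid.get((r, cc)) for cc in range(width)])
    return out


def pad_rows(grid, fill=""):
    """Make the grid rectangular by filling holes and padding short rows."""
    width = max((len(row) for row in grid), default=0)
    return [
        [fill if v is None else v for v in row] + [fill] * (width - len(row))
        for row in grid
    ]


# Toy table (not from ToTTo): the second row is one cell short.
rows = [
    [{"value": "A", "column_span": 1, "row_span": 2},
     {"value": "B", "column_span": 2, "row_span": 1}],
    [{"value": "C", "column_span": 1, "row_span": 1}],
]
print(pad_rows(expand_to_grid(rows)))
```

Padding on the right keeps the export rectangular, but it does not recover where the missing cell actually belonged, so cells after the gap may end up under the wrong column.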
See e.g. example #20 in ToTTo, where a cell containing a dash with a column span of 2 is missing from the original raw data:
table (cf. the row "Neftekhimik Nizhnekamsk")

ToTTo (excerpt from the example)
[{'column_span': 1, 'is_header': False, 'row_span': 3, 'value': 'Neftekhimik Nizhnekamsk (loan)'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '2012–13'},
{'column_span': 1, 'is_header': False, 'row_span': 2, 'value': 'Russian FNL'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '6'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
{'column_span': 2, 'is_header': False, 'row_span': 1, 'value': ''},
# here another cell of column_span 2 is missing
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '6'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'}],
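As a quick check of the excerpt above, summing `column_span` over the listed cells gives the number of columns the row covers. This is a minimal sketch: `row_width` is a hypothetical helper, and it assumes that no cell from an earlier row spans down into this one.

```python
# The cells of the excerpt, exactly as listed (the missing cell is not added).
row = [
    {'column_span': 1, 'is_header': False, 'row_span': 3, 'value': 'Neftekhimik Nizhnekamsk (loan)'},
    {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '2012–13'},
    {'column_span': 1, 'is_header': False, 'row_span': 2, 'value': 'Russian FNL'},
    {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '6'},
    {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
    {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
    {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
    {'column_span': 2, 'is_header': False, 'row_span': 1, 'value': ''},
    {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '6'},
    {'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '0'},
]


def row_width(cells):
    """Number of columns covered by the cells that start in this row."""
    return sum(cell['column_span'] for cell in cells)


print(row_width(row))  # 11
```

The listed cells cover 11 columns. With the missing cell of `column_span` 2 the row would cover 13, so comparing this sum against the width of the other rows is one way to flag such rows before export.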
output HTML
