ezodf

ODS sheets: Wrong results with sparse tables

Open • hvbtup opened this issue 9 years ago • 4 comments

At least the following functions return wrong results when I create a sheet that contains data only in, for example, cells G1, Z1, CD1 and CE1: sheet.ncols(), sheet["CD1"].value and many more. It seems as if the table:number-columns-repeated attribute is simply ignored.
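To make the symptom concrete, here is a minimal sketch using only the standard library's xml.etree.ElementTree (not ezodf). The row is a hand-written, simplified stand-in for the reported sheet, not taken from an actual file: each empty gap is encoded as a single table:table-cell element carrying table:number-columns-repeated. Honoring the attribute puts CD1 at 0-based column 81 and gives 83 columns; counting every element as one column, which is what ignoring the attribute amounts to, shifts everything to the left. The helper name filled_positions is hypothetical.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
REPEAT = f"{{{TABLE_NS}}}number-columns-repeated"

# Hand-written, simplified row with data only in G1, Z1, CD1 and CE1.
ROW_XML = """
<table:table-row
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <table:table-cell table:number-columns-repeated="6"/>
  <table:table-cell office:value-type="string"><text:p>G1</text:p></table:table-cell>
  <table:table-cell table:number-columns-repeated="18"/>
  <table:table-cell office:value-type="string"><text:p>Z1</text:p></table:table-cell>
  <table:table-cell table:number-columns-repeated="55"/>
  <table:table-cell office:value-type="string"><text:p>CD1</text:p></table:table-cell>
  <table:table-cell office:value-type="string"><text:p>CE1</text:p></table:table-cell>
</table:table-row>
"""

def filled_positions(row, honor_repeat=True):
    """Map each cell's text to its 0-based column index; also return the column count."""
    positions = {}
    col = 0
    for cell in row.findall(f"{{{TABLE_NS}}}table-cell"):
        p = cell.find(f"{{{TEXT_NS}}}p")
        if p is not None:
            positions[p.text] = col
        # One element covers `repeat` columns; ignoring the attribute counts it as 1.
        repeat = int(cell.get(REPEAT, "1")) if honor_repeat else 1
        col += repeat
    return positions, col

row = ET.fromstring(ROW_XML.strip())
print(filled_positions(row))
# ({'G1': 6, 'Z1': 25, 'CD1': 81, 'CE1': 82}, 83)
print(filled_positions(row, honor_repeat=False))
# ({'G1': 1, 'Z1': 3, 'CD1': 5, 'CE1': 6}, 7)
```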

hvbtup, Nov 26 '15 10:11

I think the cause is the logic in tablenormalizer.py, class _ExpandAllLessMaxCount, method expand_cell. The elif condition seems wrong. On the other hand, the class name says exactly what is happening, which makes me think that using this class is what's wrong. In my case, maxcols is less than the value of the table:number-columns-repeated attribute.

I could work around this problem by calling

ezodf.conf.config.set_table_expand_strategy('all_less_maxcount', (100, 100))
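A rough way to see why this workaround helps, assuming the strategy expands every run of repeated cells whose count is below maxcount (as its name suggests): in the example row, the largest gap is the 55 empty cells between Z1 and CD1, which is below 100. The helpers col_index and empty_runs below are hypothetical and independent of ezodf.

```python
def col_index(letters: str) -> int:
    """Convert spreadsheet column letters to a 1-based index (A=1, Z=26, AA=27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

def empty_runs(columns):
    """Lengths of the empty-cell runs before and between the given filled columns."""
    runs = []
    previous = 0
    for idx in sorted(col_index(c) for c in columns):
        runs.append(idx - previous - 1)
        previous = idx
    return runs

runs = empty_runs(["G", "Z", "CD", "CE"])
print(runs)             # [6, 18, 55, 0]
print(max(runs) < 100)  # True: every gap fits under a maxcount of 100
```

A sheet with a larger gap would exceed the limit again, so raising maxcount works around the problem rather than fixing it.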

Anyway, if the elif branch in expand_cell is executed, the XML attribute is removed (so information is lost) while the cell is treated as if its repeat count were 1. This in turn causes wrong results for ncols(), column access and so on. And it is not even possible to work around this by looking at the XML node directly (as I tried), because the table:number-columns-repeated attribute has been silently removed.
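The consequence can be reproduced with a small model. This is a hypothetical sketch, not ezodf's code: a row is a list of (value, repeat) pairs, runs shorter than an illustrative maxcount of 32 are expanded, and longer runs are either collapsed to repeat 1 (modelling the lossy behaviour described above) or kept with their repeat count. Only the collapsed variant moves CD1 and shrinks the column count.

```python
from typing import List, Optional, Tuple

# Hypothetical row model: (value, repeat) pairs, where repeat mirrors
# table:number-columns-repeated (1 when the attribute is absent).
Row = List[Tuple[Optional[str], int]]

ROW: Row = [(None, 6), ("G1", 1), (None, 18), ("Z1", 1),
            (None, 55), ("CD1", 1), ("CE1", 1)]

def expand(row: Row, maxcount: int, keep_large_repeats: bool) -> Row:
    """Expand every run shorter than maxcount into single cells.

    Longer runs are either kept as one entry with their repeat count, or
    collapsed to repeat 1, which models the lossy behaviour in this issue.
    """
    result: Row = []
    for value, repeat in row:
        if repeat < maxcount:
            result.extend([(value, 1)] * repeat)
        elif keep_large_repeats:
            result.append((value, repeat))
        else:
            result.append((value, 1))  # repeat count silently lost
    return result

def ncols(row: Row) -> int:
    """Number of columns the row covers, honoring repeat counts."""
    return sum(repeat for _, repeat in row)

def value_at(row: Row, index: int) -> Optional[str]:
    """Value at a 0-based column index, honoring repeat counts."""
    start = 0
    for value, repeat in row:
        if start <= index < start + repeat:
            return value
        start += repeat
    raise IndexError(index)

lossy = expand(ROW, maxcount=32, keep_large_repeats=False)
kept = expand(ROW, maxcount=32, keep_large_repeats=True)
print(ncols(lossy), value_at(lossy, 27))  # 29 CD1  (CD1 now appears in column AB)
print(ncols(kept), value_at(kept, 81))    # 83 CD1
```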

I propose either raising an exception in this case or marking the sheet and the row as corrupted (and raising an exception when cells are accessed by position). In either case, the XML attribute should not be removed, so that developers can examine it themselves.
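Both options of this proposal can be sketched as follows. Everything here is hypothetical (CorruptedRowError, NormalizedRow, the strict flag and the (value, repeat) row model are illustrations, not ezodf APIs): strict mode raises while normalizing, and the default mode keeps the original repeat count, flags the row and refuses positional access.

```python
from typing import List, Optional, Tuple

# (value, repeat), where repeat mirrors table:number-columns-repeated
Cell = Tuple[Optional[str], int]

class CorruptedRowError(Exception):
    """Hypothetical error type; ezodf does not define this."""

class NormalizedRow:
    """Hypothetical row wrapper sketching the proposal; not part of ezodf."""

    def __init__(self, cells: List[Cell], maxcount: int, strict: bool = False):
        self.cells: List[Cell] = []
        self.corrupted = False
        for value, repeat in cells:
            if repeat < maxcount:
                self.cells.extend([(value, 1)] * repeat)
            elif strict:
                # Option 1: refuse to load the row at all.
                raise CorruptedRowError(f"repeat count {repeat} >= maxcount {maxcount}")
            else:
                # Option 2: keep the original repeat count and flag the row.
                self.cells.append((value, repeat))
                self.corrupted = True

    def __getitem__(self, index: int) -> Optional[str]:
        if self.corrupted:
            raise CorruptedRowError("row has unexpanded repeated cells; positional access is unsafe")
        return self.cells[index][0]

ROW: List[Cell] = [(None, 6), ("G1", 1), (None, 18), ("Z1", 1),
                   (None, 55), ("CD1", 1), ("CE1", 1)]

flagged = NormalizedRow(ROW, maxcount=32)
print(flagged.corrupted, flagged.cells[-3])  # True (None, 55): repeat count still available
print(NormalizedRow(ROW, maxcount=100)[81])  # CD1
```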

hvbtup, Nov 26 '15 12:11

@T0ha, are you planning to do anything about this? I just ran into this bug, and it is especially bad because it silently returns completely wrong data.

I almost failed to notice it.

pyexcel-ods and odfpy at least crashed visibly; silently returning bad data is much worse.

matkoniecz, Feb 09 '16 21:02

Not sure, but maybe this is related: https://github.com/frictionlessdata/tabulator-py/pull/114/files#r85104563

I got a lot of extra data filled with None values.

sirex, Oct 26 '16 11:10

Sorry, the issue I posted a few hours ago is not related. It turned out that cells with formatting applied are considered non-empty, even if they contain no data. When I created a completely new document and pasted in only the part containing data, the issue went away.
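If recreating the document is not an option, one possible approach is to skip trailing cells that carry only a style. This is a heuristic sketch with xml.etree.ElementTree, assuming a cell counts as empty when it has neither an office:value-type attribute nor a text:p child; the style name ce1 is made up and the helpers has_content and used_columns are hypothetical. It returns the last column that actually holds content alongside the total column span.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# One real value followed by three styled but empty cells (style name is made up).
ROW_XML = f"""
<table:table-row xmlns:table="{TABLE_NS}" xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <table:table-cell office:value-type="float" office:value="1"><text:p>1</text:p></table:table-cell>
  <table:table-cell table:style-name="ce1" table:number-columns-repeated="3"/>
</table:table-row>
"""

def has_content(cell):
    """True if the cell carries a value type or a paragraph, regardless of its style."""
    return (cell.get(f"{{{OFFICE_NS}}}value-type") is not None
            or cell.find(f"{{{TEXT_NS}}}p") is not None)

def used_columns(row):
    """Return (last column with content, total columns), honoring repeat counts."""
    col = 0
    last_used = 0
    for cell in row.findall(f"{{{TABLE_NS}}}table-cell"):
        col += int(cell.get(f"{{{TABLE_NS}}}number-columns-repeated", "1"))
        if has_content(cell):
            last_used = col
    return last_used, col

print(used_columns(ET.fromstring(ROW_XML.strip())))  # (1, 4)
```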

sirex, Oct 26 '16 16:10