odftoolkit
Parsing ODF files generated by Excel
Excel seems to generate ODS files in a way that confuses this library (0.9.0-RC1), although LibreOffice imports the same file just fine. Attached: made-with-excel.ods.zip
Opening this file with odftoolkit and calling OdfTableRow.getCellCount() on its rows appears to run in an infinite loop.
The DOM implementation should not try to expand the XML into a table without the repeated attributes, i.e. with every repeated row and cell written out individually. ;-)
<table:table-row table:number-rows-repeated="1048573" table:style-name="ro1">
<table:table-cell table:number-columns-repeated="16384"/>
</table:table-row>
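To put a number on why expanding this fragment is hopeless, here is a quick back-of-the-envelope check in Java. The class and method names are made up for illustration and are not part of odftoolkit; only the two attribute values come from the XML above.

```java
// Illustrative arithmetic only; RepeatedCellMath and impliedCells are hypothetical names.
final class RepeatedCellMath {

    /** Number of cells one repeated row block stands for if it were fully expanded. */
    static long impliedCells(long rowsRepeated, long columnsRepeated) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(rowsRepeated, columnsRepeated);
    }

    public static void main(String[] args) {
        // Values taken from table:number-rows-repeated and table:number-columns-repeated above
        long cells = impliedCells(1_048_573L, 16_384L);
        System.out.println(cells);                      // 17179820032
        System.out.println(cells > Integer.MAX_VALUE);  // true
    }
}
```

So this single row element stands for roughly 17 billion cells, more than an `int` counter can even hold, which would explain why any per-cell loop looks like it never finishes.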
It seems that this issue currently prevents opening any ODS file produced by LibreOffice 7.3 (stock Ubuntu 22.04). Is there any workaround? Or could somebody point to the class where we could fix it?
I debugged this a bit. The slow part is generating the coverList in OdfTableRow.getCellCount(): when a row has table:number-rows-repeated="1048543" and its table:table-cell has table:number-columns-repeated="1024", that adds up to about a billion empty cells, so it just takes a really, really long time rather than looping forever.
Perhaps there could be a check (or a parameter option, or a new method getRealRealCellCount()) that discards empty rows and thus skips collecting the billions of cell cover infos for them?
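As a rough sketch of that idea, the snippet below counts the logical cells of a row by summing the repeat attributes, and flags rows whose cells carry no content, without ever expanding anything. It deliberately uses only the JDK's own DOM parser rather than odftoolkit classes; `RepeatAwareCount`, `parseRow`, `cellCount` and `isEmpty` are hypothetical names, and "empty" is simplified to "no cell has child nodes", which is an assumption, not how odftoolkit defines it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not odftoolkit API: works on plain namespace-aware DOM elements.
final class RepeatAwareCount {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /** Parses a single XML fragment and returns its root element. */
    static Element parseRow(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Reads a table:number-*-repeated attribute; a missing attribute means 1. */
    static long repeat(Element e, String localName) {
        String v = e.getAttributeNS(TABLE_NS, localName);
        return v.isEmpty() ? 1L : Long.parseLong(v);
    }

    static boolean isCell(Node n) {
        return n instanceof Element && TABLE_NS.equals(n.getNamespaceURI())
                && ("table-cell".equals(n.getLocalName())
                    || "covered-table-cell".equals(n.getLocalName()));
    }

    /** Logical cell count: adds the column repeat of each cell instead of looping over it. */
    static long cellCount(Element row) {
        long total = 0;
        for (Node n = row.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isCell(n)) total += repeat((Element) n, "number-columns-repeated");
        }
        return total;
    }

    /** Simplified emptiness test (assumption): no cell in the row has any child node. */
    static boolean isEmpty(Element row) {
        for (Node n = row.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isCell(n) && n.hasChildNodes()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Element row = parseRow("<table:table-row xmlns:table=\"" + TABLE_NS + "\""
                + " table:number-rows-repeated=\"1048573\">"
                + "<table:table-cell table:number-columns-repeated=\"16384\"/>"
                + "</table:table-row>");
        System.out.println(cellCount(row)); // 16384
        System.out.println(isEmpty(row));   // true
    }
}
```

The work here is proportional to the number of cell elements actually present in the XML (one, in the Excel example), not to the number of cells they stand for.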
https://github.com/tdf/odftoolkit/blob/36ef92c29584f445a9e228b7b3cda6142a389fb1/odfdom/src/main/java/org/odftoolkit/odfdom/doc/table/OdfTableRow.java#L289-L295
It's also interesting that Apache Hop stops checking at the first non-empty row and checks if rows have non-empty cells, etc. https://github.com/apache/hop/blob/main/plugins/transforms/excel/src/main/java/org/apache/hop/pipeline/transforms/excelinput/ods/OdfSheet.java
A repeat count should never be iterated over; it should be "added" to (or "embraced" by) the internal model as a number. In the following code, the columns-repeated number is added, but the rows-repeated number is looped over. Looks like a bug to me!
https://github.com/tdf/odftoolkit/blob/36ef92c29584f445a9e228b7b3cda6142a389fb1/odfdom/src/main/java/org/odftoolkit/odfdom/doc/table/OdfTable.java#L2286-L2298
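To illustrate what "adding instead of looping" could look like, here is a minimal run-length sketch. It is not the odftoolkit model and not a proposed patch; `RowRuns`, `Run`, `append` and `labelAt` are hypothetical names, and a plain string label stands in for whatever a real row would carry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical run-length row model: one entry per XML row element, however often it repeats.
final class RowRuns {

    /** One run: a row description plus the number of times it repeats. */
    static final class Run {
        final String label;
        final long repeated;

        Run(String label, long repeated) {
            this.label = label;
            this.repeated = repeated;
        }
    }

    private final List<Run> runs = new ArrayList<>();
    private long rowCount = 0;

    /** Adds a run in O(1); the repeat count is added to the total, never iterated. */
    void append(String label, long repeated) {
        if (repeated < 1) throw new IllegalArgumentException("repeated must be >= 1");
        runs.add(new Run(label, repeated));
        rowCount = Math.addExact(rowCount, repeated);
    }

    long rowCount() { return rowCount; }

    int runCount() { return runs.size(); }

    /** Finds the run covering a logical row index by walking runs, not rows. */
    String labelAt(long rowIndex) {
        if (rowIndex < 0 || rowIndex >= rowCount) {
            throw new IndexOutOfBoundsException("row " + rowIndex);
        }
        long start = 0;
        for (Run r : runs) {
            if (rowIndex < start + r.repeated) return r.label;
            start += r.repeated;
        }
        throw new AssertionError("unreachable: index was range-checked");
    }

    public static void main(String[] args) {
        RowRuns rows = new RowRuns();
        rows.append("data", 3);          // three ordinary rows
        rows.append("ro1", 1_048_573L);  // the repeated filler row from the Excel file
        System.out.println(rows.rowCount());          // 1048576
        System.out.println(rows.runCount());          // 2
        System.out.println(rows.labelAt(1_048_575L)); // ro1
    }
}
```

With this shape, the cost of loading the sheet depends on how many row elements the file contains, so a million-fold repeat costs the same as a single row.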
I would greatly appreciate it if you could help stop this, Thad! 👍
@svanteschubert Sure, but also go ahead and label this issue as "help wanted". It attracts folks willing to dive in and give a hand.
Found terrific background information in PR #285 to help tackle this issue.