Parsing ODF files generated by Excel

Open wetneb opened this issue 5 years ago • 7 comments

Excel seems to generate ODS files in a way that confuses this library (0.9.0-RC1), although LibreOffice imports the attached file without problems: made-with-excel.ods.zip

Opening this file with odftoolkit and calling OdfTableRow.getCellCount() on its rows appears to run in an infinite loop.

wetneb avatar Dec 13 '19 10:12 wetneb

The DOM implementation should not try to expand the XML into a table without the repeated attributes. ;-)

                <table:table-row table:number-rows-repeated="1048573" table:style-name="ro1">
                    <table:table-cell table:number-columns-repeated="16384"/>
                </table:table-row>
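
To put a number on why expansion is hopeless, here is a minimal sketch that reads the snippet above with the JDK's own DOM parser (not odftoolkit) and multiplies the repeat attributes instead of materializing cells. The class and method names are made up for illustration; only the ODF table namespace and the two attribute names come from the format itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of odftoolkit.
class RepeatedCellCount {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    // Reads a repeat attribute; a missing attribute means "1".
    static long repeat(Element e, String localName) {
        String v = e.getAttributeNS(TABLE_NS, localName);
        return v.isEmpty() ? 1L : Long.parseLong(v);
    }

    // Logical cells covered by one <table:table-row>, computed without expanding anything.
    static long logicalCells(Element row) {
        long columns = 0;
        NodeList children = row.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            boolean isCell = n instanceof Element
                    && TABLE_NS.equals(n.getNamespaceURI())
                    && ("table-cell".equals(n.getLocalName())
                        || "covered-table-cell".equals(n.getLocalName()));
            if (isCell) {
                columns += repeat((Element) n, "number-columns-repeated");
            }
        }
        // long arithmetic: the product does not fit in an int
        return Math.multiplyExact(columns, repeat(row, "number-rows-repeated"));
    }

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<table:table-row xmlns:table=\"" + TABLE_NS + "\""
                + " table:number-rows-repeated=\"1048573\">"
                + "<table:table-cell table:number-columns-repeated=\"16384\"/>"
                + "</table:table-row>";
        System.out.println(logicalCells(parse(xml))); // 17179820032
    }
}
```

That is about 17 billion logical cells described by three lines of XML, so any code path that visits them one by one will look like a hang.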

svanteschubert avatar Oct 20 '20 21:10 svanteschubert

It seems that this issue currently prevents opening any ODS file produced by LibreOffice 7.3 (stock Ubuntu 22.04). Is there a workaround? Or could somebody point to the class where we could fix it?

kcech avatar Aug 04 '23 18:08 kcech

I debugged this a bit. The time goes into generating the coverList in OdfTableRow.getCellCount(): with table:number-rows-repeated="1048543" on the row and table:number-columns-repeated="1024" on its table:table-cell, there are roughly a billion empty cells to walk through, so the call simply takes an extremely long time rather than looping forever.

Perhaps there could be a check (or a parameter option, or a new method such as getRealRealCellCount()) that discards empty rows and thus skips collecting the Cell Cover Infos for them? https://github.com/tdf/odftoolkit/blob/36ef92c29584f445a9e228b7b3cda6142a389fb1/odfdom/src/main/java/org/odftoolkit/odfdom/doc/table/OdfTableRow.java#L289-L295
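
A rough sketch of what such a check could look like, independent of the actual odftoolkit classes: RowRun below is a hypothetical stand-in for one table:table-row element (its repeat count plus a flag saying whether any cell has content), and usedRowCount is an invented name. The idea is only to find where the last row with content ends, so that trailing empty runs are never visited cell by cell.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not odftoolkit API.
class UsedRange {
    // Stand-in for one <table:table-row>: how often it repeats and whether it holds any content.
    static final class RowRun {
        final int repeated;
        final boolean hasContent;

        RowRun(int repeated, boolean hasContent) {
            this.repeated = repeated;
            this.hasContent = hasContent;
        }
    }

    // Number of logical rows up to and including the last row that has content.
    // Cost is one step per XML row element, regardless of the repeat counts.
    static long usedRowCount(List<RowRun> runs) {
        long seen = 0;
        long used = 0;
        for (RowRun run : runs) {
            seen += run.repeated;
            if (run.hasContent) {
                used = seen;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        List<RowRun> sheet = Arrays.asList(
                new RowRun(1, true),         // header row
                new RowRun(2, false),        // two blank rows inside the data
                new RowRun(1, true),         // one more data row
                new RowRun(1048543, false)); // trailing filler down to the sheet limit
        System.out.println(usedRowCount(sheet)); // 4
    }
}
```

Blank rows between data rows are kept so that row indices stay stable; only the trailing filler run is dropped.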

thadguidry avatar Feb 04 '25 08:02 thadguidry

It's also interesting that Apache Hop stops checking at the first non-empty row and checks if rows have non-empty cells, etc. https://github.com/apache/hop/blob/main/plugins/transforms/excel/src/main/java/org/apache/hop/pipeline/transforms/excelinput/ods/OdfSheet.java

thadguidry avatar Feb 04 '25 08:02 thadguidry

Any repeated number should not be iterated over but "added" to (or "embraced" by) the internal model. In the code linked below, the ColumnsRepeatedNumber is added, but the RowsRepeatedNumber is looped over. Looks like a bug to me!

https://github.com/tdf/odftoolkit/blob/36ef92c29584f445a9e228b7b3cda6142a389fb1/odfdom/src/main/java/org/odftoolkit/odfdom/doc/table/OdfTable.java#L2286-L2298
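
To illustrate what "adding" rather than looping could mean, here is a small standalone sketch. It is not the OdfTable code linked above, and RowRunIndex and its methods are invented names: each row element contributes its repeat count to a running total, and a logical row index is later mapped back to its element by binary search over those totals. It assumes every repeat count is at least 1.

```java
import java.util.Arrays;

// Hypothetical run-length index, not odftoolkit API.
class RowRunIndex {
    // endExclusive[i] is the logical row index just past the i-th row element.
    private final long[] endExclusive;

    RowRunIndex(int[] repeats) {
        endExclusive = new long[repeats.length];
        long total = 0;
        for (int i = 0; i < repeats.length; i++) {
            total += repeats[i]; // add the repeat count, never loop over it
            endExclusive[i] = total;
        }
    }

    long logicalRowCount() {
        return endExclusive.length == 0 ? 0 : endExclusive[endExclusive.length - 1];
    }

    // Index of the row element that covers the given logical row.
    int runFor(long logicalRow) {
        if (logicalRow < 0 || logicalRow >= logicalRowCount()) {
            throw new IndexOutOfBoundsException("row " + logicalRow);
        }
        int pos = Arrays.binarySearch(endExclusive, logicalRow + 1);
        return pos >= 0 ? pos : -pos - 1;
    }

    public static void main(String[] args) {
        RowRunIndex index = new RowRunIndex(new int[] {1, 2, 1048573});
        System.out.println(index.logicalRowCount()); // 1048576
        System.out.println(index.runFor(1000000));   // 2
    }
}
```

Building the index costs one step per XML row element, and a lookup is logarithmic in the number of elements, so the size of the repeat counts no longer matters.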

I would greatly appreciate it if you could help stop this, Thad! 👍

svanteschubert avatar Feb 04 '25 09:02 svanteschubert

@svanteschubert Sure, but also go ahead and label this issue as "help wanted". It attracts folks willing to dive in and give a hand.

thadguidry avatar Feb 04 '25 10:02 thadguidry

Found terrific background information in PR #285 to help tackle this issue.

thadguidry avatar Feb 04 '25 12:02 thadguidry