
LibreOffice fails to load document after saving with odftoolkit due to invalid UTF-16 surrogate entities

Open FlorianBruckner opened this issue 3 years ago • 6 comments

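Xalan contains a nasty bug that produces incorrect XML character references in the output, leading to a corrupt document: a character outside the Basic Multilingual Plane is written as two references, one per UTF-16 surrogate, instead of a single reference to its code point. E.g. this input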

<text:span text:style-name="T19">𝜈</text:span>

is changed to this when the document is saved with odftoolkit:

<text:span text:style-name="T19">&#55349;&#57096;</text:span>
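
The two references above are the UTF-16 high and low surrogates of one character, and XML does not allow character references to surrogate code points, which is why the saved file is rejected. A minimal Java sketch of the decoding, using only `java.lang.Character`; the class and method names are illustrative and are not part of odftoolkit or Xalan:

```java
class SurrogatePairDemo {
    // Combine a UTF-16 high/low surrogate pair into the code point it encodes.
    static int combine(int high, int low) {
        char hi = (char) high;
        char lo = (char) low;
        if (!Character.isHighSurrogate(hi) || !Character.isLowSurrogate(lo)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return Character.toCodePoint(hi, lo);
    }

    // Format a code point as a single decimal numeric character reference.
    static String toCharRef(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        // The two values written by the buggy serializer in the example above.
        int codePoint = combine(55349, 57096);
        System.out.println("U+" + Integer.toHexString(codePoint).toUpperCase()); // U+1D708
        System.out.println(toCharRef(codePoint)); // &#120584;
    }
}
```

So a well-formed serialization of the span content would be either the literal character or the single reference `&#120584;`.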

More information about the root cause can be found here: https://issues.apache.org/jira/browse/XALANJ-2419

As it seems unlikely that there will ever be a new Xalan release that includes a fix for this, one option (which is what I am doing now) is to replace the Xalan serializer dependency with a known-good version, e.g.

        <dependency>
            <groupId>org.docx4j.org.apache</groupId>
            <artifactId>xalan-serializer</artifactId>
            <version>11.0.0</version>
        </dependency>

I cannot vouch for the integrity of this package, but I have verified that it fixes the invalid encoding.
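
For anyone who wants to repeat that kind of check, here is a rough sketch of one way to do it: scan the saved XML text (e.g. the `content.xml` extracted from the document) for numeric character references that fall in the surrogate range. This is an assumption about how one might verify the fix, not the procedure used above, and `SurrogateRefScanner` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurrogateRefScanner {
    // Matches decimal (&#55349;) and hexadecimal (&#xD835;) character references.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    // Return every character reference whose value lies in U+D800..U+DFFF.
    static List<String> findSurrogateRefs(String xml) {
        List<String> hits = new ArrayList<>();
        Matcher m = CHAR_REF.matcher(xml);
        while (m.find()) {
            long value;
            try {
                value = m.group(1) != null
                        ? Long.parseLong(m.group(1), 16)
                        : Long.parseLong(m.group(2));
            } catch (NumberFormatException e) {
                continue; // absurdly long number: not a surrogate, skip it
            }
            if (value >= 0xD800 && value <= 0xDFFF) {
                hits.add(m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String broken = "<text:span text:style-name=\"T19\">&#55349;&#57096;</text:span>";
        String fixed = "<text:span text:style-name=\"T19\">&#120584;</text:span>";
        System.out.println(findSurrogateRefs(broken)); // [&#55349;, &#57096;]
        System.out.println(findSurrogateRefs(fixed));  // []
    }
}
```

An empty result does not prove the document is valid in every respect; it only shows that this particular symptom is absent.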

FlorianBruckner, Nov 22 '21 16:11