LO fails to load document after saving with odftoolkit due to invalid UTF-16 entities
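The Xalan serializer has a nasty bug: a character outside the Basic Multilingual Plane is written as two numeric character references, one per UTF-16 surrogate, instead of one reference for the whole code point. Surrogate values are not legal XML characters, so the saved document is corrupt and LibreOffice refuses to load it. For example, this input: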
<text:span text:style-name="T19">𝜈</text:span>
is changed to this when the document is saved with odftoolkit:
<text:span text:style-name="T19">&#55349;&#57096;</text:span>
More information about the root cause can be found here: https://issues.apache.org/jira/browse/XALANJ-2419
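To make the failure mode concrete, here is a minimal sketch in plain Java (no Xalan involved) that contrasts escaping per code point with escaping per UTF-16 code unit. The class and method names are made up for illustration and are not part of Xalan or odftoolkit; the second method only mimics the kind of output described in XALANJ-2419.

```java
public class SurrogateEntities {

    // Correct escaping: one numeric character reference per Unicode code point.
    static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    // Broken escaping: one reference per UTF-16 code unit, so a character
    // outside the BMP becomes two surrogate references, which XML forbids.
    static String perCodeUnit(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append("&#").append((int) c).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D708 MATHEMATICAL ITALIC SMALL NU, the character from the example above
        String nu = new String(Character.toChars(0x1D708));
        System.out.println(perCodePoint(nu)); // &#120584;
        System.out.println(perCodeUnit(nu));  // &#55349;&#57096;
    }
}
```

An XML parser accepts the first form and rejects the second, because 55349 (0xD835) and 57096 (0xDF08) fall in the surrogate range 0xD800 to 0xDFFF.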
A new Xalan release with a fix for this seems unlikely, so one option (and the one I am using now) is to replace the Xalan serializer dependency with a known-good version, e.g.:
<dependency>
  <groupId>org.docx4j.org.apache</groupId>
  <artifactId>xalan-serializer</artifactId>
  <version>11.0.0</version>
</dependency>
I cannot vouch for the integrity of this package, but I have verified that it fixes the invalid encoding.
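For anyone who wants to repeat that check, the following is a rough sketch of one way to do it, not the exact procedure I used. It runs an identity transform through whatever `TransformerFactory` is found on the classpath and scans the result for numeric character references in the surrogate range. The class name `SerializerCheck` and its helper are hypothetical, and it assumes the serializer under test is the one JAXP actually picks up; odftoolkit's own save path may be wired differently.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SerializerCheck {

    // Matches decimal (&#55349;) and hexadecimal (&#xD835;) character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(x[0-9A-Fa-f]{1,6}|[0-9]{1,8});");

    // Returns true if any numeric reference points into the surrogate range.
    static boolean hasSurrogateReference(String xml) {
        Matcher m = NUMERIC_REF.matcher(xml);
        while (m.find()) {
            String body = m.group(1);
            int value = body.startsWith("x")
                    ? Integer.parseInt(body.substring(1), 16)
                    : Integer.parseInt(body);
            if (value >= 0xD800 && value <= 0xDFFF) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Input containing U+1D708, written as a Java surrogate pair escape.
        String input = "<span>\uD835\uDF08</span>";

        // Identity transform with whichever implementation JAXP resolves.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer identity = factory.newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(input)),
                           new StreamResult(out));

        System.out.println("Factory: " + factory.getClass().getName());
        System.out.println("Output:  " + out);
        System.out.println("Surrogate references found: "
                + hasSurrogateReference(out.toString()));
    }
}
```

Running it once with the original serializer on the classpath and once with the replacement should show whether the swap changes the output. The printed factory class name helps confirm which implementation was exercised.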