xstream icon indicating copy to clipboard operation
xstream copied to clipboard

`PrettyPrintWriter` fails to serialize characters in the Unicode Supplementary Multilingual Plane in XML 1.0 mode and XML 1.1 mode

Open basil opened this issue 2 years ago • 3 comments

PrettyPrintWriter fails to properly serialize characters in the Unicode Supplementary Multilingual Plane (SMP) in XML 1.0 mode and XML 1.1 mode (quirks mode works) with the following exception:

com.thoughtworks.xstream.io.StreamException: Invalid character 0xd83e in XML stream
        at com.thoughtworks.xstream.io.xml.PrettyPrintWriter.writeText(PrettyPrintWriter.java:250)
        at com.thoughtworks.xstream.io.xml.PrettyPrintWriter.writeText(PrettyPrintWriter.java:205)
        at com.thoughtworks.xstream.io.xml.PrettyPrintWriter.setValue(PrettyPrintWriter.java:187)
        at com.thoughtworks.xstream.io.xml.PrettyPrintWriterTest.testSupportsSupplementaryMultilingualPlaneInXml1_0Mode(PrettyPrintWriterTest.java:310)

The root cause of the problem is incorrect iteration over Unicode code points. The current implementation iterates over the UTF-16 representation of the characters rather than iterating over each code point. Characters in the Supplementary Multilingual Plane are encoded in UTF-16 as two digits. For example U+1F98A is encoded in UTF-16 as 0xD83E 0xDD8A. Java provides a dedicated API to iterate over code points, but XStream makes the erroneous assumption that a code point and a character are equivalent, likely because it was never tested outside of quirks mode with characters in the Supplementary Multilingual Plane. This PR fixes the problem by using the Java API for iterating over code points, thus removing the faulty assumption that a code point and a character are equivalent.

The new quirks mode test passes before and after the changes to PrettyPrintWriter. The new XML 1.0 mode and XML 1.1 mode tests fail before the changes to PrettyPrintWriter with the exception given above. The new XML 1.0 mode and XML 1.0 mode tests pass after the changes to PrettyPrintWriter.

Fixes #336

basil avatar May 02 '23 23:05 basil

Why is this not assigned to the 1.4 milestone? This is a critical bug fix that we want in 1.4.

basil avatar May 03 '23 20:05 basil

Because 1.5.x is dropping compatibility to Java 10 to 1.4.

joehni avatar May 13 '23 21:05 joehni

I think it would make more sense for the 1.4.x line to require Java 8 or newer or to backport this fix to the 1.4.x line with a for-loop based implementation that can run on Java 7 or earlier.

basil avatar May 15 '23 20:05 basil

Any plans to merge this PR and release version 1.5.x requiring Java 11 or newer?

basil avatar Nov 08 '24 03:11 basil

Sorry for the long delay...

joehni avatar Nov 11 '24 19:11 joehni