plugin-xml
Whitespace formatting isn't valid or idempotent with `ignore` sensitivity
In XML, only `\t`, `\n`, `\r`, and the space character (U+0020) are considered whitespace and are affected by the `xml:space` attribute. However, when formatting an XML document with the `xmlWhitespaceSensitivity` option set to `ignore`, @prettier/plugin-xml uses `String.prototype.trim()` to remove whitespace. `trim()` also strips every other character that JavaScript treats as whitespace, so text that should be preserved is removed.
https://github.com/prettier/plugin-xml/blob/68b3430186d6b9bfda86f683b97694492825bb3d/src/printer.js#L281-L288
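To illustrate the difference, here is a minimal sketch of a trim that only removes the four XML whitespace characters. `trimXmlWhitespace` is a hypothetical helper written for this example, not a function from @prettier/plugin-xml, and it is not meant as the exact fix for the linked code.

```javascript
// XML 1.0 whitespace: space, tab, line feed, carriage return.
const XML_LEADING_WS = /^[ \t\n\r]+/;
const XML_TRAILING_WS = /[ \t\n\r]+$/;

// Hypothetical helper (not part of the plugin): trims XML whitespace only.
function trimXmlWhitespace(text) {
  return text.replace(XML_LEADING_WS, "").replace(XML_TRAILING_WS, "");
}

// "foo" followed by 4 U+00A0 characters, surrounded by XML whitespace.
const sample = "\n  foo\u00A0\u00A0\u00A0\u00A0 \t";

console.log(sample.trim().length);             // 3: the no-break spaces are gone
console.log(trimXmlWhitespace(sample).length); // 7: the no-break spaces are kept
```

The regular expressions spell out the character class explicitly instead of using `\s`, because `\s` matches the same extended set of characters that `trim()` removes.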
For example, this document has a <text> element with 4 trailing U+00A0 No-Break Space characters:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text>foo </text>
<text>bar</text>
</paragraph>
Formatting it removes these 4 trailing characters:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text>foo</text>
<text>bar</text>
</paragraph>
Because of this behavior, formatting a document with elements that contain only non-breaking spaces is not idempotent: the output depends on how many times the formatter is run. Given this input:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text> </text>
</paragraph>
This is the output after formatting the input once:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text></text>
</paragraph>
And this is the output after formatting it twice:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text />
</paragraph>
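A generic way to detect this kind of instability is to keep reformatting until the text stops changing and count the passes. The sketch below assumes a synchronous `format` function passed in by the caller; `passesUntilStable` and the toy formatter are hypothetical names for illustration. Recent Prettier versions expose an asynchronous `format`, so a real check against the plugin would need to await each pass.

```javascript
// Counts how many formatting passes change the text before it stabilizes.
// `format` is an assumed synchronous string -> string formatter.
function passesUntilStable(format, input, maxPasses = 5) {
  let current = input;
  for (let pass = 1; pass <= maxPasses; pass++) {
    const next = format(current);
    if (next === current) {
      return pass - 1; // passes that actually modified the text
    }
    current = next;
  }
  return Infinity; // never stabilized within maxPasses
}

// Toy stand-in formatter: collapses one "aa" pair per pass, so it is not idempotent.
const collapseOnce = (s) => s.replace("aa", "a");

console.log(passesUntilStable(collapseOnce, "aaa"));       // 2
console.log(passesUntilStable((s) => s.trim(), "  foo ")); // 1
```

An idempotent formatter yields at most 1 for any input. The example in this report would yield 2, because the second pass still changes the document.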
Here's a list of affected characters:
- U+00A0 No-Break Space
- U+1680 Ogham Space Mark
- U+2000 En Quad
- U+2001 Em Quad
- U+2002 En Space
- U+2003 Em Space
- U+2004 Three-Per-Em Space
- U+2005 Four-Per-Em Space
- U+2006 Six-Per-Em Space
- U+2007 Figure Space
- U+2008 Punctuation Space
- U+2009 Thin Space
- U+200A Hair Space
- U+2028 Line Separator
- U+2029 Paragraph Separator
- U+202F Narrow No-Break Space
- U+205F Medium Mathematical Space
- U+3000 Ideographic Space
- U+FEFF Zero Width No-Break Space
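The list above can be checked with a short script. This is a sketch that assumes a modern JavaScript engine; `AFFECTED` and `strippedByTrim` are names made up for this example.

```javascript
// Code points from the list above.
const AFFECTED = [
  0x00a0, 0x1680, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
  0x2007, 0x2008, 0x2009, 0x200a, 0x2028, 0x2029, 0x202f, 0x205f, 0x3000,
  0xfeff,
];

// True when a string made only of this character is emptied by trim().
function strippedByTrim(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  return ch.repeat(4).trim() === "";
}

const allStripped = AFFECTED.every(strippedByTrim);
console.log(allStripped); // true

// For contrast, U+200B Zero Width Space is not whitespace to trim().
console.log(strippedByTrim(0x200b)); // false
```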
And here is an XML document that has each of these characters repeated 4 times in separate <text> elements:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text> </text>
<text>
</text>
<text>
</text>
<text> </text>
<text> </text>
<text> </text>
<text></text>
</paragraph>