Whitespace formatting isn't valid and idempotent with `ignore` sensitivity

Open gebsh opened this issue 1 year ago • 0 comments

In XML, only \t, \n, \r, and the space character (U+0020) are considered whitespace and are affected by the xml:space attribute. However, when formatting an XML document with the xmlWhitespaceSensitivity option set to ignore, @prettier/plugin-xml uses String.prototype.trim() to remove whitespace characters. Because trim() also strips characters that XML does not treat as whitespace, such as U+00A0, text that should be preserved is removed.

https://github.com/prettier/plugin-xml/blob/68b3430186d6b9bfda86f683b97694492825bb3d/src/printer.js#L281-L288
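
To illustrate the difference, here is a minimal sketch. It is not the plugin's actual code: `trimXmlWhitespace` is a hypothetical helper that strips only the four characters XML defines as whitespace, compared against `String.prototype.trim()`.

```javascript
// Hypothetical helper (not part of @prettier/plugin-xml): trims only the
// characters that XML treats as whitespace: space, tab, LF, and CR.
function trimXmlWhitespace(text) {
  return text.replace(/^[ \t\n\r]+|[ \t\n\r]+$/g, "");
}

// Text content with XML whitespace around it and 4 trailing no-break spaces.
const content = "\n  foo\u00A0\u00A0\u00A0\u00A0 \n";

// String.prototype.trim() strips the no-break spaces along with the rest.
console.log(JSON.stringify(content.trim()));

// The XML-aware helper removes only the surrounding XML whitespace and
// keeps the no-break spaces.
console.log(JSON.stringify(trimXmlWhitespace(content)));
```

A replacement along these lines would presumably preserve the trailing U+00A0 characters, though the actual fix in the printer may need to look different.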

For example, this document has a <text> element with 4 trailing U+00A0 No-Break Space characters:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>foo    </text>
  <text>bar</text>
</paragraph>

Formatting it removes these 4 trailing characters:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>foo</text>
  <text>bar</text>
</paragraph>

Because of this behavior, formatting is not idempotent for documents with elements that contain only non-breaking spaces: the output differs depending on how many times the formatter is run. Given this input:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>    </text>
</paragraph>

This is the output after formatting the input once:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text></text>
</paragraph>

And this is the output after formatting it twice:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text />
</paragraph>
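
The two-pass difference can be reproduced with a toy model. This is only a sketch under assumptions, not the plugin's real parser or printer: `parseContent`, `printElement`, and `format` are hypothetical, and the model assumes that an element counts as empty only when its content consists solely of XML whitespace.

```javascript
// Strips only XML whitespace (space, tab, LF, CR).
const xmlTrim = (s) => s.replace(/^[ \t\n\r]+|[ \t\n\r]+$/g, "");
// Strips everything String.prototype.trim() considers whitespace.
const unicodeTrim = (s) => s.trim();

// Toy "parser" (assumption): extracts the content of a single <text> element.
function parseContent(xml) {
  const match = /^<text>([\s\S]*)<\/text>$/.exec(xml);
  return match ? match[1] : "";
}

// Toy "printer" (assumption): content made up only of XML whitespace counts
// as empty and yields a self-closing tag; other content is trimmed and printed.
function printElement(name, content, trim) {
  if (xmlTrim(content) === "") {
    return `<${name} />`;
  }
  return `<${name}>${trim(content)}</${name}>`;
}

function format(xml, trim) {
  return printElement("text", parseContent(xml), trim);
}

const input = "<text>\u00A0\u00A0\u00A0\u00A0</text>";

// Trimming with String.prototype.trim(): two passes give different output.
const once = format(input, unicodeTrim);
const twice = format(once, unicodeTrim);
console.log(once); // <text></text>
console.log(twice); // <text />

// Trimming only XML whitespace: the second pass changes nothing.
const onceXml = format(input, xmlTrim);
console.log(onceXml === format(onceXml, xmlTrim)); // true
```

In this model the first pass sees non-empty content and prints an open and close tag around the over-trimmed text, and the second pass sees an empty element and collapses it.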

Here's a list of affected characters:

And an XML document that has each of these characters repeated 4 times in separate <text> elements:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<paragraph>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text>



</text>
  <text>



</text>
  <text>    </text>
  <text>    </text>
  <text>    </text>
  <text></text>
</paragraph>
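
The affected characters can also be enumerated programmatically. The sketch below scans the Basic Multilingual Plane for code points that `String.prototype.trim()` strips but that XML does not treat as whitespace. The names `XML_WHITESPACE` and `findTrimOnlyWhitespace` are illustrative, and the exact result may vary with the Unicode version of the JavaScript engine.

```javascript
// Code points that XML treats as whitespace: tab, LF, CR, and space.
const XML_WHITESPACE = new Set([0x09, 0x0a, 0x0d, 0x20]);

// Collects code points in the Basic Multilingual Plane that
// String.prototype.trim() removes but XML does not consider whitespace.
function findTrimOnlyWhitespace() {
  const result = [];
  for (let cp = 0; cp <= 0xffff; cp++) {
    const isTrimmed = String.fromCharCode(cp).trim() === "";
    if (isTrimmed && !XML_WHITESPACE.has(cp)) {
      result.push(cp);
    }
  }
  return result;
}

// Formats a code point as U+XXXX.
const formatCodePoint = (cp) =>
  "U+" + cp.toString(16).toUpperCase().padStart(4, "0");

console.log(findTrimOnlyWhitespace().map(formatCodePoint).join(" "));
```

The result also includes U+000B and U+000C, which `trim()` strips but which are not legal characters in an XML 1.0 document, so they cannot appear in a sample document like the one above.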

gebsh, Feb 16 '24 18:02