Lingering serde whitespace bug
Thanks to the work of @ggodlewski, whitespace is now preserved by default according to the xml specification when deserializing with serde. I note, however, one lingering apparent inconsistency in the case of deserializing a string consisting of only whitespace.
I've isolated the issue in these contrasting passing and failing tests.
Is this intended behaviour?
The reason is in here: https://github.com/tafia/quick-xml/blob/master/src/de/mod.rs#L2792
Check the implementation of is_blank https://github.com/tafia/quick-xml/blob/master/src/de/mod.rs#L2253
I think it's intended: https://github.com/tafia/quick-xml/blob/master/src/de/mod.rs#L3368
Thanks for that context. So if I'm understanding correctly, this is actually due to the xml:space attribute not yet being implemented?
My interpretation of the xml spec (and I could be misreading this) from section 2.10 White Space Handling, is that an xml processor must pass the whitespace through to the application, and the application gets to decide how to handle the xml:space attribute. If quick-xml is a processor, shouldn't the determination of what is significant or insignificant whitespace be done downstream?
Looping in @Mingun since he authored https://github.com/tafia/quick-xml/commit/2b85835968764820d48002dc75789cdd7a0bbe42 which handles this case.
My interpretation of the xml spec (and I could be misreading this) from section 2.10 White Space Handling, is that an xml processor must pass the whitespace through to the application, and the application gets to decide how to handle the
xml:spaceattribute.
Yes, you're absolutely right. Reader is an "xml processor", but Deserializer is an "application". Currently there are no ways to configure some aspects of it, including handling of whitespace-only text events. Handling of xml:space is one of such tasks (part of #464).
Actually, this may be fixed if this line will be executed conditionally: https://github.com/tafia/quick-xml/blob/655691c7c394ba6c9dcc3c47725d9ec7ce80f094/src/de/map.rs#L276
One possible solution, replace it with this code (and fix 4 tests in serde-de-seq failed due to this):
// If we have dedicated "$text" field, whitespace-only text may be significant.
// That also means, that type with `$text` fields behaves as if its element has
// `xml:space="preserve"` attribute: you must not have pretty-print indents
// inside this element.
if !self.has_text_field {
self.skip_whitespaces()?;
}
It seems quite logical change, but I'm not sure, how many things like
<root>
<field1/>
some text
<field2/>
</root>
it will break. You will have to write this document as
<root><field1/>
some text
<field2/></root>
to be able to deserialize it with non-vec $text field.
@Mingun What do you think about the potential solution in #897? It assumes a text event that consists exclusively of whitespace and contains a line break is insignificant. A whitespace event with no line breaks is treated as significant and preserved.
All existing tests pass without modification using this heuristic.
I've revised #897 to align with your original suggestion.