quick-xml
quick-xml copied to clipboard
Disable `trim_text` in Deserializer from_reader
Is there a easy way to set trim_text to false in the Deserializer::from_str
when i use quick_xml::de::from_str
?
https://github.com/tafia/quick-xml/blob/a4be48431a2c512683538b81d7ee037af70d5cce/src/de/mod.rs#L160-L167
Is there a way to determine whether set trim_text by looking if there is xml:space = "preserve"? For example:
<t xml:space="preserve">Text </t>
The trailing space here should not be trimmed.
There are very little customization on the serde deserializer so far. I don't think there is any major blocking point, someone just needs to write it.
In the coming release Deserializer::new
would be public and you could create a deserializer from a Reader
(but do not turn off expand_empty_elements
! For now Deserializer
is not prepared for that).
Processing of xml:space
still waits its own PR
In the coming release
Deserializer::new
would be public
Was this implemented already? Deserializer::new
is public in the latest version, but it seems useless since XmlRead
can't be implemented outside of quick_xml and SliceReader
and IoReader
can't be instantiated also. There's no way to use Deserializer::new
, unless I'm missing something here.
Yes, this is oversight. So, currently this is still not possible, even in master
. Need to think about better API. I also would to provide an API to create a deserializer for a part of XML, so you can mix usual Reader
usage with the Deserializer
usage, for example, to support streaming deserialization.
According to the original use case -- I do not think that simply disabling trim_text
would be usable -- it seems that you'll just break deserialization of pretty-printed XMLs at all with that setting
Yeah, that's exactly what happened. I didn't get why that option, even though internal, exists. I've sadly moved back to serde_xml_rs since they give an option to not trim and I'm not willing to spend more time debugging xml deserialization right now. I'll be trying quick_xml in the future in case it gets more versatility
The trimming of spaces within elements probably ought to be separated from the trimming of spaces between elements. It should be possible (and probably the default) to ignore the latter without affecting the text contents of elements themselves.
Having an option for trimming spaces around text contents is nice of course, but not at all necessary (the user could easily do this themselves) and as this issue points out it is more difficult to do "correctly" than originally envisioned. Maybe we should eliminate this feature and just keep the "ignore spaces between XML elements" functionality?
The trimming of spaces within elements probably ought to be separated from the trimming of spaces between elements.
Yes, I think, we should move in that direction. A couple of thoughts:
- need to take into account a #383 problem. I think, it can be solved by introducing a method to read all content as a string regardless of the XML markup inside:
impl Reader { /// For XML /// /// <outer> <inner/> </outer> /// /// - can be called after BytesStart("outer") /// - returns " <inner/> " /// - consumes BytesEnd("outer") fn read_as_text(&mut self, end: QName) -> Result<Cow<str>> { ... } } // or maybe better (except for a long name :( ) impl BytesStart { fn read_to_end_as_text(&self, reader: &mut Reader) -> Result<Cow<str>> { ... } }
- we need a lookahead to decide if whitespace is significant or not (==determine the shape of the next tag -- opening/closing/self-closing/comment/PI/CData?) -- #414 is related. Should we, for example, allow XML comments in whitespace-significant parts (
<outer> <!----> </outer>
)? If yes, that would require a potentially infinity lookahead
Hello, little update here, since Deserializer::new
still is not public we can't have a from_str
or any other variation where it does not trim the content. So I can work on a quick PR for that. But you already discussed about that, so have you a preferred solution ? I've thought about juste adding a from_reader
implementation that takes a Reader
(not something that implements IoReader
) so that you can change the reader's attributes without having access to the Deserializer. This function will just force the attribute expand_empty_elements
since you said it is required for the moment.
Just disabling trim_text
will not work correctly for pretty-printed XMLs, therefore I doubt that so limited implementation would be useful in mass. Actually, the trim_text*
options should not exist at that level of parsing -- it is just wrong place to do trim. I'm working on proper trim implementation in #520 and I plan to implement it in 0.28 which is probably would be released 2-3 months later. After that probably this problem will gone (but maybe not).
Well, I think, that we could add a Deserializer::trim_text(trim: bool)
method as a temporary solution, with a proper warning that it will not work for XMLs with pretty-printed parts.
I've just created a #572. When it would be merged, we could change the content of an introduced Text
type. We should change it definition to:
struct Text<'a> {
/// Untrimmed text after concatenating content of all
/// [`Text`] and [`CData`] events
text: Cow<'a, str>,
/// A range into `text` which contains data after trimming
content: Range<usize>,
}
Such a change will open a door to use a per-field control for trimming
Are you still looking at fixing this? If not, what remains to be done? This is a breaking issue for my team, and we may be interested in contributing in order to help fix it.
I did not put my efforts in this issue since my last comment. Because #572 was merged, we can move forward by the way outlined in that comment. We also can add a way to globally disable trimming, but I think such setting will have a limited usefulness. If you wish feel free to explore those opportunities.