quick-xml icon indicating copy to clipboard operation
quick-xml copied to clipboard

Disable `trim_text` in Deserializer from_reader

Open woodworker opened this issue 3 years ago • 14 comments

Is there a easy way to set trim_text to false in the Deserializer::from_str when i use quick_xml::de::from_str?

https://github.com/tafia/quick-xml/blob/a4be48431a2c512683538b81d7ee037af70d5cce/src/de/mod.rs#L160-L167

woodworker avatar Apr 17 '21 20:04 woodworker

Is there a way to determine whether set trim_text by looking if there is xml:space = "preserve"? For example:

<t xml:space="preserve">Text </t>

The trailing space here should not be trimmed.

ImJeremyHe avatar May 06 '21 09:05 ImJeremyHe

There are very little customization on the serde deserializer so far. I don't think there is any major blocking point, someone just needs to write it.

tafia avatar May 12 '21 11:05 tafia

In the coming release Deserializer::new would be public and you could create a deserializer from a Reader (but do not turn off expand_empty_elements! For now Deserializer is not prepared for that).

Processing of xml:space still waits its own PR

Mingun avatar May 21 '22 20:05 Mingun

In the coming release Deserializer::new would be public

Was this implemented already? Deserializer::new is public in the latest version, but it seems useless since XmlRead can't be implemented outside of quick_xml and SliceReader and IoReader can't be instantiated also. There's no way to use Deserializer::new, unless I'm missing something here.

naumazeredo avatar Jul 28 '22 05:07 naumazeredo

Yes, this is oversight. So, currently this is still not possible, even in master. Need to think about better API. I also would to provide an API to create a deserializer for a part of XML, so you can mix usual Reader usage with the Deserializer usage, for example, to support streaming deserialization.

Mingun avatar Jul 28 '22 05:07 Mingun

According to the original use case -- I do not think that simply disabling trim_text would be usable -- it seems that you'll just break deserialization of pretty-printed XMLs at all with that setting

Mingun avatar Jul 28 '22 05:07 Mingun

Yeah, that's exactly what happened. I didn't get why that option, even though internal, exists. I've sadly moved back to serde_xml_rs since they give an option to not trim and I'm not willing to spend more time debugging xml deserialization right now. I'll be trying quick_xml in the future in case it gets more versatility

naumazeredo avatar Jul 29 '22 05:07 naumazeredo

The trimming of spaces within elements probably ought to be separated from the trimming of spaces between elements. It should be possible (and probably the default) to ignore the latter without affecting the text contents of elements themselves.

Having an option for trimming spaces around text contents is nice of course, but not at all necessary (the user could easily do this themselves) and as this issue points out it is more difficult to do "correctly" than originally envisioned. Maybe we should eliminate this feature and just keep the "ignore spaces between XML elements" functionality?

dralley avatar Jul 29 '22 13:07 dralley

The trimming of spaces within elements probably ought to be separated from the trimming of spaces between elements.

Yes, I think, we should move in that direction. A couple of thoughts:

  • need to take into account a #383 problem. I think, it can be solved by introducing a method to read all content as a string regardless of the XML markup inside:
    impl Reader {
      /// For XML
      ///
      /// <outer>  <inner/>   </outer>
      ///
      /// - can be called after BytesStart("outer")
      /// - returns "  <inner/>   "
      /// - consumes BytesEnd("outer")
      fn read_as_text(&mut self, end: QName) -> Result<Cow<str>> { ... }
    }
    // or maybe better (except for a long name :( )
    impl BytesStart {
      fn read_to_end_as_text(&self, reader: &mut Reader) -> Result<Cow<str>> { ... }
    }
    
  • we need a lookahead to decide if whitespace is significant or not (==determine the shape of the next tag -- opening/closing/self-closing/comment/PI/CData?) -- #414 is related. Should we, for example, allow XML comments in whitespace-significant parts (<outer> <!----> </outer>)? If yes, that would require a potentially infinity lookahead

Mingun avatar Jul 29 '22 17:07 Mingun

Hello, little update here, since Deserializer::new still is not public we can't have a from_str or any other variation where it does not trim the content. So I can work on a quick PR for that. But you already discussed about that, so have you a preferred solution ? I've thought about juste adding a from_reader implementation that takes a Reader (not something that implements IoReader) so that you can change the reader's attributes without having access to the Deserializer. This function will just force the attribute expand_empty_elements since you said it is required for the moment.

Pastequee avatar Dec 29 '22 10:12 Pastequee

Just disabling trim_text will not work correctly for pretty-printed XMLs, therefore I doubt that so limited implementation would be useful in mass. Actually, the trim_text* options should not exist at that level of parsing -- it is just wrong place to do trim. I'm working on proper trim implementation in #520 and I plan to implement it in 0.28 which is probably would be released 2-3 months later. After that probably this problem will gone (but maybe not).

Well, I think, that we could add a Deserializer::trim_text(trim: bool) method as a temporary solution, with a proper warning that it will not work for XMLs with pretty-printed parts.

Mingun avatar Dec 29 '22 15:12 Mingun

I've just created a #572. When it would be merged, we could change the content of an introduced Text type. We should change it definition to:

struct Text<'a> {
    /// Untrimmed text after concatenating content of all
    /// [`Text`] and [`CData`] events
    text: Cow<'a, str>,
    /// A range into `text` which contains data after trimming
    content: Range<usize>,
}

Such a change will open a door to use a per-field control for trimming

Mingun avatar Mar 04 '23 18:03 Mingun

Are you still looking at fixing this? If not, what remains to be done? This is a breaking issue for my team, and we may be interested in contributing in order to help fix it.

nsunderland1 avatar Nov 08 '23 01:11 nsunderland1

I did not put my efforts in this issue since my last comment. Because #572 was merged, we can move forward by the way outlined in that comment. We also can add a way to globally disable trimming, but I think such setting will have a limited usefulness. If you wish feel free to explore those opportunities.

Mingun avatar Nov 08 '23 16:11 Mingun