quick-xml

Understanding empty tag behavior

Open phdavis1027 opened this issue 1 year ago • 5 comments

First of all, I want to thank everyone involved in this project for the excellent work they've done. It's absurdly fast and fits great in my project.

I have a question about expected behavior for empty tags. I have some XML that looks like this:

...
<value></value>
<value></value>
...
<value></value>
...

That is being parsed by this code:

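match (state, reader.read_event()?) {
    (State::ResultsInnerValueInner, Event::Text(e)) => {
        // Unescape using the custom `irods_unescapes` entity resolver, then store the owned string.
        column.push(e.unescape_with(irods_unescapes)?.to_string());
        State::ResultsInnerValue
    }
    // ... other (state, event) arms elided ...
}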

When I later print this value, it is "\n". Is this expected behavior? I think I've seen it a couple of other times. I would have guessed that the output would be an empty string.

phdavis1027 · Apr 26 '24 01:04

I can't say what the reason is without the full code, but I believe you've got the text between </value> and the next <value>. You should check that your state management is correct.

It would also be good to use dbg!(state, reader.read_event()?) to see exactly what you're matching.
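A minimal std-only sketch of the state management suggested above. To keep it self-contained, `Ev` is a hypothetical, simplified stand-in for `quick_xml::events::Event` (not the crate's API), and `naive`/`state_aware` are illustrative names. It models the event sequence assumed for `<value></value>\n<value></value>`: the empty elements produce no text event, while the newline between `</value>` and the next `<value>` does, which is one way a "\n" can end up in the column.

```rust
// Hypothetical, simplified stand-in for quick_xml::events::Event,
// modeling only the variants this illustration needs.
enum Ev<'a> {
    Start(&'a str),
    End(&'a str),
    Text(&'a str),
}

/// Naive approach: record every Text event, wherever it occurs.
fn naive(events: &[Ev<'_>]) -> Vec<String> {
    events
        .iter()
        .filter_map(|e| match e {
            Ev::Text(t) => Some(t.to_string()),
            _ => None,
        })
        .collect()
}

/// State-aware approach: only text between <value> and </value> counts,
/// and an element with no text at all yields an empty string.
fn state_aware(events: &[Ev<'_>]) -> Vec<String> {
    let mut out = Vec::new();
    // `Some(buf)` while inside <value>, `None` otherwise.
    let mut current: Option<String> = None;
    for e in events {
        match e {
            Ev::Start("value") => current = Some(String::new()),
            Ev::Text(t) => {
                // Text outside <value> (e.g. the "\n" between
                // </value> and <value>) is ignored.
                if let Some(buf) = current.as_mut() {
                    buf.push_str(t);
                }
            }
            Ev::End("value") => {
                if let Some(buf) = current.take() {
                    out.push(buf);
                }
            }
            _ => {}
        }
    }
    out
}

fn main() {
    // Assumed events for "<value></value>\n<value></value>".
    let events = [
        Ev::Start("value"),
        Ev::End("value"),
        Ev::Text("\n"),
        Ev::Start("value"),
        Ev::End("value"),
    ];
    println!("{:?}", naive(&events)); // ["\n"]
    println!("{:?}", state_aware(&events)); // ["", ""]
}
```

Entering `<value>` starts an empty buffer, so an element with no text still yields an empty string on `</value>`, and stray whitespace between elements is never attributed to a value.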

Mingun · Apr 26 '24 04:04

Oh interesting. I suppose I assumed that Text events only occurred in the context of something like <tag>...</tag>, but debugging does seem to show that they're appearing in </tag><tag> contexts and I've just gotten lucky so far. Thanks for the lead.

phdavis1027 · Apr 26 '24 14:04

Also, consuming only Event::Text events is error-prone. In XML, adjacent text events should be concatenated together with CDATA contents, and any comments between them should be dropped. Code that handles all these nuances is fairly large, and unfortunately quick-xml has no good out-of-the-box API for this (note the self.drain_text(...) call in the deserializer): https://github.com/tafia/quick-xml/blob/e8ae02098b3f52ae0ec78979c1307fc1bccf6998/src/de/mod.rs#L2222-L2243
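To make the concatenation rule concrete, here is a std-only sketch. Again, `Ev` is a hypothetical stand-in for `quick_xml::events::Event` (not the crate's API), and `merge_text` is an illustrative name, not the `drain_text` helper linked above. It glues each run of text and CDATA pieces into one string, skips comments inside the run, and treats any start or end tag as the end of a run.

```rust
// Hypothetical, simplified stand-in for quick_xml::events::Event.
#[allow(dead_code)] // the Start/End/Comment payloads are never read in this sketch
enum Ev<'a> {
    Start(&'a str),
    End(&'a str),
    Text(&'a str),
    CData(&'a str),
    Comment(&'a str),
}

/// Merge each run of Text/CDATA events into a single string.
/// Comments inside a run are dropped without breaking it;
/// any start or end tag closes the current run.
fn merge_text(events: &[Ev<'_>]) -> Vec<String> {
    let mut out = Vec::new();
    let mut run: Option<String> = None;
    for e in events {
        match e {
            Ev::Text(t) | Ev::CData(t) => run.get_or_insert_with(String::new).push_str(t),
            // Comments contribute no text and do not end the run.
            Ev::Comment(_) => {}
            Ev::Start(_) | Ev::End(_) => {
                if let Some(s) = run.take() {
                    out.push(s);
                }
            }
        }
    }
    // Flush a trailing run that was not followed by a tag.
    if let Some(s) = run {
        out.push(s);
    }
    out
}

fn main() {
    // Assumed events for: <a>Hello, <!-- note --><![CDATA[<world>]]>!</a>
    let events = [
        Ev::Start("a"),
        Ev::Text("Hello, "),
        Ev::Comment(" note "),
        Ev::CData("<world>"),
        Ev::Text("!"),
        Ev::End("a"),
    ];
    println!("{:?}", merge_text(&events)); // ["Hello, <world>!"]
}
```

In real code each text piece would also need unescaping before it is appended, whereas CDATA content is taken verbatim; the sketch skips that step to keep the merging logic visible.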

Mingun · Apr 26 '24 16:04

@Mingun, presumably the Reader / RawReader distinction will also handle concatenating CDATA and Text?

dralley · Jul 01 '24 17:07

Yes, I plan to merge text events in the new Reader. I think the average user doesn't need the level of control that access to each individual text event provides.

Mingun · Jul 01 '24 17:07