Understanding empty tag behavior
First of all, I want to thank everyone involved in this project for the excellent work they've done. It's absurdly fast and fits great in my project.
I have a question about expected behavior for empty tags. I have some XML that looks like this:
...
<value></value>
<value></value>
...
<value></value>
...
That is being parsed by this code:
match (state, reader.read_event()?) {
(State::ResultsInnerValueInner, Event::Text(e)) => {
column.push(e.unescape_with(irods_unescapes)?.to_string());
State::ResultsInnerValue
}
}
When I later print this value out, it has the value "\n". Is this expected behavior? I think I've seen it a couple other times. I would have guessed that the output would be the empty &str.
I can't tell what the cause is without the full code, but I believe you're getting the text between </value> and the next <value>. You should check that your state management is correct.
It would also be good to use dbg!(state, reader.read_event()?) to see exactly what you're matching.
Oh interesting. I suppose I assumed that Text events only occurred in the context of something like <tag>...</tag>, but debugging does seem to show that they're appearing in </tag><tag> contexts and I've just gotten lucky so far. Thanks for the lead.
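For anyone hitting the same thing, here is a minimal sketch of the state handling being discussed. It deliberately does not use quick-xml's types: `Ev` and `collect_values` are hypothetical stand-ins, and the model assumes that an empty element yields a start and an end with no text event in between. The idea is to record text only while inside <value>, and to push the result when the closing tag arrives rather than when a text event arrives.

```rust
// Simplified stand-in for a pull parser's events.
// This is NOT quick-xml's Event type; it only models the three cases needed here.
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(&'static str),
    End(&'static str),
    Text(&'static str),
}

/// Collects one string per <value> element and ignores text outside of it.
fn collect_values(events: &[Ev]) -> Vec<String> {
    let mut out = Vec::new();
    // `Some(buf)` while inside <value>, `None` everywhere else.
    let mut current: Option<String> = None;

    for ev in events {
        match ev {
            Ev::Start(name) if *name == "value" => current = Some(String::new()),
            Ev::Text(t) => {
                // Whitespace between </value> and the next <value> arrives here
                // with `current == None`, so it is dropped.
                if let Some(buf) = current.as_mut() {
                    buf.push_str(t);
                }
            }
            Ev::End(name) if *name == "value" => {
                // In this model an empty element has no Text event,
                // so the value is pushed on End, not on Text.
                if let Some(buf) = current.take() {
                    out.push(buf);
                }
            }
            _ => {}
        }
    }
    out
}
```

Feeding this the sequence Start("value"), End("value"), Text("\n"), Start("value"), Text("a"), End("value") gives two entries, an empty string and "a"; the "\n" between the elements is not recorded.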
Also, just consuming Event::Text events is error-prone. In XML, all text events should be concatenated together with CDATA contents, and you should drop any comments between them. The code that takes all the nuances into account is quite large, and unfortunately there is no good API for this out of the box in quick-xml (note self.drain_text(...)):
https://github.com/tafia/quick-xml/blob/e8ae02098b3f52ae0ec78979c1307fc1bccf6998/src/de/mod.rs#L2222-L2243
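As a rough illustration of the rule described above (not the linked implementation, which handles many more cases), the merging step could be modelled like this. `Content` and `merged_text` are hypothetical names introduced only for this sketch, and the input is assumed to be the already-decoded pieces found inside a single element.

```rust
// Hypothetical model of the pieces that can appear inside one element.
// These are not quick-xml types.
#[derive(Debug)]
enum Content<'a> {
    Text(&'a str),    // assumed to be already unescaped
    CData(&'a str),   // taken verbatim; CDATA is never unescaped
    Comment(&'a str), // ignored when building the text value
}

/// Concatenates Text and CData pieces in document order and skips comments.
fn merged_text(content: &[Content<'_>]) -> String {
    let mut out = String::new();
    for item in content {
        match item {
            Content::Text(s) | Content::CData(s) => out.push_str(s),
            Content::Comment(_) => {}
        }
    }
    out
}
```

For content modelled on <value>a<!-- note --><![CDATA[<b>]]>c</value>, the pieces are Text("a"), Comment(" note "), CData("<b>"), Text("c"), and the merged value is "a<b>c". Handling only the first text event would have produced just "a".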
@Mingun presumably the Reader / RawReader distinction will also handle the concatenation of CDATA and Text?
Yes, I plan to merge text events in the new Reader. I think the average user does not need as high a degree of control as access to each individual text event.