quick-xml
Add ability to deserialize serde types from `Reader`
When working with deeply nested XML, we are usually interested only in a small portion of the tree close to a leaf node. My idea is to extract the string of the target node and deserialize it with serde, but I can't find a convenient way to do that.
Currently I use `read_text` to get the inner content of the node and add the start and end tags manually, but then the code looks really weird, especially when the node has many attributes. It would be great if there were a method (`read_node` or something) to do that.
By the way, is there any reason why `read_text` is not implemented for `Reader<File>`?
Having a `deserialize` method on `Reader` that could deserialize a piece of XML into a type using serde, starting from the current position, is definitely a feature I also want, as a counterpart to #610. The implementation, however, is not so simple: the serde deserializer requires some (potentially unbounded) lookahead, so we need to buffer events somewhere.
The possible API could look something like this:
    impl<'a> Reader<&'a [u8]> {
        fn deserialize<T>(&mut self, seed: Event<'a>) -> Result<T, DeError>
        where
            T: Deserialize<'a>,
        {}
    }
    impl<R: Read> Reader<R> {
        fn deserialize_into<'de, T>(&mut self, seed: Event<'de>, buffer: &'de mut Vec<u8>) -> Result<T, DeError>
        where
            T: Deserialize<'de>,
        {}
    }
The `seed` here is an event that we got from `Reader` in a typical read cycle, which will likely be a part of the type that we want to deserialize.
Another possible API (very schematic):
    impl<R> Reader<R> {
        fn deserializer(&mut self, seed: Event) -> FragmentDeserializer { ... }
    }
    struct FragmentDeserializer { ... }
    impl FragmentDeserializer {
        fn deserialize<T>(self) -> Result<T, DeError>
        where
            T: Deserialize<'a>,
        {}
        fn deserialize_into<'de, T>(self, buffer: &'de mut Vec<u8>) -> Result<T, DeError>
        where
            T: Deserialize<'de>,
        {}
    }
Another question: in what state should we leave `Reader` if deserialization fails? And how should we provide access to events that were consumed during lookahead but not used to deserialize the final type? What if we want to call `deserialize` twice? Then the second call has to take into account the events that the first `deserialize` call read ahead. We probably need a more generic API:
    impl<R> Reader<R> {
        /// Convert to a reader that can store up to `count` events in the internal buffer
        fn lookahead(self, count: usize) -> LookaheadReader<R> { ... }
    }
    impl<'de, 'a, R> Deserializer<'de> for &'a mut LookaheadReader<R> { ... }
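To make the buffering idea concrete, here is a minimal sketch of a bounded lookahead wrapper, using only the standard library. It is not quick-xml code: `Ev`, `Lookahead`, `peek_nth` and `next_event` are hypothetical names invented for illustration, and a plain iterator stands in for the real `Reader`. The point is that events read ahead stay in the buffer, so a second consumer (or a retry after a failed deserialization) still sees them.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an XML event; this is NOT quick-xml's `Event`.
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(String),
    Text(String),
    End(String),
}

/// Hypothetical wrapper that can hold up to `cap` events read ahead of the cursor.
struct Lookahead<I: Iterator<Item = Ev>> {
    inner: I,
    buf: VecDeque<Ev>,
    cap: usize,
}

impl<I: Iterator<Item = Ev>> Lookahead<I> {
    fn new(inner: I, cap: usize) -> Self {
        Self { inner, buf: VecDeque::with_capacity(cap), cap }
    }

    /// Look at the event `n` positions ahead without consuming anything.
    /// Returns `None` if `n` is outside the allowed window or the input ends first.
    fn peek_nth(&mut self, n: usize) -> Option<&Ev> {
        if n >= self.cap {
            return None;
        }
        // Pull events from the underlying source until position `n` is buffered.
        while self.buf.len() <= n {
            let ev = self.inner.next()?;
            self.buf.push_back(ev);
        }
        self.buf.get(n)
    }

    /// Consume the next event, draining buffered events before reading new ones.
    fn next_event(&mut self) -> Option<Ev> {
        self.buf.pop_front().or_else(|| self.inner.next())
    }
}
```

With a fixed `cap`, a deserializer that needs more lookahead than the window allows gets `None` from `peek_nth` and would have to report an error; whether the real API should grow the buffer instead is exactly the open design question above.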
> By the way, is there any reason why `read_text` is not implemented for `Reader<File>`?
It is not trivial to do that, because we cannot just reuse the `read_to_end_into` method: it stores only the content of the tags into the buffer and skips the markup characters (`<`, `>` and so on). The attempts to implement it are tracked in #483.
I would also like this. Go makes it easy to mix pull-based parsing with a state machine and deserializing structs:
    decoder := xml.NewDecoder(r.Body)
    decoder.Strict = true
    for {
        // Assumed step, missing from the snippet as posted: pull the next token.
        t, err := decoder.Token()
        if err != nil {
            break // io.EOF at end of input, or a parse error
        }
        switch se := t.(type) {
        case xml.StartElement:
            level++
            switch se.Name.Local {
            case "fooTag":
                var req schema.FooRequest
                decoder.DecodeElement(&req, &se)
                // do stuff
            case "barRequest":
                var req schema.BarRequest
                err = decoder.DecodeElement(&req, &se)
                // do stuff
            }
        case xml.EndElement:
            level--
        }
    }
I could live with an implementation that ties the lifetime of the Reader and the deserialized object to the source lifetime, i.e. only applies to readers backed by a `&str`.
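For the `&str`-backed case, the "extract the node, then deserialize it" step can be sketched without any reader support at all. The function below is a rough illustration, not part of quick-xml: `outer_xml` is a hypothetical helper that, given the byte offset of an element's opening `<`, returns the element's full raw text (start tag, content and end tag) borrowed from the source. It assumes well-formed input with no comments, CDATA sections, processing instructions, or `>` characters inside attribute values.

```rust
/// Hypothetical helper: returns the raw text of the element whose start tag
/// begins at byte offset `start`, borrowed from `src`.
/// Simplified on purpose: no comments, CDATA, PIs, or `>` in attribute values.
fn outer_xml(src: &str, start: usize) -> Option<&str> {
    let bytes = src.as_bytes();
    if bytes.get(start) != Some(&b'<') {
        return None;
    }
    let mut depth = 0usize;
    let mut i = start;
    while i < bytes.len() {
        if bytes[i] != b'<' {
            i += 1;
            continue;
        }
        // Locate the `>` that closes the tag starting at `i`.
        let close = i + src[i..].find('>')?;
        let is_end = bytes.get(i + 1) == Some(&b'/');
        let is_empty = bytes[close - 1] == b'/';
        if is_end {
            // An end tag closes one level; an unmatched one is an error.
            depth = depth.checked_sub(1)?;
        } else if !is_empty {
            // A start tag opens one level; `<x/>` leaves the depth unchanged.
            depth += 1;
        }
        if depth == 0 {
            // Back at the starting level: the element is complete.
            return Some(&src[start..=close]);
        }
        i = close + 1;
    }
    None
}
```

Because the returned slice borrows from `src`, it could then be handed to a serde-based entry point such as `quick_xml::de::from_str`, and the deserialized value would be tied to the source lifetime as described above. A production version would need to reuse the reader's own tokenizer rather than this simplified scan.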