quick-xml icon indicating copy to clipboard operation
quick-xml copied to clipboard

Lifetime issue in new unbuffered logic

Open stevenroose opened this issue 4 years ago • 3 comments

I have been trying out the new alpha version and I think I stumbled upon a kinda-bug. Somehow something that should be possible is not, I think because of the way lifetimes are indicated in some methods. I'm trying to find a fix, but to reproduce the issue, I have the following snippet.

In the snippet, I'm trying to use a &'x [u8] and trying to deserialize into a struct that holds a Cow<'x, str> with the content of a tag. Somehow this snippet gives a lifetime error. I think the main culprit is the combination of do_unescape that equates the lifetime of the slice

pub fn do_unescape<'a>(
    raw: &'a [u8],
    custom_entities: Option<&HashMap<Vec<u8>, Vec<u8>>>,
) -> Result<Cow<'a, [u8]>, EscapeError> {

together with calling into do_escape as follows which actually overwrites the 'a lifetime with a new local lifetime: do_unescape(&*self.value, custom_entities).map_err(Error::EscapeError)

Here is the snippet:

use std::str;
use std::borrow::Cow;
use std::error::Error;

use quick_xml::Reader;
use quick_xml::events::Event;


pub struct Example<'a> {
	pub content: Cow<'a, str>,
}

/// Convert a UTF8-encoded [Cow<[u8]>] into a [Cow<str>].
fn cow_from_utf8<'a>(cow: Cow<'a, [u8]>) -> Result<Cow<'a, str>, str::Utf8Error> {
	Ok(match cow {
		Cow::Borrowed(v) => Cow::Borrowed(str::from_utf8(v)?),
		Cow::Owned(v) => Cow::Owned(String::from_utf8(v).map_err(|e| e.utf8_error())?),
	})
}

pub fn parse<'a>(xml: &'a impl AsRef<[u8]>) -> Result<Example<'a>, Box<dyn Error>> {
	let mut reader = Reader::from_reader(xml.as_ref());

	match reader.read_event_unbuffered()? {
		Event::Start(x) => match str::from_utf8(x.name())? {
			"Content" => {
				let content = match reader.read_event_unbuffered()? {
					Event::Text(t) => cow_from_utf8(t.unescaped()?)?,
					Event::End(e) if e.name() == x.name() => Cow::Borrowed(""),
					e => return Err(format!("unexpected event '{:?}'", e).into()),
				};
				match reader.read_event_unbuffered()? {
					Event::End(e) if e.name() == x.name() => {}
					e => return Err(format!("unexpected event '{:?}'", e).into()),
				}
				Ok(Example {
					content: content,
				})
			}
			t => Err(format!("unexpected tag '{}'", t).into()),
		}
		e => Err(format!("unexpected event '{:?}'", e).into()),
	}
}

stevenroose avatar Oct 25 '21 19:10 stevenroose

So I created a branch in which I added the lifetimes that should be added to the API in order to be correct. And it breaks borrowck atm, so I'm trying to fix them. Someone with more knowledge of the project might be have an easier time to figure it out, though.

Here are the lifetimes: https://github.com/stevenroose/quick-xml/commit/fb7a3dd0515368287d39998edaa0ba90e1cb99ef

stevenroose avatar Oct 25 '21 20:10 stevenroose

I'm thinking this might be a problem in Rust's Cow implementation or even some trait definitions. Both AsRef, Borrow and Deref return references with a lifetime equal to the lifetime of the borrow of self. So a &'a Cow<'b, [u8]> can seemingly not be made into a &'b [u8], only into &'a [u8].

stevenroose avatar Oct 25 '21 20:10 stevenroose

So I'm starting to also convert the decode functionality, because it suffers from the same problem. Though when doing multiple conversions that are free on &[u8] on a Cow<[u8]> seem to do a lot of subsequent re-allocations in the owned case. I'm thinking if we can somehow use the Bytes crate to do this more efficiently.

stevenroose avatar Oct 26 '21 08:10 stevenroose