Support back-references with ">>"

Open jameskirkwood opened this issue 2 years ago • 3 comments

Because seq produces a Parser that continues to borrow its tag, it's not possible to use the overloaded right shift operator (>>) with seq to create a back-reference to a previously parsed fragment.

As a basic example, the following will not compile because tag does not live long enough:

fn example<'a>() -> Parser<'a, u8, Vec<u8>> {
    (sym(b'<') * none_of(b">").repeat(0..) - sym(b'>')) >> |tag| {
        (call(example) | none_of(b"<>").repeat(0..)) - seq(b"</") - seq(&tag) - sym(b'>')
    }
}

One solution is to modify seq so that it makes an internal copy of tag and moves that copy into the closure it generates. I tried this without much success: I also had to change the return type to Parser<'a, I, Vec<I>>, which introduces a copy every time the sequence matches (only for the result to be immediately discarded).

Perhaps there is a way for seq to support both borrowing and owning its tag, or perhaps there is a good case for a new parser factory that matches against an owned tag?
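To make the "owned tag" idea concrete, here is a minimal stdlib-only sketch. It does not use pom: MiniParser, seq_owned and closing_tag are hypothetical stand-ins invented for illustration, and the real Parser type has a richer API. It only shows that moving the tag into the closure lets the returned parser outlive the value the tag was built from, without copying the tag on each match.

```rust
// Hypothetical stand-in for a parser type. This is NOT pom's `Parser`;
// it is a stdlib-only model used to illustrate ownership of the tag.
pub struct MiniParser<'a, O> {
    pub run: Box<dyn Fn(&'a [u8], usize) -> Option<(O, usize)> + 'a>,
}

// Assumed owned-tag matcher: `tag` is moved into the closure, so the
// returned parser does not borrow anything from the caller.
pub fn seq_owned<'a>(tag: Vec<u8>) -> MiniParser<'a, ()> {
    MiniParser {
        run: Box::new(move |input: &'a [u8], start: usize| {
            let end = start.checked_add(tag.len())?;
            // Compare the input slice with the owned tag; nothing is
            // copied per match because the result is just `()`.
            if input.get(start..end)? == tag.as_slice() {
                Some(((), end))
            } else {
                None
            }
        }),
    }
}

// Builds a parser for `</name>` from a short-lived borrowed name. The
// lifetime 'a of the result is independent of the lifetime of `name`.
pub fn closing_tag<'a>(name: &[u8]) -> MiniParser<'a, ()> {
    let mut tag = b"</".to_vec();
    tag.extend_from_slice(name);
    tag.push(b'>');
    seq_owned(tag)
}
```

In this model, closing_tag can be called with a name that is dropped right afterwards, which is the situation the closure passed to >> is in.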

Suggestions for alternatives are welcome.

jameskirkwood, Feb 02 '22 14:02

A workaround is:

fn example<'a>() -> Parser<'a, u8, Vec<u8>> {
    (sym(b'<') * none_of(b">").repeat(0..) - sym(b'>'))
        >> |tag| {
            (call(example) | none_of(b"<>").repeat(0..))
                - seq(b"</") - take(tag.len()).convert(move |t| if t == tag { Ok(()) } else { Err(()) })
                - sym(b'>')
        }
}

You may also define a new owned version of seq.

J-F-Liu, Feb 02 '22 14:02

I prefer your workaround as I don't need to use Parser::new, but for the record here is an owned version of seq:

fn seq_owned<'a, I>(tag: Vec<I>) -> Parser<'a, I, Vec<I>>
where
    I: PartialEq + Debug + Clone,
{
    Parser::new(move |input: &[I], start: usize| {
        let mut index = 0;
        loop {
            let pos = start + index;
            if index == tag.len() {
                return Ok((tag.to_owned(), pos));
            }
            if let Some(s) = input.get(pos) {
                if tag[index] != *s {
                    return Err(Error::Mismatch {
                        message: format!("seq {:?} expect: {:?}, found: {:?}", tag, tag[index], s),
                        position: pos,
                    });
                }
            } else {
                return Err(Error::Incomplete);
            }
            index += 1;
        }
    })
}

jameskirkwood, Feb 05 '22 06:02

...And here is a much shorter owned version of seq that encapsulates your workaround, which could be a useful recipe:

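// The explicit 'a keeps the parser's lifetime independent of the borrowed tag;
// with elided lifetimes the result would be tied to the lifetime of `tag`.
fn seq_owned<'a>(tag: &[u8]) -> Parser<'a, u8, ()> {
    let tag = tag.to_owned();
    // Take exactly tag.len() items and succeed only if they equal the owned tag.
    take(tag.len()).convert(move |t| if t == tag { Ok(()) } else { Err(()) })
}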

jameskirkwood, Feb 20 '22 00:02