
MappedInput related lifetime issues

Open ruabmbua opened this issue 9 months ago • 10 comments

I am trying to write a utility function, which uses a lexer and a parser, and returns an AST from the function. Here is the function (may not be complete):

pub fn parse<'src>(input: &'src str) -> Option<Vec<(Statement<'src>, SimpleSpan)>> {
    let mut output = None;
    let (tokens, errs) = lexer().parse(input).into_output_errors();

    let parse_errs = if let Some(tokens) = &tokens {
        let mapped_input = tokens.map((input.len()..input.len()).into(), |(t, s)| (t, s));

        let (block, errs) = block_parser().parse(mapped_input).into_output_errors();

        output = block;

        errs
    } else {
        Vec::new()
    };

    let all_errs = errs
        .into_iter()
        .map(|e| e.map_token(|c| c.to_string()))
        .chain(
            parse_errs
                .into_iter()
                .map(|e| e.map_token(|tok| tok.to_string())),
        );

    for err in all_errs {
        Report::build(ReportKind::Error, ("file", err.span().into_range()))
            .with_message(err.to_string())
            .with_label(
                Label::new(("file", err.span().into_range()))
                    .with_message(err.reason().to_string())
                    .with_color(Color::Red),
            )
            .with_labels(err.contexts().map(|(label, span)| {
                Label::new(("file", span.into_range()))
                    .with_message(format!("while parsing this {}", label))
                    .with_color(Color::Yellow)
            }))
            .finish()
            .eprint(("file", Source::from(input)))
            .unwrap();
    }

    output.map(|(o, _)| o)
}

The input uses the 'src lifetime, which works properly with one of the parsers. My problem is that when I map the tokens into an input for the block parser, the lifetime that gets captured is not the 'src of my function but the one of the local tokens vector. Is the API design of Input::map maybe slightly too restrictive? Currently I see no way to use two parsers like this and then return an AST that captures the 'src lifetime from the utility function.

From what I understand, this is what happens:

The .map call on the token vector uses the impl Input for &'src [T], meaning it captures the local lifetime of the token vector. Maybe we need another impl where MappedInput can capture e.g. a Vec directly by move instead of borrowing it?
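To make the capture concrete, here is a minimal std-only sketch of the situation. Everything in it (TokenInput, Token, lex, collect_words, count_words) is a hypothetical stand-in and not chumsky's API; it only assumes that the slice impl ties the trait's lifetime to the borrow of the slice, as described above.

```rust
// Hypothetical stand-ins; none of these are chumsky's real types or traits.
#[derive(Clone)]
struct Token<'src>(&'src str);

trait TokenInput<'a> {
    type Token;
    fn next_token(&mut self) -> Option<Self::Token>;
}

// The trait's lifetime parameter is the lifetime of the *borrow* of the slice.
impl<'a, T: Clone> TokenInput<'a> for &'a [T] {
    type Token = T;
    fn next_token(&mut self) -> Option<T> {
        let slice: &'a [T] = *self; // copy the shared reference out
        let (first, rest) = slice.split_first()?;
        *self = rest;
        Some(first.clone())
    }
}

fn lex<'src>(input: &'src str) -> Vec<Token<'src>> {
    input.split_whitespace().map(Token).collect()
}

// A single lifetime `'a` is used for both the input borrow and the token contents.
fn collect_words<'a, I>(mut input: I) -> Vec<&'a str>
where
    I: TokenInput<'a, Token = Token<'a>>,
{
    let mut out = Vec::new();
    while let Some(Token(word)) = input.next_token() {
        out.push(word);
    }
    out
}

// Fine: nothing tied to the borrow of the local `tokens` escapes the function.
fn count_words(input: &str) -> usize {
    let tokens = lex(input);
    collect_words(tokens.as_slice()).len()
}

// Rejected by the borrow checker if uncommented: `'a` is pinned to the borrow
// of the local `tokens`, so the result cannot be returned as `Vec<&'src str>`.
// fn parse<'src>(input: &'src str) -> Vec<&'src str> {
//     let tokens = lex(input);
//     collect_words(tokens.as_slice())
// }
```

In this model the words themselves still point into the original input, but the signature of collect_words throws that information away by reusing one lifetime for two different things.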

ruabmbua avatar Apr 15 '25 18:04 ruabmbua

Ah that was fast...

I abused https://docs.rs/chumsky/latest/chumsky/input/struct.Stream.html to make it work; maybe this is even an intended use case? Hopefully the internal iterator -> cache mechanism does not add any extra cost. Maybe it still makes sense to have a direct Input impl for Vec?

ruabmbua avatar Apr 15 '25 19:04 ruabmbua

What is the type signature of block_parser? This looks suspiciously like the issue mentioned here.

zesterer avatar Apr 15 '25 19:04 zesterer

Here is the signature:

pub fn block_parser<'src, I>()
-> impl Parser<'src, I, (Vec<(Statement<'src>, SimpleSpan)>, SimpleSpan), MyExtra<'src>> + Clone
where
    I: ValueInput<'src, Token = Token<'src>, Span = SimpleSpan>,
{

ruabmbua avatar Apr 16 '25 06:04 ruabmbua

Looking at the other thread, it seems to me that splitting into two lifetimes for tokens and AST should not be necessary. Like I said above, I was able to abuse Stream to produce an Input that correctly captures the lifetime from the first parser's output. The 'src lifetime in my code lives longer than the parse function, and the parser objects themselves are essentially temporaries. I am not trying to return them, only data that borrows from the input again (from a lifetime point of view).

I will paste my code when I am back at my PC; I forgot to commit.

ruabmbua avatar Apr 16 '25 06:04 ruabmbua

Here is the signature:

pub fn block_parser<'src, I>() -> impl Parser<'src, I, (Vec<(Statement<'src>, SimpleSpan)>, SimpleSpan), MyExtra<'src>> + Clone where I: ValueInput<'src, Token = Token<'src>, Span = SimpleSpan>, {

I think you have the same problem. You're conflating two different 'src lifetimes:

  1. The 'outer' lifetime of the original source code (the lifetime of the str slice passed into parse)

  2. The inner local lifetime of the token slice, created on the second line of parse's body

You'll need to change block_parser to separate them out. To make it a bit clearer, you could give them distinct names:

pub fn block_parser<'src, 'tokens, I>()
-> impl Parser<'tokens, I, (Vec<(Statement<'src>, SimpleSpan)>, SimpleSpan), MyExtra<'tokens>> + Clone
where
    I: ValueInput<'tokens, Token = Token<'src>, Span = SimpleSpan>,
{

Now, the mechanical aspects of the parser are tied to 'tokens, but the input token and output statement types are tied to 'src.
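As a std-only illustration of that separation (not chumsky code: Token, lex, words and parse are hypothetical stand-ins), the borrow of the token slice and the data inside the tokens can carry different lifetimes, and only the latter appears in the output. This sketch says nothing about whether chumsky's own trait bounds accept the split signature.

```rust
// Hypothetical stand-in for a token that borrows from the source text.
struct Token<'src>(&'src str);

fn lex<'src>(input: &'src str) -> Vec<Token<'src>> {
    input.split_whitespace().map(Token).collect()
}

// `'tokens` is the borrow of the token slice; `'src` is the text the tokens
// point into. The output only mentions `'src`.
fn words<'tokens, 'src>(tokens: &'tokens [Token<'src>]) -> Vec<&'src str> {
    tokens.iter().map(|t| t.0).collect()
}

// The local `tokens` vector is dropped at the end of the function, but the
// returned words borrow from `input`, not from `tokens`.
fn parse<'src>(input: &'src str) -> Vec<&'src str> {
    let tokens = lex(input);
    words(&tokens)
}
```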

zesterer avatar Apr 16 '25 11:04 zesterer

Unfortunately this does not seem to work. I cannot even create a simple no-op parser (with todo!()) that accepts a signature like this.

Idk, maybe I misunderstand something, but to me it kinda seems like the lifetimes are tied together and it's not really possible to mix them.

I kinda can make a simple parser compile like this:

pub fn simple_test_parser<'src: 'tokens, 'tokens: 'src, I>()
-> impl Parser<'tokens, I, (Vec<(Statement<'src>, SimpleSpan)>, SimpleSpan), MyExtra<'tokens>> + Clone
where
    I: ValueInput<'tokens, Token = Token<'src>, Span = SimpleSpan>,
{
    any().map_with(|t, extra| (Vec::new(), extra.span()))
}

Note how I tie the 'src lifetime to 'tokens and the other way around. Seems kinda pointless.
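For what it's worth, a pair of bounds like 'a: 'b plus 'b: 'a makes the two lifetimes equal, which is why it reads as pointless. A small std-only sketch (swap_refs is a hypothetical name, unrelated to chumsky): swapping through &mut needs both references to have exactly the same type, and that holds here only because of the two bounds.

```rust
// `&mut T` is invariant in `T`, so `std::mem::swap` needs `&'a str` and
// `&'b str` to be the same type. The mutual outlives bounds force `'a == 'b`;
// removing either bound makes this function fail to compile.
fn swap_refs<'a: 'b, 'b: 'a>(x: &mut &'a str, y: &mut &'b str) {
    std::mem::swap(x, y);
}
```

So with both bounds in place, the two-name signature is effectively the original single-lifetime signature again.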

ruabmbua avatar Apr 16 '25 19:04 ruabmbua

Here is my example that works:

pub fn parse<'src>(input: &'src str) -> Option<Vec<(Statement<'src>, SimpleSpan)>> {
    let mut output = None;
    let (tokens, errs) = lexer().parse(input).into_output_errors();

    let parse_errs = if let Some(tokens) = tokens {
        let iter = tokens.into_iter();
        let stream = Stream::from_iter(iter);
        let mapped_input = stream.map((input.len()..input.len()).into(), |(t, s)| (t, s));

        let (block, errs) = block_parser().parse(mapped_input).into_output_errors();

        output = block;

        errs
    } else {
        Vec::new()
    };

    let all_errs = errs
        .into_iter()
        .map(|e| e.map_token(|c| c.to_string()))
        .chain(
            parse_errs
                .into_iter()
                .map(|e| e.map_token(|tok| tok.to_string())),
        );

    for err in all_errs {
        Report::build(ReportKind::Error, ("file", err.span().into_range()))
            .with_message(err.to_string())
            .with_label(
                Label::new(("file", err.span().into_range()))
                    .with_message(err.reason().to_string())
                    .with_color(Color::Red),
            )
            .with_labels(err.contexts().map(|(label, span)| {
                Label::new(("file", span.into_range()))
                    .with_message(format!("while parsing this {}", label))
                    .with_color(Color::Yellow)
            }))
            .finish()
            .eprint(("file", Source::from(input)))
            .unwrap();

        // Any error will currently delete the result
        output = None;
    }

    output.map(|(o, _)| o)
}

No changes were necessary to any of my fn() -> impl Parser functions.

The trick is that stream.map((input.len()..input.len()).into(), |(t, s)| (t, s)); creates a MappedInput that does not capture any reference to a local variable (the token vec). Instead it holds the tokens by value, because of the .into_iter() + Stream::from_iter().
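The same idea in a std-only sketch, with hypothetical stand-ins (Token, lex, words_owned, parse) rather than chumsky's Stream: when the consumer takes the tokens by value, there is no borrow of a local left to capture, and 'src is the only lifetime in play.

```rust
// Hypothetical stand-in for a token that borrows from the source text.
struct Token<'src>(&'src str);

fn lex<'src>(input: &'src str) -> Vec<Token<'src>> {
    input.split_whitespace().map(Token).collect()
}

// Takes ownership of the tokens through an iterator, so the only lifetime
// involved is `'src`, the lifetime of the original text.
fn words_owned<'src, I>(tokens: I) -> Vec<&'src str>
where
    I: Iterator<Item = Token<'src>>,
{
    tokens.map(|Token(word)| word).collect()
}

fn parse<'src>(input: &'src str) -> Vec<&'src str> {
    let tokens = lex(input);
    // `into_iter()` moves the vector; nothing borrows the local `tokens`.
    words_owned(tokens.into_iter())
}
```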

ruabmbua avatar Apr 16 '25 19:04 ruabmbua

Hmm, I'm quite confused as to what's going on here. Is there any chance that your parser is public and I'd be able to give compiling it a go myself?

zesterer avatar Apr 16 '25 19:04 zesterer

It's not released as open source, but I have no issue sharing it. I extracted it into a single file. It only needs ariadne and chumsky with the pratt feature: https://gist.github.com/ruabmbua/c6ce1cf524f31f42408123a9873f8938

ruabmbua avatar Apr 16 '25 19:04 ruabmbua

The parser is actually a port of one previously written with pest, but I like how chumsky works and wanted to port to it for the better error recovery I want to add in the future.

ruabmbua avatar Apr 16 '25 20:04 ruabmbua