Native/abstracted sub-parsers

Open ckiee opened this issue 2 years ago • 6 comments

I'm in a similar situation to #199:

many1(block_expr_node).parse("*hi*")

fn block_expr_node<Input>() -> FnOpaque<Input, BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    opaque!(no_partial(
        choice!(bold(), char()).message("while parsing block_expr_node")
    ))
}


fn char<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    satisfy(|c: char| !c.is_control())
        .map(|c| BlockExprNode::Char(c))
        .message("while parsing char")
}

fn bold<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    (token('*'), many1(block_expr_node()), token('*'))
        .map(|(_, v, _)| BlockExprNode::Bold(v))
        .message("while parsing bold")
}

I think this does not work because once it starts to parse in bold, the many1(block_expr_node()) picks char and the input is consumed until EOF:

Error: Parse error at line: 1, column: 4
Unexpected end of input
Expected `*`
while parsing bold
while parsing char
while parsing block_expr_node
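One detail worth checking here: `'*'` is not a control character, so the `char` rule as written also accepts the closing delimiter. Below is a minimal std-only sketch (hand-rolled functions, not combine) of the same grammar with one assumed change: the plain-char rule rejects `*`. Unlike combine, which commits once a branch has consumed input, this toy version backtracks freely, so it only illustrates the grammar and not combine's error behaviour.

```rust
// Reduced stand-in for the issue's BlockExprNode (only two variants).
#[derive(Debug, PartialEq)]
enum BlockExprNode {
    Char(char),
    Bold(Vec<BlockExprNode>),
}

/// One node: try bold first, then fall back to a plain char.
fn block_expr_node(input: &str) -> Option<(BlockExprNode, &str)> {
    bold(input).or_else(|| plain_char(input))
}

/// Any non-control char EXCEPT `*` (the exclusion is the assumed change).
fn plain_char(input: &str) -> Option<(BlockExprNode, &str)> {
    let c = input.chars().next()?;
    if c.is_control() || c == '*' {
        return None;
    }
    Some((BlockExprNode::Char(c), &input[c.len_utf8()..]))
}

/// One or more nodes, stopping at the first position where no node parses.
fn many1_nodes(mut input: &str) -> Option<(Vec<BlockExprNode>, &str)> {
    let mut nodes = Vec::new();
    while let Some((node, rest)) = block_expr_node(input) {
        nodes.push(node);
        input = rest;
    }
    if nodes.is_empty() {
        None
    } else {
        Some((nodes, input))
    }
}

/// `*` nodes... `*`
fn bold(input: &str) -> Option<(BlockExprNode, &str)> {
    let rest = input.strip_prefix('*')?;
    let (nodes, rest) = many1_nodes(rest)?;
    let rest = rest.strip_prefix('*')?;
    Some((BlockExprNode::Bold(nodes), rest))
}
```

With this rule `bold("*hi*")` yields `Bold([Char('h'), Char('i')])` with nothing left over. The trade-off is that a lone `*` is no longer accepted as plain text.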

Replacing the bold implementation with:

fn bold<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    (
        token('*'),
        take_until::<String, _, _>(token('*')).map(|s| {
            // HACK ouch ouch ouch
            many1(block_expr_node())
                .easy_parse(position::Stream::new(&s[..]))
                // this is the `expect` on Result
                .expect("In bold subparser")
                .0
        }),
        token('*'),
    )
        .map(|(_, v, _)| BlockExprNode::Bold(v))
        .message("while parsing bold")
}

...parses correctly but is obviously messy, and handling errors correctly as in https://github.com/Marwes/combine/issues/199#issuecomment-426447198 only adds more boilerplate. Do you think it would be possible to add an abstraction on top of flat_map so this could be written like:

fn bold<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    (
        token('*'),
        take_until::<String, _, _>(token('*')).and_reparse_with(many1(block_expr_node())),
        token('*'),
    )
        .map(|(_, v, _)| BlockExprNode::Bold(v))
        .message("while parsing bold")
}

It'd still create the sub-parser in a flat_map but would hide the scary types from me :P
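To make the idea concrete, here is a rough std-only sketch of what such a combinator could do, written with plain functions over `&str` rather than combine's `Parser` trait. Everything here is hypothetical: `PResult`, `take_until_star` and `chars1` are toy stand-ins, and `and_reparse_with` is only the name proposed above, not an existing combine API. The point is that the sub-parser's error is propagated with `?` instead of `expect`.

```rust
/// Toy parser result: parsed value plus unconsumed rest, or an error message.
type PResult<'a, T> = Result<(T, &'a str), String>;

/// Toy stand-in for `take_until(token('*'))`: everything before the next `*`.
fn take_until_star<'a>(input: &'a str) -> PResult<'a, &'a str> {
    match input.find('*') {
        Some(i) => Ok((&input[..i], &input[i..])),
        None => Err("expected `*`".to_string()),
    }
}

/// Toy inner parser: one or more non-control, non-`*` chars.
fn chars1<'a>(input: &'a str) -> PResult<'a, Vec<char>> {
    let end = input
        .find(|c: char| c.is_control() || c == '*')
        .unwrap_or(input.len());
    if end == 0 {
        return Err("expected at least one char".to_string());
    }
    Ok((input[..end].chars().collect(), &input[end..]))
}

/// Run `outer` to capture a slice, then run `inner` on that slice only.
/// Errors from either parser are returned to the caller, never panicked on.
fn and_reparse_with<'a, T>(
    outer: impl Fn(&'a str) -> PResult<'a, &'a str>,
    inner: impl Fn(&'a str) -> PResult<'a, T>,
) -> impl Fn(&'a str) -> PResult<'a, T> {
    move |input: &'a str| {
        let (captured, rest) = outer(input)?;
        let (value, leftover) = inner(captured)?;
        if !leftover.is_empty() {
            return Err(format!("sub-parser left {:?} unparsed", leftover));
        }
        Ok((value, rest))
    }
}
```

For example, `and_reparse_with(take_until_star, chars1)` applied to `"hi*"` returns the two chars and leaves `"*"` for the closing `token('*')`, while applying it to `"*"` returns an error because the captured slice is empty.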

ckiee avatar Feb 21 '22 18:02 ckiee

Could you post what kind of input you are trying to parse and what the output is expected to look like? I am not sure I understand it entirely; as best I can tell, the syntax seems rather ambiguous.

**hi** // Could be parsed with the 1st and 4th `*` and the 2nd and 3rd as the start/end of a block or it could be parsed with the 1st and 2nd and the 3rd and 4th in the same block

Since combine is LL, it consumes eagerly, so the first case (1+4 and 2+3) is the one that is parsed.

Marwes avatar Feb 22 '22 09:02 Marwes

Overall it's a mix of Org-mode and Markdown so 1+4 and 2+3 seem right, although they don't really have any special meaning and will just get optimized out later:

*hi*                         ; BlockExprNode::Bold(BlockExprNode::Text("hi"))
*/italics & bold/*           ; BlockExprNode::Bold(BlockExprNode::Italics(BlockExprNode::Text("italics & bold")))
**this is kinda useless**    ; BlockExprNode::Bold(BlockExprNode::Bold(BlockExprNode::Text("this is kinda useless")))
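As an illustration of the "optimized out later" step, here is a small std-only sketch that collapses directly nested duplicates such as `Bold(Bold(x))` into `Bold(x)`. The `Node` enum is a reduced, hypothetical stand-in for `BlockExprNode` with single boxed children, and collapsing is only one possible reading of what the later pass would do.

```rust
// Hypothetical, reduced stand-in for BlockExprNode.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Bold(Box<Node>),
    Italics(Box<Node>),
}

/// Collapse directly nested duplicates: Bold(Bold(x)) -> Bold(x),
/// Italics(Italics(x)) -> Italics(x). Mixed nesting is left alone.
fn simplify(node: Node) -> Node {
    match node {
        Node::Bold(inner) => match simplify(*inner) {
            Node::Bold(x) => Node::Bold(x),
            other => Node::Bold(Box::new(other)),
        },
        Node::Italics(inner) => match simplify(*inner) {
            Node::Italics(x) => Node::Italics(x),
            other => Node::Italics(Box::new(other)),
        },
        text => text,
    }
}
```

Under this reading the third example would end up as a single `Bold` around the text, while the `Bold(Italics(..))` example stays as it is.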

ckiee avatar Feb 22 '22 09:02 ckiee

If that is the case, why is bold calling itself recursively? Why not just (token('*'), many1(char()), token('*')) ?

Marwes avatar Feb 22 '22 10:02 Marwes

> why is bold calling itself recursively?

Because eventually block_expr_node's choice! will have more options (like italics, etc..) so I'm just preparing it for that.

Ideally I could make block_expr_node's choice! skip bold in the second, nested call from bold, but that isn't the reason I opened the issue. What do you think about the flat_map abstraction?

ckiee avatar Feb 22 '22 11:02 ckiee

It wouldn't be an unreasonable addition. However, **this is kinda useless** would not be parsed as you expect with the and_reparse_with parser you showed: it would be parsed as Bold(""), Text("this is kinda useless"), Bold(""), so I am not sure it is the solution you are looking for.
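A small std-only sketch of that splitting behaviour, assuming the closing delimiter is always taken to be the very next `*` after an opening one (which is what `take_until(token('*'))` amounts to). `Segment` and `split_at_next_star` are hypothetical names used only for this illustration.

```rust
#[derive(Debug, PartialEq)]
enum Segment {
    Text(String),
    Bold(String),
}

/// Split input into segments, closing each bold block at the very next `*`.
fn split_at_next_star(mut input: &str) -> Vec<Segment> {
    let mut out = Vec::new();
    while !input.is_empty() {
        if let Some(rest) = input.strip_prefix('*') {
            match rest.find('*') {
                // Everything up to the next `*` becomes the bold content.
                Some(end) => {
                    out.push(Segment::Bold(rest[..end].to_string()));
                    input = &rest[end + 1..];
                }
                // No closing `*`: keep the remainder as plain text.
                None => {
                    out.push(Segment::Text(input.to_string()));
                    input = "";
                }
            }
        } else {
            // Plain text runs until the next `*` or the end of input.
            let end = input.find('*').unwrap_or(input.len());
            out.push(Segment::Text(input[..end].to_string()));
            input = &input[end..];
        }
    }
    out
}
```

On `**this is kinda useless**` the first `*` is closed immediately by the second one, which gives an empty bold block, then the text, then another empty bold block.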

To parse it as BlockExprNode::Bold(BlockExprNode::Bold(BlockExprNode::Text("this is kinda useless"))) you would effectively need infinite lookahead to figure out which * are opening and which are closing a block, which doesn't seem right.

Another solution may be to do something like many1(choice(bold_char(), italics_char(), etc)).then(|prefix| text().skip(string(prefix.reverse()))), which would consume a "block prefix", then parse the text, and finally check that the prefix appears in reverse order afterwards.
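A rough std-only sketch of that prefix/reversed-suffix idea, using plain string functions instead of combine parsers. The `Node` enum, `is_marker` and `parse_wrapped` are hypothetical names, and it assumes `*` means bold, `/` means italics, and the text itself contains no marker characters.

```rust
// Hypothetical, reduced stand-in for BlockExprNode.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Bold(Box<Node>),
    Italics(Box<Node>),
}

fn is_marker(c: char) -> bool {
    c == '*' || c == '/'
}

/// Parse `<prefix><text><reversed prefix>` and return the node plus the rest.
fn parse_wrapped(input: &str) -> Option<(Node, &str)> {
    // 1. Consume the block prefix: a (possibly empty) run of marker chars.
    let prefix_len = input.find(|c: char| !is_marker(c)).unwrap_or(input.len());
    let (prefix, rest) = input.split_at(prefix_len);

    // 2. Parse the text: everything up to the next marker char.
    let text_len = rest.find(is_marker).unwrap_or(rest.len());
    let (text, rest) = rest.split_at(text_len);
    if text.is_empty() {
        return None;
    }

    // 3. The prefix must now appear again in reverse order.
    let suffix: String = prefix.chars().rev().collect();
    let rest = rest.strip_prefix(suffix.as_str())?;

    // 4. Wrap the text innermost-first, so the first marker ends up outermost.
    let mut node = Node::Text(text.to_string());
    for marker in prefix.chars().rev() {
        node = match marker {
            '*' => Node::Bold(Box::new(node)),
            _ => Node::Italics(Box::new(node)),
        };
    }
    Some((node, rest))
}
```

With these assumptions `*/italics & bold/*` becomes `Bold(Italics(Text(..)))` and `**this is kinda useless**` becomes `Bold(Bold(Text(..)))`, with no lookahead beyond the prefix itself.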

Marwes avatar Feb 22 '22 14:02 Marwes

> Another solution may be to do something like [..]

That would probably work, but you still need to treat hello* world as normal text (no matching bold_char), so it might be a bit more tricky. For now I think I will leave this edge case alone since I want to get the whole pipeline kinda-working instead of parsing perfectly right away :P

> It wouldn't be an unreasonable addition

Should I have a go at making a PR then? It seems tricky and I am still scared of all the types, so I would probably need some mentoring.

ckiee avatar Feb 22 '22 14:02 ckiee