Native/abstracted sub-parsers
I'm in a similar situation to #199:

```rust
many1(block_expr_node()).parse("*hi*")
```
```rust
fn block_expr_node<Input>() -> FnOpaque<Input, BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    opaque!(no_partial(
        choice!(bold(), char()).message("while parsing block_expr_node")
    ))
}

fn char<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    satisfy(|c: char| !c.is_control())
        .map(|c| BlockExprNode::Char(c))
        .message("while parsing char")
}

fn bold<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    (token('*'), many1(block_expr_node()), token('*'))
        .map(|(_, v, _)| BlockExprNode::Bold(v))
        .message("while parsing bold")
}
```
I think this does not work because once it starts to parse in `bold`, the inner `many1(block_expr_node())` picks `char` for `h` and `i`, then treats the closing `*` as the start of a nested `bold`, so the input is consumed until EOF:

```text
Error: Parse error at line: 1, column: 4
Unexpected end of input
Expected `*`
while parsing bold
while parsing char
while parsing block_expr_node
```
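To check that explanation without pulling in combine, here is a minimal std-only model of the committed-choice rule that `choice!` follows: the next alternative is only tried if the previous one failed without consuming input. Everything here (`char_node`, the `(committed, message)` error tuple, the hand-written `many1`) is a hypothetical stand-in, not combine's API; `char_node` is renamed only to avoid confusion with the `char` type.

```rust
// Std-only model of combine-style committed choice; NOT combine's API.
// Each parser returns Ok((value, rest)) or Err((committed, message)), where
// `committed` is true once the failing parser had already consumed input.

#[derive(Debug, PartialEq)]
pub enum BlockExprNode {
    Char(char),
    Bold(Vec<BlockExprNode>),
}

type PResult<'a, T> = Result<(T, &'a str), (bool, String)>;

fn char_node(input: &str) -> PResult<'_, BlockExprNode> {
    match input.chars().next() {
        // Same predicate as the issue's `char()`: note that `*` is accepted too.
        Some(c) if !c.is_control() => Ok((BlockExprNode::Char(c), &input[c.len_utf8()..])),
        Some(c) => Err((false, format!("unexpected {:?}", c))),
        None => Err((false, "unexpected end of input".to_string())),
    }
}

fn bold(input: &str) -> PResult<'_, BlockExprNode> {
    let rest = match input.strip_prefix('*') {
        Some(rest) => rest,
        None => return Err((false, "expected `*`".to_string())),
    };
    // The opening `*` has been consumed, so every later error is committed.
    let (children, rest) = many1(rest).map_err(|(_, msg)| (true, msg))?;
    match rest.strip_prefix('*') {
        Some(rest) => Ok((BlockExprNode::Bold(children), rest)),
        None => Err((true, "expected `*`".to_string())),
    }
}

fn block_expr_node(input: &str) -> PResult<'_, BlockExprNode> {
    match bold(input) {
        // Uncommitted error: fall through to the next alternative,
        // like `choice!(bold(), char())`.
        Err((false, _)) => char_node(input),
        other => other,
    }
}

fn many1(input: &str) -> PResult<'_, Vec<BlockExprNode>> {
    // The first item is required; its error is passed through unchanged.
    let (first, mut rest) = block_expr_node(input)?;
    let mut items = vec![first];
    loop {
        match block_expr_node(rest) {
            Ok((item, next)) => {
                items.push(item);
                rest = next;
            }
            // A committed error aborts the whole repetition...
            Err((true, msg)) => return Err((true, msg)),
            // ...while an uncommitted error just ends it.
            Err((false, _)) => return Ok((items, rest)),
        }
    }
}
```

In this model `many1("hi")` succeeds, but `many1("*hi*")` fails with a committed "unexpected end of input" error: the closing `*` opens a nested `bold` whose own `many1` then hits EOF, and because that error is committed nothing backtracks.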
Replacing the `bold` implementation with:
```rust
fn bold<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    (
        token('*'),
        take_until::<String, _, _>(token('*')).map(|s| {
            // HACK ouch ouch ouch
            many1(block_expr_node())
                .easy_parse(position::Stream::new(&s[..]))
                // this is the `expect` on the Result
                .expect("In bold subparser")
                .0
        }),
        token('*'),
    )
        .map(|(_, v, _)| BlockExprNode::Bold(v))
        .message("while parsing bold")
}
```
...parses correctly, but it is obviously messy, and handling errors properly as in https://github.com/Marwes/combine/issues/199#issuecomment-426447198 only adds more boilerplate. Do you think it would be possible to add an abstraction on top of `flat_map` so this could be written like:
```rust
fn bold<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    (
        token('*'),
        // proposed combinator, not (yet) part of combine
        take_until::<String, _, _>(token('*')).and_reparse_with(many1(block_expr_node())),
        token('*'),
    )
        .map(|(_, v, _)| BlockExprNode::Bold(v))
        .message("while parsing bold")
}
```
It'd still create the sub-parser in a `flat_map`, but would hide the scary types from me :P
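For comparison, the shape of the proposed `and_reparse_with` can be sketched with plain functions over `&str`. This is a hypothetical illustration, not combine's `Parser` trait or `flat_map`: capture the text up to the delimiter, run the sub-parser on that slice, and turn its failure or any leftover input into an error instead of calling `expect`. `letters` is a hypothetical stand-in for `many1(block_expr_node())`.

```rust
// Hypothetical sketch of a "take until, then re-parse" helper, written with
// plain functions instead of combine's Parser trait.

type PResult<'a, T> = Result<(T, &'a str), String>;

/// Returns everything before the first `delim` without consuming the
/// delimiter itself (roughly what `take_until(token(delim))` does).
fn take_until(input: &str, delim: char) -> PResult<'_, &str> {
    match input.find(delim) {
        Some(i) => Ok((&input[..i], &input[i..])),
        None => Err(format!("expected `{}` before end of input", delim)),
    }
}

/// Runs `sub` on the captured slice and requires it to consume all of it.
/// Sub-parser errors are propagated instead of `expect`-ed.
fn and_reparse_with<'a, T, F>(input: &'a str, delim: char, sub: F) -> PResult<'a, T>
where
    F: Fn(&'a str) -> PResult<'a, T>,
{
    let (inner, rest) = take_until(input, delim)?;
    let (value, leftover) = sub(inner).map_err(|e| format!("in sub-parser: {}", e))?;
    if !leftover.is_empty() {
        return Err(format!("sub-parser left {:?} unparsed", leftover));
    }
    Ok((value, rest))
}

/// Hypothetical sub-parser: one or more alphabetic characters.
fn letters(input: &str) -> PResult<'_, String> {
    let end = input.find(|c: char| !c.is_alphabetic()).unwrap_or(input.len());
    if end == 0 {
        return Err("expected a letter".to_string());
    }
    Ok((input[..end].to_string(), &input[end..]))
}

/// `*<letters>*`, with the inner part re-parsed from the captured slice.
fn bold(input: &str) -> PResult<'_, String> {
    let rest = input.strip_prefix('*').ok_or_else(|| "expected `*`".to_string())?;
    let (text, rest) = and_reparse_with(rest, '*', letters)?;
    let rest = rest.strip_prefix('*').ok_or_else(|| "expected closing `*`".to_string())?;
    Ok((text, rest))
}
```

Because the captured slice always ends at the first `*`, an input such as `**x**` hands the sub-parser an empty slice, so this helper alone does not settle how nested markers should be matched.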
Could you post what kind of input you are trying to parse and what the output is expected to look like? I am not sure I understand it entirely; as best I can tell, the syntax seems rather ambiguous.

```text
**hi** // Could be parsed with the 1st and 4th `*` and the 2nd and 3rd as the start/end of a block, or it could be parsed with the 1st and 2nd and the 3rd and 4th in the same block
```

Since combine is LL, it consumes eagerly, so the first case (1+4 and 2+3) is the one that gets parsed.
Overall it's a mix of Org-mode and Markdown, so 1+4 and 2+3 seem right, although they don't really have any special meaning and will just get optimized out later:

```text
*hi* ; BlockExprNode::Bold(BlockExprNode::Text("hi"))
*/italics & bold/* ; BlockExprNode::Bold(BlockExprNode::Italics(BlockExprNode::Text("italics & bold")))
**this is kinda useless** ; BlockExprNode::Bold(BlockExprNode::Bold(BlockExprNode::Text("this is kinda useless")))
```
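If directly nested blocks of the same kind really carry no extra meaning, the later "optimized out" step could be a small tree rewrite. A minimal sketch, assuming a hypothetical `Node` type shaped like the examples above (one child per wrapper, a `Text` leaf); the issue's actual `BlockExprNode::Bold` holds a `Vec` of children instead.

```rust
// Hypothetical AST shaped like the examples above; not the issue's real type.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Bold(Box<Node>),
    Italics(Box<Node>),
}

/// Collapses directly nested wrappers of the same kind, bottom-up,
/// e.g. Bold(Bold(x)) -> Bold(x). Mixed nesting such as Bold(Italics(x))
/// is kept as-is.
fn collapse(node: Node) -> Node {
    match node {
        Node::Text(t) => Node::Text(t),
        Node::Bold(inner) => match collapse(*inner) {
            // The child is already bold: the outer wrapper adds nothing.
            already @ Node::Bold(_) => already,
            other => Node::Bold(Box::new(other)),
        },
        Node::Italics(inner) => match collapse(*inner) {
            already @ Node::Italics(_) => already,
            other => Node::Italics(Box::new(other)),
        },
    }
}
```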
If that is the case, why is `bold` calling itself recursively? Why not just `(token('*'), many1(char()), token('*'))`?
> why is `bold` calling itself recursively?

Because eventually `block_expr_node`'s `choice!` will have more options (like `italics`, etc.), so I'm just preparing it for that.
Ideally I could make `block_expr_node`'s `choice!` skip `bold` in the second, nested call from `bold`, but this isn't the reason I opened the issue. What do you think about the `flat_map` abstraction?
It wouldn't be an unreasonable addition; however, `**this is kinda useless**` would not be parsed as you expect with the `and_reparse_with` parser you showed. It would be parsed as `Bold("")`, `Text("this is kinda useless")`, `Bold("")`, so I am not sure it is the solution you are looking for.
To parse it as `BlockExprNode::Bold(BlockExprNode::Bold(BlockExprNode::Text("this is kinda useless")))`, you would effectively need infinite lookahead to figure out which `*` are opening and which are closing a block, which doesn't seem right.
Another solution may be to do something like `many1(choice(bold_char(), italics_char(), etc)).then(|prefix| text().skip(string(prefix.reverse())))`, which would consume a "block prefix", then parse the text, and finally check that the prefix appears in reverse order after it.
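A std-only sketch of that prefix/mirror idea, assuming `*` marks bold and `/` marks italics as in the examples above. `block`, `is_marker` and `Node` are hypothetical names (this is not combine code), and "text" here simply means everything up to the next marker.

```rust
// Std-only sketch of: many1(marker).then(|prefix| text().skip(<prefix reversed>)).
// Assumes `*` = bold and `/` = italics; all names are hypothetical.

#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Bold(Box<Node>),
    Italics(Box<Node>),
}

fn is_marker(c: char) -> bool {
    c == '*' || c == '/'
}

/// Parses `<prefix><text><reversed prefix>` and returns the node plus the rest.
fn block(input: &str) -> Result<(Node, &str), String> {
    // 1. Consume the "block prefix": a non-empty run of marker characters.
    let prefix_len = input.find(|c: char| !is_marker(c)).unwrap_or(input.len());
    if prefix_len == 0 {
        return Err("expected a block marker".to_string());
    }
    let (prefix, rest) = input.split_at(prefix_len);
    // 2. Parse the text: everything up to the next marker.
    let text_len = rest.find(is_marker).unwrap_or(rest.len());
    let (text, rest) = rest.split_at(text_len);
    // 3. The closing run must be the prefix in reverse order.
    let closing: String = prefix.chars().rev().collect();
    let rest = rest
        .strip_prefix(closing.as_str())
        .ok_or_else(|| format!("expected closing `{}`", closing))?;
    // Wrap from the innermost marker (last in the prefix) outwards.
    let mut node = Node::Text(text.to_string());
    for m in prefix.chars().rev() {
        node = if m == '*' {
            Node::Bold(Box::new(node))
        } else {
            Node::Italics(Box::new(node))
        };
    }
    Ok((node, rest))
}
```

With this rule `**this is kinda useless**` becomes `Bold(Bold(Text(..)))` without looking further ahead than the closing run, and a mismatched closing run such as `*/oops*/` is rejected.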
> Another solution may be to do something like [..]

That would probably work, but you still need to treat `hello* world` as normal text (no matching `bold_char`), so it might be a bit trickier. For now I think I will leave this edge case alone, since I want to get the whole pipeline kinda working instead of parsing perfectly right away :P
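One way to handle that edge case, sketched std-only with hypothetical names and assuming `*` = bold and `/` = italics as in the examples above: try the prefix/mirror rule at each marker and, when no matching closing run follows, keep the marker as ordinary text instead of failing.

```rust
// Std-only sketch; not combine code. Unmatched markers fall back to text.

#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Bold(Box<Node>),
    Italics(Box<Node>),
}

fn is_marker(c: char) -> bool {
    c == '*' || c == '/'
}

/// Tries `<prefix><text><reversed prefix>` at the start of `input`;
/// returns None (consuming nothing) if it does not match.
fn try_block(input: &str) -> Option<(Node, &str)> {
    let prefix_len = input.find(|c: char| !is_marker(c)).unwrap_or(input.len());
    if prefix_len == 0 {
        return None;
    }
    let (prefix, rest) = input.split_at(prefix_len);
    let text_len = rest.find(is_marker).unwrap_or(rest.len());
    if text_len == 0 {
        return None;
    }
    let (text, rest) = rest.split_at(text_len);
    let closing: String = prefix.chars().rev().collect();
    let rest = rest.strip_prefix(closing.as_str())?;
    let mut node = Node::Text(text.to_string());
    for m in prefix.chars().rev() {
        node = if m == '*' {
            Node::Bold(Box::new(node))
        } else {
            Node::Italics(Box::new(node))
        };
    }
    Some((node, rest))
}

/// Parses a whole line, merging unmatched markers into the surrounding text.
fn parse_line(mut input: &str) -> Vec<Node> {
    let mut out = Vec::new();
    let mut text = String::new();
    while let Some(c) = input.chars().next() {
        if is_marker(c) {
            if let Some((node, rest)) = try_block(input) {
                // Flush pending plain text before the block.
                if !text.is_empty() {
                    out.push(Node::Text(std::mem::take(&mut text)));
                }
                out.push(node);
                input = rest;
                continue;
            }
        }
        // Not a block start (or no matching close): keep the char as text.
        text.push(c);
        input = &input[c.len_utf8()..];
    }
    if !text.is_empty() {
        out.push(Node::Text(text));
    }
    out
}
```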
> It wouldn't be an unreasonable addition

Should I have a go at making a PR then? It seems tricky and I am still scared of all the types, so I would probably need some mentoring.