
Updated the pull request from #469 so that it will cleanly apply against master.

Open chotchki opened this issue 4 years ago • 10 comments

Cleaned up the conflicts from #469 so that the parser can be merged cleanly.

chotchki avatar Oct 24 '21 16:10 chotchki

Can you explain more clearly why it's perfect for your use case (with an example and grammar)? For me this parser is the total opposite of the concept of a combinator. Instead of eating input and giving the rest to the next combinator, it eats the end and leaves the head to another combinator.

This leads to a very inefficient parser.

Plus, it's unfortunate that you removed the original author.

Stargateur avatar Oct 24 '21 21:10 Stargateur

@Stargateur The issue that caused me to switch to using this particular combinator is the definition of sql string literals in Postgres found here: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS (Section 4.1.2.1).

It requires a single ', any combination of unicode characters and a terminating single quote '. Multiple sql strings are merged into one IF and ONLY IF they are joined by whitespace with a newline.

Valid example (the literals are separated by a newline):
'foo'
'bar'

Invalid example (the literals are on the same line): 'foo' 'bar'

This combinator is fairly greedy and seemed the best way to look ahead and match as much of the whole string as possible. My current code is here (other attempts have been commented out in the link): https://github.com/chotchki/feophant/blob/main/src/engine/sql_parser/constants.rs#L29

Further work is coming to support the variety of embedded escape sequences.


I only recreated the pull request and code because I couldn't get git rebase to replay the original author's commits (despite trying several times this week). From my googling I can't directly edit the original pull request so this was my attempted work around. I tagged the original pull request to link back and give credit, if I've made a mistake with that, my apologies!

chotchki avatar Oct 25 '21 01:10 chotchki

It seems such a syntax is parsed very well with separated_list1, as in separated_list1(new_line, constant_sql_string)(s), with constant_sql_string defined as delimited(char('\''), accepted_char, char('\'')).
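The grammar described here (one or more quoted literals, joined only when the whitespace between them contains a newline) can be sketched with the standard library alone. This is a minimal illustration and not nom's `separated_list1`: the names `sql_literal`, `newline_separator`, and `merged_sql_string` are hypothetical, and escape sequences and doubled quotes are deliberately ignored.

```rust
/// Parses one `'...'` literal and returns (rest, contents).
/// Assumption: no escapes or doubled quotes inside the literal.
fn sql_literal(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('\'')?;
    let end = body.find('\'')?;
    Some((&body[end + 1..], &body[..end]))
}

/// Consumes leading whitespace, but only if it contains a newline.
fn newline_separator(input: &str) -> Option<&str> {
    let rest = input.trim_start();
    let skipped = &input[..input.len() - rest.len()];
    if skipped.contains('\n') {
        Some(rest)
    } else {
        None
    }
}

/// Parses one or more literals joined by newline-containing whitespace
/// and concatenates their contents.
fn merged_sql_string(input: &str) -> Option<(&str, String)> {
    let (mut rest, first) = sql_literal(input)?;
    let mut merged = String::from(first);
    loop {
        // Try "separator, then literal"; on failure stop without consuming.
        match newline_separator(rest).and_then(sql_literal) {
            Some((after, part)) => {
                merged.push_str(part);
                rest = after;
            }
            None => return Some((rest, merged)),
        }
    }
}
```

Each step consumes from the front of the input and hands the remainder to the next step, so nothing has to scan ahead for the end of the whole string.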

> I only recreated the pull request and code because I couldn't get git rebase to replay the original author's commits (despite trying several times this week). From my googling I can't directly edit the original pull request so this was my attempted work around. I tagged the original pull request to link back and give credit, if I've made a mistake with that, my apologies!

I didn't mean to blame you in any way. I also struggle with git; it's hard to use.

I think you can do something like this:

git checkout take_until_parser_matches
git branch -m take_until_parser_matches_tmp
git remote add tomalexander https://github.com/tomalexander/nom.git
git fetch tomalexander
git checkout --track tomalexander/take_until_parser_matches
git cherry-pick 706c89355ae56d57b8e181b1770d0dc2fab7fc70
# resolve conflict
git push --force

That's the best I can propose. But don't worry too much about it, it's just unfortunate.

Stargateur avatar Oct 25 '21 01:10 Stargateur

See https://github.com/Geal/nom/pull/1444, which allows an empty separator, if you want separated_list1(opt(new_line), constant_sql_string)(s).

Stargateur avatar Oct 25 '21 04:10 Stargateur

> Can you explain more clearly why it's perfect for your use case (with an example and grammar)? For me this parser is the total opposite of the concept of a combinator. Instead of eating input and giving the rest to the next combinator, it eats the end and leaves the head to another combinator.

I was parsing a language that relies on keywords, and allows multiple words as names in between, like <verb> <name> [modifier] e.g. <Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]. To figure out where 'name' ends, I look in this specific case for when a modifier (in that example for 30 minutes and with vigor) succeeds to parse. It's not efficient... the suggestion of starting the parse at the end sounds nice, but it would complicate the logic a bit (not all examples end with a list of modifiers like this). I feel this is a nice addition to nom because it is intuitive (for me anyway ;-)), albeit not the most efficient way to parse such grammars.

NickNick avatar Oct 26 '21 18:10 NickNick

> I was parsing a language that relies on keywords, and allows multiple words as names in between, like <verb> <name> [modifier] e.g. <Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]. To figure out where 'name' ends, I look in this specific case for when a modifier (in that example for 30 minutes and with vigor) succeeds to parse.

I don't get why you don't simply parse <verb>, then <name>, then the modifiers. Since all of them are correctly delimited, I don't see the problem here. I can produce an example if you wish, with the following requirement:

// `parse` and the `Command` struct are placeholders for the parser to be written.
let input = "<Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]";
let result = parse(input);
assert_eq!(result, Ok(Command {
  verb: "Rub",
  names: vec!["the lovely kitten's belly"],
  modifiers: vec!["for 30 minutes", "with vigor"],
}));
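A rough sketch of how this requirement could be met by consuming each delimited group in order, using only the standard library rather than nom. The names `Command`, `angle`, `angles`, and `parse` are hypothetical, and this version returns `Option` instead of `Result` to stay short.

```rust
#[derive(Debug, PartialEq)]
struct Command<'a> {
    verb: &'a str,
    names: Vec<&'a str>,
    modifiers: Vec<&'a str>,
}

/// Parses one `<...>` group, skipping leading whitespace.
fn angle(input: &str) -> Option<(&str, &str)> {
    let body = input.trim_start().strip_prefix('<')?;
    let end = body.find('>')?;
    Some((&body[end + 1..], &body[..end]))
}

/// Parses zero or more consecutive `<...>` groups.
fn angles(mut input: &str) -> (&str, Vec<&str>) {
    let mut items = Vec::new();
    while let Some((rest, item)) = angle(input) {
        items.push(item);
        input = rest;
    }
    (input, items)
}

/// Parses `<verb> <name>... [<modifier>...]`, front to back.
fn parse(input: &str) -> Option<Command<'_>> {
    let (rest, verb) = angle(input)?;
    let (rest, names) = angles(rest);
    if names.is_empty() {
        return None;
    }
    // The bracketed modifier list is optional.
    let modifiers = match rest.trim_start().strip_prefix('[') {
        Some(inner) => {
            let (after, mods) = angles(inner);
            // Require the closing bracket.
            after.trim_start().strip_prefix(']')?;
            mods
        }
        None => Vec::new(),
    };
    Some(Command { verb, names, modifiers })
}
```

Because every group carries its own delimiters, no step needs to look ahead for where the next group starts.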

So, as you say, it works for you, but I still don't see why "official" nom should have this. Tell me if I'm missing something; if needed, be more precise and I will try to help solve the problem.

In my opinion, as a user of nom, a new parser should solve a problem that can't be solved using other combinators (or introduce a shortcut, like many0 for example). Adding a parser that simply brute-forces every byte of the input doesn't fit nom's philosophy. Nom eats input byte by byte. I agree there are already a number of parsers like this in nom, but I don't think they are a good addition.

Stargateur avatar Oct 26 '21 19:10 Stargateur

The brackets were for clarity; the actual input is just Rub the lovely kitten's belly for 30 minutes with vigor. I am sure you can come up with a way to parse this specific example without this combinator, but I don't get why you think this is such an unusual combinator when there is take_until. This is just a fancier version of it for when the thing you want to take_until is not a fixed value. That is not so weird, right?
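To make the comparison concrete, here is a minimal std-only sketch of "take_until, but with a parser as the terminator". It is not the code from the pull request: `take_until_matches` and `modifier_start` are hypothetical names, and the modifier grammar (a leading " for " or " with ") is an assumption chosen to fit the example sentence.

```rust
/// Keywords that are assumed to start a modifier (hypothetical grammar).
const KEYWORDS: [&str; 2] = [" for ", " with "];

/// Succeeds if the input starts with a modifier keyword.
/// Returns (rest, keyword).
fn modifier_start(input: &str) -> Option<(&str, &str)> {
    for kw in KEYWORDS.iter() {
        if let Some(rest) = input.strip_prefix(*kw) {
            return Some((rest, *kw));
        }
    }
    None
}

/// Scans forward one character at a time and stops at the first position
/// where `end` succeeds. Returns (rest, taken); `end`'s output is dropped
/// and the terminator itself is not consumed.
fn take_until_matches<'a, O>(
    input: &'a str,
    end: impl Fn(&'a str) -> Option<(&'a str, O)>,
) -> Option<(&'a str, &'a str)> {
    for (idx, _) in input.char_indices() {
        if end(&input[idx..]).is_some() {
            return Some((&input[idx..], &input[..idx]));
        }
    }
    None
}
```

Note that `end` is re-run at every position until it succeeds, which is the cost the objections in this thread refer to.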

NickNick avatar Oct 27 '21 18:10 NickNick

@Stargateur I like the approach, working on it.

chotchki avatar Oct 31 '21 13:10 chotchki

@Stargateur This is pushing my understanding of nom, but I'm trying to fully understand the signature of fold_until0.

I have the following signature with the where clause:

pub fn fold_until0<F, G, H, A, Input, Output, Error>(
  f: F,
  g: G,
  init: H,
  acc: A,
) -> impl FnMut(Input) -> IResult<Input, (A, Output), Error>
where
  Input: InputTake + InputIter + InputLength + Clone,
  F: Parser<Input, Output, Error>,
  G: FnMut(A, Output) -> (A, Output),
  H: FnMut() -> (A, Output),
  Error: ParseError<Input>,
  • "f" is the child parser that will recognize the end
  • "g": I don't understand in the context of the other arguments
  • "init": Function for the initial state for the accumulator
  • "acc": Function for building the accumulator up. I don't know if this really should be "g"
  • I'm returning the remaining input, the accumulation, the content until "f" matched (dropping "f"'s output).

Can you please help me understand if BOTH "g" and "acc" make sense? I'm reading https://docs.rs/nom/7.0.0/nom/multi/fn.fold_many0.html and it seems to just use g to accumulate.

chotchki avatar Nov 03 '21 03:11 chotchki

pub fn fold_until0<P, Until, Init, Acc, Fold, Input, Output, UntilOutput, Error>(
    parser: P,
    until: Until,
    init: Init,
    fold: Fold,
) -> impl FnMut(Input) -> IResult<Input, (Acc, UntilOutput), Error>
where
    Input: InputTake + InputIter + InputLength + Clone,
    P: Parser<Input, Output, Error>,
    Until: Parser<Input, UntilOutput, Error>,
    Fold: FnMut(Acc, Output) -> Acc,
    Init: FnMut() -> Acc,
    Error: ParseError<Input>;
  • parser is used to parse what comes before the end; it could be anychar for quick and dirty parsing
  • until is the parser used to stop; its output is returned by fold_until0
  • init is the init function used to create the user's accumulator
  • fold is the user function that takes the accumulator and returns it

The body of fold_until0 is very similar to https://github.com/Geal/nom/pull/1341
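As a rough illustration of the parameter descriptions above, the loop could look something like the following std-only sketch. It assumes `until` is tried first at each step and that failure of `parser` fails the whole thing; parsers are modelled as plain functions returning `Option<(rest, output)>` rather than nom's `Parser` trait, and `any_char` and `quote` are hypothetical helpers.

```rust
/// Folds the outputs of `parser` into an accumulator until `until` matches.
/// Returns (rest, (accumulator, until_output)).
fn fold_until0<'a, O, U, Acc>(
    mut input: &'a str,
    parser: impl Fn(&'a str) -> Option<(&'a str, O)>,
    until: impl Fn(&'a str) -> Option<(&'a str, U)>,
    init: impl Fn() -> Acc,
    fold: impl Fn(Acc, O) -> Acc,
) -> Option<(&'a str, (Acc, U))> {
    let mut acc = init();
    loop {
        // Stop as soon as `until` matches; its output is returned.
        if let Some((rest, end)) = until(input) {
            return Some((rest, (acc, end)));
        }
        // Otherwise consume one item with `parser` and fold it in.
        let (rest, item) = parser(input)?;
        // Guard against parsers that succeed without consuming input.
        if rest.len() == input.len() {
            return None;
        }
        acc = fold(acc, item);
        input = rest;
    }
}

/// Hypothetical item parser: any single character.
fn any_char(input: &str) -> Option<(&str, char)> {
    let c = input.chars().next()?;
    Some((&input[c.len_utf8()..], c))
}

/// Hypothetical terminator: a single quote.
fn quote(input: &str) -> Option<(&str, char)> {
    input.strip_prefix('\'').map(|rest| (rest, '\''))
}
```

With `any_char` as `parser` and `quote` as `until`, this collects everything up to the closing quote of a string literal and returns the quote as the terminator's output.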

Stargateur avatar Nov 03 '21 08:11 Stargateur