Updated the pull request from #469 so that it will cleanly apply against master.
Cleaned up the conflicts from #469 so that the parser can be merged cleanly.
Can you explain more clearly why it's perfect for your use case (with an example and grammar)? For me this parser is the total opposite of the concept of a combinator. Instead of eating input and giving the rest to another combinator, it eats the end and leaves the head to another combinator.
This leads to a very inefficient parser.
Plus, it's unfortunate that you removed the original author.
@Stargateur The issue that caused me to switch to using this particular combinator is the definition of sql string literals in Postgres found here: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS (Section 4.1.2.1).
It requires a single quote ', any combination of unicode characters and a terminating single quote '. Multiple sql strings are merged into one IF and ONLY IF they are separated by whitespace that includes at least one newline.
Valid example (the two literals are on separate lines):
'foo'
'bar'
Invalid example (both literals on the same line): 'foo' 'bar'
This combinator is fairly greedy and seemed the best way to look ahead and match as much of the whole string as possible. My current code is here (other attempts have been commented out in the link): https://github.com/chotchki/feophant/blob/main/src/engine/sql_parser/constants.rs#L29
Further work is coming to support the variety of embedded escape sequences.
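The concatenation rule described above can be sketched without nom, using only the standard library. This is a minimal illustration, not the code from the linked repository: the function names `sql_literal`, `newline_separator` and `sql_string` are hypothetical, escape sequences and doubled quotes are ignored, and the same-line case simply stops after the first literal instead of reporting a syntax error.

```rust
/// Parses one quoted literal at the start of `input`.
/// Returns (remaining input, literal contents). No escape handling (assumption).
fn sql_literal(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('\'')?;
    let end = body.find('\'')?;
    Some((&body[end + 1..], &body[..end]))
}

/// Skips leading whitespace, but only succeeds if that whitespace
/// contains at least one newline.
fn newline_separator(input: &str) -> Option<&str> {
    let rest = input.trim_start();
    let skipped = &input[..input.len() - rest.len()];
    if skipped.contains('\n') {
        Some(rest)
    } else {
        None
    }
}

/// Parses a literal and merges any following literals that are
/// separated from it by whitespace containing a newline.
fn sql_string(input: &str) -> Option<(&str, String)> {
    let (mut rest, first) = sql_literal(input)?;
    let mut value = String::from(first);
    while let Some(after_sep) = newline_separator(rest) {
        match sql_literal(after_sep) {
            Some((next_rest, part)) => {
                value.push_str(part);
                rest = next_rest;
            }
            // Whitespace was not followed by another literal: leave it unconsumed.
            None => break,
        }
    }
    Some((rest, value))
}
```

With this sketch, `sql_string("'foo'\n'bar'")` merges both parts into one value, while `sql_string("'foo' 'bar'")` returns only the first literal and leaves ` 'bar'` in the remaining input.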
I only recreated the pull request and code because I couldn't get git rebase to replay the original author's commits (despite trying several times this week). From my googling I can't directly edit the original pull request so this was my attempted work around. I tagged the original pull request to link back and give credit, if I've made a mistake with that, my apologies!
It seems such syntax can be parsed very well with separated_list1, as separated_list1(new_line, constant_sql_string)(s), with constant_sql_string defined as delimited(tag('\''), accepted_char, tag('\''))
> I only recreated the pull request and code because I couldn't get git rebase to replay the original author's commits (despite trying several times this week). From my googling I can't directly edit the original pull request so this was my attempted work around. I tagged the original pull request to link back and give credit, if I've made a mistake with that, my apologies!
I didn't mean to blame you in any way; I also struggle with git, it's hard to use.
I think you can do something like this:
git checkout take_until_parser_matches
git branch -m take_until_parser_matches_tmp
git remote add tomalexander https://github.com/tomalexander/nom.git
git fetch tomalexander
git checkout --track tomalexander/take_until_parser_matches
git cherry-pick 706c89355ae56d57b8e181b1770d0dc2fab7fc70
# resolve conflict
git push --force
That's the best I can propose. But don't worry too much about it, it's just unfortunate.
See https://github.com/Geal/nom/pull/1444, which allows an empty separator, if you want separated_list1(opt(new_line), constant_sql_string)(s)
> Can you explain more clearly why it's perfect for your use case (with an example and grammar)? For me this parser is the total opposite of the concept of a combinator. Instead of eating input and giving the rest to another combinator, it eats the end and leaves the head to another combinator.
I was parsing a language that relies on keywords and allows multiple words as names in between, like <verb> <name> [modifier], e.g. <Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]. To figure out where 'name' ends, I look in this specific case for the point where a modifier (in that example, for 30 minutes and with vigor) parses successfully. It's not efficient... the suggestion of starting the parse at the end sounds nice, but it would complicate the logic a bit (not all examples end with a list of modifiers like this). I feel this is a nice addition to nom because it is intuitive (for me anyway ;-)), albeit not the most efficient way to parse such grammars.
> I was parsing a language that relies on keywords and allows multiple words as names in between, like <verb> <name> [modifier], e.g. <Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]. To figure out where 'name' ends, I look in this specific case for the point where a modifier (in that example, for 30 minutes and with vigor) parses successfully.
I don't get why you don't simply parse <verb>, then <name>, then the modifiers. Since all of them are correctly delimited, I don't see the problem here. I can produce an example if you wish, with the following requirement:
let input = "<Rub> <the lovely kitten's belly> [<for 30 minutes> <with vigor>]";
let result = parse(input);
// `Command` is a placeholder name for whatever struct `parse` returns.
assert_eq!(result, Ok(Command {
    verb: "Rub",
    names: vec!["the lovely kitten's belly"],
    modifiers: vec!["for 30 minutes", "with vigor"],
}));
So, as you say, it's working for you, but I still don't see why "official" nom should have this. Tell me if I'm missing something; if needed, be more precise and I will try to help solve the problem.
In my opinion, as a user of nom, a new parser should solve a problem that can't be solved using other combinators (or introduce a shortcut, like many0 for example). Adding a parser that simply brute-forces every byte of the input doesn't fit nom's philosophy: nom eats input byte by byte. I agree there are already a number of parsers like this in nom, but I don't think they are good additions.
The brackets were for clarity, the actual input is just Rub the lovely kitten's belly for 30 minutes with vigor. I am sure you can come up with a way to parse this specific example without this combinator though, but I don't get why you think this is such an unusual combinator while there is take_until. This is just a fancier version of it for when the thing you want to take_until is not a fixed value. That is not so weird right?
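To make the comparison with take_until concrete, here is a rough std-only sketch of the idea: scan forward until an arbitrary "end" parser succeeds, then return everything before that point. It is not the code from this pull request and does not use nom's traits. The `PResult` alias and the `modifier` parser, which only recognizes the keywords " for " and " with ", are hypothetical helpers invented for the illustration.

```rust
/// Result shape used in this sketch: on success, (remaining input, output).
type PResult<'a, O> = Option<(&'a str, O)>;

/// Returns the prefix of `input` that precedes the first position at which
/// `end` succeeds. The matched end is NOT consumed.
fn take_until_parser_matches<'a, O>(
    input: &'a str,
    mut end: impl FnMut(&'a str) -> PResult<'a, O>,
) -> PResult<'a, &'a str> {
    // Try the end parser at every character boundary, including the end of input.
    for idx in (0..=input.len()).filter(|&i| input.is_char_boundary(i)) {
        if end(&input[idx..]).is_some() {
            return Some((&input[idx..], &input[..idx]));
        }
    }
    None
}

/// Hypothetical modifier parser: succeeds if the input starts with a known keyword.
fn modifier(input: &str) -> PResult<'_, &str> {
    for keyword in [" for ", " with "].iter() {
        if let Some(rest) = input.strip_prefix(*keyword) {
            return Some((rest, keyword.trim()));
        }
    }
    None
}
```

Calling `take_until_parser_matches("the lovely kitten's belly for 30 minutes with vigor", modifier)` yields the name `the lovely kitten's belly` and leaves ` for 30 minutes with vigor` for the next parser. The end parser is retried at every position, which is the efficiency cost raised earlier in this thread.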
@Stargateur I like the approach, working on it.
@Stargateur This is pushing my understanding of nom, but I'm trying to understand the signature of fold_until0 completely.
I have the following signature with the where clause:
pub fn fold_until0<F, G, H, A, Input, Output, Error>(
    f: F,
    g: G,
    init: H,
    acc: A,
) -> impl FnMut(Input) -> IResult<Input, (A, Output), Error>
where
    Input: InputTake + InputIter + InputLength + Clone,
    F: Parser<Input, Output, Error>,
    G: FnMut(A, Output) -> (A, Output),
    H: FnMut() -> (A, Output),
    Error: ParseError<Input>,
- "f" is the child parser that will recognize the end
- "g": I don't understand in the context of the other arguments
- "init": Function for the initial state for the accumulator
- "acc": Function for building the accumulator up. I don't know if this really should be "g"
- I'm returning the remaining input, the accumulation, the content until "f" matched (dropping "f"'s output).
Can you please help me understand if BOTH "g" and "acc" make sense? I'm reading https://docs.rs/nom/7.0.0/nom/multi/fn.fold_many0.html and it seems to just use g to accumulate.
pub fn fold_until0<P, Until, Init, Acc, Fold, Input, Output, UntilOutput, Error>(
    parser: P,
    until: Until,
    init: Init,
    fold: Fold,
) -> impl FnMut(Input) -> IResult<Input, (Acc, UntilOutput), Error>
where
    Input: InputTake + InputIter + InputLength + Clone,
    P: Parser<Input, Output, Error>,
    Until: Parser<Input, UntilOutput, Error>,
    Fold: FnMut(Acc, Output) -> Acc,
    Init: FnMut() -> Acc,
    Error: ParseError<Input>;
- "parser" is used to parse what comes before the end; it could be anychar for quick and dirty parsing
- "until" is the parser used to stop; the output of this parser is returned by fold_until0
- "init" is the init function used to get the user's accumulator
- "fold" is the user function that takes the accumulator and returns it
The body of fold_until0 is very similar to https://github.com/Geal/nom/pull/1341
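For readers following along, the roles of the four arguments can be illustrated with a std-only sketch. This is not the body from the linked pull request and it does not use nom's traits. It assumes the many_till-style loop (try "until" first, otherwise run "parser" once and fold its output), and `PResult`, `any_char` and `with_keyword` are hypothetical helpers.

```rust
/// Result shape used in this sketch: on success, (remaining input, output).
type PResult<'a, O> = Option<(&'a str, O)>;

/// Folds the outputs of `parser` into an accumulator until `until` succeeds,
/// then returns the accumulator together with the output of `until`.
fn fold_until0<'a, O, U, Acc>(
    mut parser: impl FnMut(&'a str) -> PResult<'a, O>,
    mut until: impl FnMut(&'a str) -> PResult<'a, U>,
    mut init: impl FnMut() -> Acc,
    mut fold: impl FnMut(Acc, O) -> Acc,
) -> impl FnMut(&'a str) -> PResult<'a, (Acc, U)> {
    move |mut input: &'a str| {
        let mut acc = init();
        loop {
            // The terminator is tried first, so zero items are allowed.
            if let Some((rest, end)) = until(input) {
                return Some((rest, (acc, end)));
            }
            let (rest, item) = parser(input)?;
            // Guard against an inner parser that consumes nothing.
            if rest.len() == input.len() {
                return None;
            }
            acc = fold(acc, item);
            input = rest;
        }
    }
}

/// Hypothetical stand-in for anychar: consumes exactly one character.
fn any_char(input: &str) -> PResult<'_, char> {
    let c = input.chars().next()?;
    Some((&input[c.len_utf8()..], c))
}

/// Hypothetical terminator: matches the keyword " with ".
fn with_keyword(input: &str) -> PResult<'_, &str> {
    input.strip_prefix(" with ").map(|rest| (rest, "with"))
}
```

For example, `fold_until0(any_char, with_keyword, String::new, |mut s: String, c| { s.push(c); s })` applied to `"belly with vigor"` collects `belly` into the accumulator, returns `"with"` as the terminator output, and leaves `vigor` as the remaining input.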