systemdUnitFilePlugin
systemdUnitFilePlugin copied to clipboard
Ensure Grammar Matching is Complete
The current implementation of our grammar using parser combinators, is not complete, because it's is greedy.
For instance the following:
Seq(ZeroOrMore("a"),"a")
And the string "aa" . The Zero or More will gobble the both as, the second part of Seq won't match and we will be done.
My recollection from Mastering Regular Expressions is that the two different Regex engines DFA and NFA solve this in different ways, a DFA based approach (and this might be backwards), builds every possible match in memory and keeps track of it as it goes through. An NFA approach involves backtracking.
I think we just want something ... simple, but I'm unclear exactly how to build this with Parser Combinators.
One ... gross idea I had is I could maybe map each combinator to a Regex fragment, then build a giant regex. This didn't seem like it would 'teach' me anything, and also at one point I thought maybe there were some combinators that couldn't be Regexs
It sees like I might need to build either backtracking or "building up everything into memory" manually.
I suspect that #343 would benefit from "building up everything into memory" approach.
Some random thoughts I've had...
I thought for a while that I could keep track of an integer and ask each thing to increment matches (with back tracking), but that might be hard for instance Seq(ZeroOrMore("a"),ZeroOrMore("a")) would require going through all of the matches of the first one before getting to the second one every time you backtrack.
Let's say we return an array of match results instead of just one. I think that there are a few things to note:
- I suspect RegexTerminal might need to die, and I'm fine with that.
LiteralChoiceTerminalandFlexibleLiteralChocieTerminalshould probably change to subtypes of AlternativeCombinator.
At that point we only need to worry about the following:
AlternativeCombinator- This should easily be able to check all matches. You simply get all matches from each alternative.OneOrMore- This is also Easy, again.ZeroOrMore- This is also Easy, again,ZeroOrOne- This is also easy, againEOF- This is also easy, and a NO-OP.Seq- This is maybe more annoying and could result in an exponential blow up. For instanceSeq(ZeroOrMore("a"),ZeroOrMore("a"),ZeroOrMore("a"))and the stringaaaaaa. The first block returns{,a,aa,aaa,aaaa,aaaaa,aaaaaa}after those 7 matches there is an upper bound of 49 matches for the second, and 777=343 for the third. (Note: For others or future self, the bounds are very lose, I think there are actually just 21 matches for the second set, because for instance the match "", has 7 possible matches for the second term, but the match "aaaaaa" has only one match possible for the second "". I just didn't want to think about how to calculate the third, and this is still O(n^2) after two matches.