better-parse
How are delegates selected?
I'm finding that the order in which I declare my delegates in a parser grammar affects whether or not it parses. I have a grammar like the following:
```kotlin
internal class Parser: Grammar<List<Command>>() {
    internal val comments by regexToken("#.*\n", true)
    internal val str by regexToken("\".*\"")
    internal val queryType by regexToken("[A-Z]+(?:_[A-Z]+)*")
    internal val word by regexToken("[A-Za-z]+")
    internal val LPAR by literalToken("(")
    internal val RPAR by literalToken(")")
    internal val COLON by literalToken(":")
    internal val LBRACE by literalToken("{")
    internal val RBRACE by literalToken("}")
    internal val equals by literalToken("=")
    internal val ws by regexToken("\\s+",true)
    internal val newline by regexToken("[\r\n]+",true)
    internal val comma by literalToken(",")
    internal val param: Parser<ValueMetadata> by (word and -COLON and word) map { (p, t) ->
        ValueMetadata(p.text, Type.valueOf(t.text))
    }
    val params by -LPAR and separatedTerms(param, comma, true) and -RPAR
    val outputs by -LPAR and separatedTerms(param, comma, true) and -RPAR
    val cmdParser by ( -LBRACE and queryType and -equals and str and -RBRACE )
    val funcParser: Parser<Command> by (word and params and -COLON and params and cmdParser) map {
        (name, inputs, outputs, cmdFunc) ->
        val (type,cmd) = cmdFunc
        Command(name.text,
            inputs,
            outputs,
            cmd.text.subSequence(1, cmd.length - 1).toString(),
            QueryType.valueOf(type.text)
        )
    }
    override val rootParser: Parser<List<Command>> by zeroOrMore(funcParser)
}
```
That's meant to parse
```
# Documentation that should be ignored
findFoo(test:String,entity:String):(foo:String,bar:Int) {
SQL_QUERY = "select foo,bar from baz where z = :name and y = :entity"
}
# Documentation that should be ignored
findBar(test:String,entity:String):(foo:String,bar:Int) {
SQL_QUERY = "select foo,bar from baz where z = :name and y = :entity"
}
```
into a list of `Command`s. Just switching the declaration order of `str`, `queryType`, and `word` makes the parse pass or fail on different test cases, with errors like `Could not parse input: UnparsedRemainder(startsWith=word@2 for "findFoo" at 39 (2:1))`.
The tokens you declare with delegation are tried in the same order as they are declared. So if tokenizing is ambiguous (which is often the case), the tokens declared earlier take priority.
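To see how this plays out in the grammar above, here is a simplified model of a first-match tokenizer in plain Kotlin. It is an assumption based on the behavior described here, not better-parse's actual implementation, and `Tok`, `tokenize`, and the token list are hypothetical names. With `queryType` declared before `word`, as in the question, the `String` in `test:String` gets split into `queryType` `S` followed by `word` `tring`, so `param` cannot match, which would explain the `UnparsedRemainder` at `findFoo`.

```kotlin
import java.util.regex.Pattern

// Hypothetical token produced by the model tokenizer.
data class Tok(val name: String, val text: String)

// Simplified model (NOT better-parse's real code): at each position, try the
// token patterns in declaration order and take the FIRST non-empty match,
// even if a later pattern would match a longer prefix.
fun tokenize(tokens: List<Pair<String, Pattern>>, input: String): List<Tok> {
    val out = mutableListOf<Tok>()
    var pos = 0
    while (pos < input.length) {
        val hit = tokens.firstNotNullOfOrNull { (name, pattern) ->
            val m = pattern.matcher(input).region(pos, input.length)
            if (m.lookingAt() && m.end() > pos) Tok(name, m.group()) else null
        } ?: error("No token matches at offset $pos")
        out += hit
        pos += hit.text.length
    }
    return out
}

// The relevant token patterns from the grammar in the question.
val queryType = "queryType" to Pattern.compile("[A-Z]+(?:_[A-Z]+)*")
val word = "word" to Pattern.compile("[A-Za-z]+")
val colon = "COLON" to Pattern.compile(":")

fun main() {
    val input = "test:String"
    // queryType declared before word, as in the question
    println(tokenize(listOf(queryType, word, colon), input).map { "${it.name}(${it.text})" })
    // [word(test), COLON(:), queryType(S), word(tring)]
    // word declared before queryType
    println(tokenize(listOf(word, queryType, colon), input).map { "${it.name}(${it.text})" })
    // [word(test), COLON(:), word(String)]
}
```

Swapping the order only moves the problem: with `word` first, `SQL_QUERY` is split into `SQL` and a leftover `_` that no token in this model matches, which is consistent with different test cases passing or failing depending on the order.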
Note also this section in the README:
> Note: the tokens order matters in some cases, because the tokenizer tries to match them in exactly this order. For instance, if `literalToken("a")` is listed before `literalToken("aa")`, the latter will never be matched. Be careful with keyword tokens! If you match them with regexes, a word boundary `\b` in the end may help against ambiguity.
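Following the `\b` advice, one option (a sketch, not tested against better-parse itself) is to end the `queryType` regex with a word boundary, i.e. `regexToken("[A-Z]+(?:_[A-Z]+)*\\b")`, so that it can no longer match just the leading capital of a word like `String`. The plain `java.util.regex` check below illustrates the difference; `plain`, `bounded`, and `prefixMatch` are hypothetical names.

```kotlin
import java.util.regex.Pattern

// The queryType pattern from the question, without and with a trailing \b.
val plain = Pattern.compile("[A-Z]+(?:_[A-Z]+)*")
val bounded = Pattern.compile("[A-Z]+(?:_[A-Z]+)*\\b")

// Length of the prefix of `input` that `pattern` matches at offset 0, or 0 if none.
fun prefixMatch(pattern: Pattern, input: String): Int {
    val m = pattern.matcher(input)
    return if (m.lookingAt()) m.end() else 0
}

fun main() {
    println(prefixMatch(plain, "String:"))       // 1: only "S" is consumed
    println(prefixMatch(bounded, "String:"))     // 0: no boundary between "S" and "t"
    println(prefixMatch(bounded, "SQL_QUERY =")) // 9: the whole "SQL_QUERY"
}
```

With the bounded pattern, `queryType` could stay declared before `word`: `String` and `Int` fall through to `word`, while `SQL_QUERY` is still matched as a single token.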