instaparse
instaparse copied to clipboard
Improve error message when CFG is malformed due to a missing closing quote or paren.
Let me first thank you for such a great thing you've published with Instaparse
. I've been wanting to learn (E/A)BNF and language parsing for a while but I found it hard to get going with it: instaparse is a breath of fresh air - just awesome, thanks for making this possible for me (and others :)
Now on topic: Is it perhaps some case such as I've got illustrated in the screenshot? I've tried simplifying the expression under the assumption it was either with 'eps' or without, '|' | '¦'
are referenced as the location but fairly obvious it is somewhere else:
Also it's become apparent that the line numbers reference the CFG string line numbers, and not those in the file, but only because I did check that and I'm working inline... should this be a file with a parser in it, it may not be so obvious - not that I suggest it should be changed but something worth considering and/or mentioning I guess.
So anyway, to show (as you'll know) that the referenced line/column position is not where the fault seems to be and that the eps
is not truely required just conventional (the message says on multiple occasions that it expects eps
-like symbols. Perhaps this could be clarified a bit as well?
Furthermore, but slightly off-topic, I don't really get why the equal =
sign as a token needs to be escaped.. but ok.
edit: I now notice something off with the quotes on the working example a = "|" | "¦'
notice the closing quote single and the rest double? Could this be something like a bug or is this expected?
So the most common cause for this sort of error message (an error at the very end of the parser description) is that a quote is either added or missing somewhere along the line, so everything is off and it gets to the end and can't make sense out of it.
For example,
S = 'a'' | 'b'
or
S = 'a | 'b'
are problematic (or similarly with parens).
Ideally, I'd like to provide a better error message for these sorts of cases, because knowing that it failed at the very end doesn't really help all that much to track it down.
So double-check that all your quotes are well-matched. If you still have a problem, post here your full grammar. I see from the picture that you are using a lot of unusual punctuation marks, maybe one of them is throwing off the parser in a way that I haven't accounted for, so I can investigate that.
I'm not sure what you mean about the = sign needing to be escaped.
The way your vertical parser prints does look unexpected. If there is a bug going on there, I doubt it is a bug with the underlying way the parser works, simply with how it prints. I can look into that as well.
I am unable to replicate the printing behavior you saw with your vertical parser printing as a = "|" | "¦'
Perhaps this is a problem with LightTable?
Were you able to spot any unmatched quotation marks in your parser definition which would explain the error you were getting?
I got something similar with this:
(def parse-uri
(insta/parser
"url = [www] domain [path]
www = 'www.'
domain = #'[^.]+\\.[^.]+'
path = backslash (token [backslash])*
backslash = '\\'
token = 'something'"))
and it seems to come from the inclusion of the backslash
rule, because if I replace with with just regular letters, it compiles fine. Could it be getting confused and thinking it's an escaped quote?
The general rule of thumb is that if you are expressing your grammar in a Clojure string (as opposed to storing it in a separate file) you have to double every backslash because Clojure's reader collapses ever pair of backslashes down to one.
In a regular Clojure string, a backslash would be "\\"
.
Therefore, in a parser specification, you need to double each of those backslashes, so your rule should be:
backslash = '\\\\'
'\\'
would work if you were storing it in a file, and slurping it in.
Certainly, though, this is the kind of thing that would be helped by a better error message, so I hope to add that at some point.
Ah, of course. Looking at the OP's code, it seems like that was his problem too.