parsatron
parsatron copied to clipboard
Bug in `attempt`?
;; works as expected (run (attempt (p/char \space)) " ") ==> \space ;; expected nil (run (attempt (p/char \space)) "") ==> Unexpected end of input at line: 1 column: 1 [Thrown class java.lang.RuntimeException] ;; expected nil (run (attempt (p/char \space)) "x") ==> Unexpected token 'x' at line: 1 column: 1 [Thrown class java.lang.RuntimeException]
From the docstring “A parser that will attempt to parse p, and upon failure never consume any input” it was not clear to me, that attempt
would throw an Exception.
The docstring is technically right, but I agree that it's a bit jarring at first.
attempt
is for things like this:
(defparser hello []
(char \H)
(char \e)
(char \l)
(char \l)
(char \o)
(always :hello))
(defparser hi []
(char \H)
(char \i)
(always :hi))
(run (choice (hello)
(hi))
"Hello!")
; :hello
(run (choice (hello)
(hi))
"Hi!")
; RuntimeException...
(run (choice (attempt (hello))
(hi))
"Hi!")
; :hi
The first run works because it just parses Hello.
The second run fails because the first one consumes and parses the H just fine,
then errors out on the e. choice
moves to the next parser on the list, but
the H has already been consumed, so it tries to parse "i!" with hi
, which
fails.
Wrapping the hello parser in an attempt lets you ignore the fact that it consumed input.
Why does choice consume anything if it doesn’t match? I would have expected some kind of backtracking. In the second example it can match the H
for Hi!
, but then fails. So it backtracks and tries again with (hi)
, which is successful.
And I am still not sure why attempt
should throw exceptions upon some failures, while on others it doesn’t. In your example I would assume from my tests that (attempt (hello))
should throw an Exception, because it can’t parse the input.
Why does choice consume anything if it doesn’t match? I would have expected some kind of backtracking. In the second example it can match the H for Hi!, but then fails. So it backtracks and tries again with (hi), which is successful.
I'll leave this for @youngnh to answer since I'm still a bit shaky on this.
I am still not sure why attempt should throw exceptions upon some failures, while on others it doesn’t. In your example I would assume from my tests that (attempt (hello)) should throw an Exception, because it can’t parse the input.
Okay, first, attempt
never throws anything. The only place the
RuntimeException is thrown (aside from the special case of sanity-checking many
and zero-element parsers) is in run
:
(defn run
[p input]
(let [result (run-parser p ...)]
(condp instance? result
Ok (:item result)
Err (throw (RuntimeException. ^String (:errmsg result))))))
Basically, if the parser you give to run
results in an Ok, the result is
unpacked and returned normally. If it returns an error, the exception is
thrown.
So:
(def hello []
(string "hello")
(always :hello))
(def hi []
(string "hi")
(always :hi))
First case:
(run (hello) "hello")
; :hello
Matches fine, returns result. All good. Next:
(run (hello) "hi")
; RuntimeException
The (hello)
returned an error. run
threw a runtime exception.
(run (attempt (hello)) "hi")
; RuntimeException
The (attempt (hello))
returned an error. run
threw a runtime exception.
(run (choice (hello)
(hi))
"hi")
; RuntimeException
The (hello)
returned an error. It also consumed the "h" before it got to the
thing that caused the error. (hi)
tried to parse "i" and returned an error,
so (choice ...)
returned an error. run
threw a runtime exception. This is
the part I'm a bit shaky on and need @youngnh to explain further.
(run (choice (attempt (hello))
(hi))
"hi")
; :hi
The (attempt (hello))
returned an error, but did not consume any input.
(hi)
parsed the "hi" and returned Ok. run
unwraps the result and returns
it.
Okay good, thanks for the clarification. It was run
who threw the Exception.
So now it is clear why my example above resulted in one. The attempt
tried to parse it, but was unsuccessful. Now run
ran out of options and couldn’t parse my ""
or "x"
inputs.
What is the difference between these two?
(run (either (char \x) (char \y)) "y") (run (either (attempt (char \x)) (char \y)) "y")
So, attempt
is the “optional” operator. It tries to match if possible, but if it can’t match, then it consumes nothing and is effectively a null-operation. Is that correct?
Okay, after rewatching the talk to refresh my memory of the internals, I think it's a tiny bit different than I mentioned above.
(run (choice (hello)
(hi))
"hi")
; RuntimeException
In that example, hello effectively returned "error, and I consumed a value".
choice
then immediately returned a value without even trying (hi)
, becuase
it know that it has to try each option at the same spot, and now that it's moved
forward it can't do that any more.
Okay, now in your last example:
(run (either (char \x) (char \y)) "y")
This first tries the (char \x)
parser. It returns an error, but (importantly)
it did not consume any input. So choice
is still free to try the next option
at the correct location.
If you chained two characters together and the first one matched but the second one didn't, then it would have consumed input.
But because (char \whatever)
only tries to parse a single thing, on its own
it never consumes input unless it succeeds.
(run (either (attempt (char \x)) (char \y)) "y")
The attempt
here is really a no-op. It's not going to hurt anything (maybe
a tiny performance hit of course), but since it's wrapping something that can't
possibly consume input if it fails, it's not necessary.
Back to my example:
(defparser hello []
(char \H)
(char \e)
(char \l)
(char \l)
(char \o)
(always :hello))
Now defparser
wraps all of its body in a >>
for you. So this is exactly
equivalent (at least for this question):
(def hello
(>>
(char \H)
(char \e)
(char \l)
(char \l)
(char \o)
(always :hello)))
Because this is parsing more than one thing at a time, and consumes/parses each
bit before moving onto the next, and can fail anywhere along the way, it may end
up consuming input. That's why we need the attempt in a choice
.
Now, with all that said, I think it would be a lot more intuitive to use if
(choice foo bar)
wrapped its arguments in (attempt)
s for you. I don't see
any downside to that and it'd make things a lot easier to use.
Yes agreed, if it automatically wrapped its args, then it would give the backtracking effect.
Backtracking is a common request and common source of confusion. The Parsatron doesn't do it by default because Haskell's parsec doesn't do it and this lib started out as a straight port. As things currently stand, users have to manually implement backtracking using parsers like attempt
and lookahead
just as you worked out above.
There is something natural that matches people's expectations about backtracking. Its the way that new users of the library expect things to work, so there is an argument there that The Parsatron should implement it and people's programs will work as they want them to without muss or fuss.
At the same time backtracking can incur very high costs in the case of long parse paths that fail close to the end, in highly branched parsers, or in parsers that attempt a match multiple times (like many
). These problems are subtle and if your parser is slow, require knowledge of the implementation in order to diagnose.
Right now, I think I'm leaning towards creating a second namespace in The Parsatron with backtracking equivalents of the functions in the current namespace. So that users may switch between implementations simply by changing the :use
or :require
definitions at the top of their file.
Maybe this is the right way to approach this. A new NS that can be used for testing. People could use the more comfortable one and see how it works out, and in case something is not going fast enough, users with more insight can switch back to the more low-levelish implementation. But I guess there is still room for more efficiency anyway. And that may lead to some uglieness… As soon we have +/- copy pasted code, it may require changes at two places, if bugfixes or speed improvements are being applied. Right now though I would still prefer to have auto-attempting versions of either/choice.
I've just run into the backtracking problem myself. I think the issue is that it's natural to expect the combinators to behave opaquely; having one branch of a choice fail but still consume input is very strange.
One option is to add something like choice* from #26 to the library; this would make the distinction clear and probably ease this bump for new users.