parsatron icon indicating copy to clipboard operation
parsatron copied to clipboard

Bug in `attempt`?

Open ath opened this issue 11 years ago • 10 comments

;; works as expected
(run (attempt (p/char \space)) " ")
==> \space

;; expected nil
(run (attempt (p/char \space)) "")
==> Unexpected end of input at line: 1 column: 1
  [Thrown class java.lang.RuntimeException]

;; expected nil
(run (attempt (p/char \space)) "x")
==> Unexpected token 'x' at line: 1 column: 1
  [Thrown class java.lang.RuntimeException]

From the docstring “A parser that will attempt to parse p, and upon failure never consume any input” it was not clear to me, that attempt would throw an Exception.

ath avatar Sep 20 '12 16:09 ath

The docstring is technically right, but I agree that it's a bit jarring at first.

attempt is for things like this:

(defparser hello []
  (char \H)
  (char \e)
  (char \l)
  (char \l)
  (char \o)
  (always :hello))

(defparser hi []
  (char \H)
  (char \i)
  (always :hi))

(run (choice (hello)
             (hi))
     "Hello!")
; :hello

(run (choice (hello)
             (hi))
     "Hi!")
; RuntimeException...

(run (choice (attempt (hello))
             (hi))
     "Hi!")
; :hi

The first run works because it just parses Hello.

The second run fails because the first one consumes and parses the H just fine, then errors out on the e. choice moves to the next parser on the list, but the H has already been consumed, so it tries to parse "i!" with hi, which fails.

Wrapping the hello parser in an attempt lets you ignore the fact that it consumed input.

sjl avatar Sep 20 '12 19:09 sjl

Why does choice consume anything if it doesn’t match? I would have expected some kind of backtracking. In the second example it can match the H for Hi!, but then fails. So it backtracks and tries again with (hi), which is successful.

ath avatar Sep 20 '12 19:09 ath

And I am still not sure why attempt should throw exceptions upon some failures, while on others it doesn’t. In your example I would assume from my tests that (attempt (hello)) should throw an Exception, because it can’t parse the input.

ath avatar Sep 20 '12 20:09 ath

Why does choice consume anything if it doesn’t match? I would have expected some kind of backtracking. In the second example it can match the H for Hi!, but then fails. So it backtracks and tries again with (hi), which is successful.

I'll leave this for @youngnh to answer since I'm still a bit shaky on this.

I am still not sure why attempt should throw exceptions upon some failures, while on others it doesn’t. In your example I would assume from my tests that (attempt (hello)) should throw an Exception, because it can’t parse the input.

Okay, first, attempt never throws anything. The only place the RuntimeException is thrown (aside from the special case of sanity-checking many and zero-element parsers) is in run:

(defn run
  [p input]
  (let [result (run-parser p ...)]
    (condp instance? result
      Ok (:item result)
      Err (throw (RuntimeException. ^String (:errmsg result))))))

Basically, if the parser you give to run results in an Ok, the result is unpacked and returned normally. If it returns an error, the exception is thrown.

So:

(def hello []
  (string "hello")
  (always :hello))

(def hi []
  (string "hi")
  (always :hi))

First case:

(run (hello) "hello")
; :hello

Matches fine, returns result. All good. Next:

(run (hello) "hi")
; RuntimeException

The (hello) returned an error. run threw a runtime exception.

(run (attempt (hello)) "hi")
; RuntimeException

The (attempt (hello)) returned an error. run threw a runtime exception.

(run (choice (hello)
             (hi))
     "hi")
; RuntimeException

The (hello) returned an error. It also consumed the "h" before it got to the thing that caused the error. (hi) tried to parse "i" and returned an error, so (choice ...) returned an error. run threw a runtime exception. This is the part I'm a bit shaky on and need @youngnh to explain further.

(run (choice (attempt (hello))
             (hi))
     "hi")
; :hi

The (attempt (hello)) returned an error, but did not consume any input. (hi) parsed the "hi" and returned Ok. run unwraps the result and returns it.

sjl avatar Sep 20 '12 20:09 sjl

Okay good, thanks for the clarification. It was run who threw the Exception. So now it is clear why my example above resulted in one. The attempt tried to parse it, but was unsuccessful. Now run ran out of options and couldn’t parse my "" or "x" inputs.

What is the difference between these two?

(run (either (char \x) (char \y)) "y")
(run (either (attempt (char \x)) (char \y)) "y")

So, attempt is the “optional” operator. It tries to match if possible, but if it can’t match, then it consumes nothing and is effectively a null-operation. Is that correct?

ath avatar Sep 20 '12 20:09 ath

Okay, after rewatching the talk to refresh my memory of the internals, I think it's a tiny bit different than I mentioned above.

(run (choice (hello)
             (hi))
     "hi")
; RuntimeException

In that example, hello effectively returned "error, and I consumed a value". choice then immediately returned a value without even trying (hi), becuase it know that it has to try each option at the same spot, and now that it's moved forward it can't do that any more.

Okay, now in your last example:

(run (either (char \x) (char \y)) "y")

This first tries the (char \x) parser. It returns an error, but (importantly) it did not consume any input. So choice is still free to try the next option at the correct location.

If you chained two characters together and the first one matched but the second one didn't, then it would have consumed input.

But because (char \whatever) only tries to parse a single thing, on its own it never consumes input unless it succeeds.

(run (either (attempt (char \x)) (char \y)) "y")

The attempt here is really a no-op. It's not going to hurt anything (maybe a tiny performance hit of course), but since it's wrapping something that can't possibly consume input if it fails, it's not necessary.

Back to my example:

(defparser hello []
  (char \H)
  (char \e)
  (char \l)
  (char \l)
  (char \o)
  (always :hello))

Now defparser wraps all of its body in a >> for you. So this is exactly equivalent (at least for this question):

(def hello
  (>> 
    (char \H)
    (char \e)
    (char \l)
    (char \l)
    (char \o)
    (always :hello)))

Because this is parsing more than one thing at a time, and consumes/parses each bit before moving onto the next, and can fail anywhere along the way, it may end up consuming input. That's why we need the attempt in a choice.

Now, with all that said, I think it would be a lot more intuitive to use if (choice foo bar) wrapped its arguments in (attempt)s for you. I don't see any downside to that and it'd make things a lot easier to use.

sjl avatar Sep 20 '12 21:09 sjl

Yes agreed, if it automatically wrapped its args, then it would give the backtracking effect.

ath avatar Sep 20 '12 21:09 ath

Backtracking is a common request and common source of confusion. The Parsatron doesn't do it by default because Haskell's parsec doesn't do it and this lib started out as a straight port. As things currently stand, users have to manually implement backtracking using parsers like attempt and lookahead just as you worked out above.

There is something natural that matches people's expectations about backtracking. Its the way that new users of the library expect things to work, so there is an argument there that The Parsatron should implement it and people's programs will work as they want them to without muss or fuss.

At the same time backtracking can incur very high costs in the case of long parse paths that fail close to the end, in highly branched parsers, or in parsers that attempt a match multiple times (like many). These problems are subtle and if your parser is slow, require knowledge of the implementation in order to diagnose.

Right now, I think I'm leaning towards creating a second namespace in The Parsatron with backtracking equivalents of the functions in the current namespace. So that users may switch between implementations simply by changing the :use or :require definitions at the top of their file.

youngnh avatar Sep 21 '12 13:09 youngnh

Maybe this is the right way to approach this. A new NS that can be used for testing. People could use the more comfortable one and see how it works out, and in case something is not going fast enough, users with more insight can switch back to the more low-levelish implementation. But I guess there is still room for more efficiency anyway. And that may lead to some uglieness… As soon we have +/- copy pasted code, it may require changes at two places, if bugfixes or speed improvements are being applied. Right now though I would still prefer to have auto-attempting versions of either/choice.

ath avatar Sep 21 '12 19:09 ath

I've just run into the backtracking problem myself. I think the issue is that it's natural to expect the combinators to behave opaquely; having one branch of a choice fail but still consume input is very strange.

One option is to add something like choice* from #26 to the library; this would make the distinction clear and probably ease this bump for new users.

mullr avatar Aug 05 '13 23:08 mullr