parsec
`notFollowedBy` and parsers which don't consume input.
Currently `notFollowedBy` always succeeds with parsers that don't consume input:
-- This parser succeeds.
> parseTest (lookAhead (string "a")) "abc"
"a"
-- Therefore this parser should fail – but it doesn't.
> parseTest (notFollowedBy (lookAhead (string "a"))) "abc"
()
Is this bug old enough to be considered a feature? (Even if so, this behavior should probably be documented.) If not, here's a version that works (but no idea how much slower, if at all):
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())
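For anyone who wants to try this, here is a self-contained sketch that puts the stock combinator and the variant above side by side. The names `stock` and `fixed` are only for illustration, and the `FlexibleContexts` pragma is there as a precaution for the `Stream` constraint. Going by the session above, `stock` should come back as `Right ()`, while `fixed` should be a `Left` carrying a parse error.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad (join)
import Text.Parsec

-- The variant proposed above: the inner parser yields another parser,
-- and `join` runs it, so a successful `p` always leads to a failure.
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do { a <- try p; return (unexpected (show a)) }
  <|> return (return ())

-- Stock combinator applied to a parser that succeeds without consuming input.
stock :: Either ParseError ()
stock = parse (notFollowedBy (lookAhead (string "a"))) "" "abc"

-- Same input and argument parser, using the variant.
fixed :: Either ParseError ()
fixed = parse (notFollowedBy' (lookAhead (string "a"))) "" "abc"
```

Loading this in GHCi and evaluating `stock` and `fixed` is enough to see the difference. It says nothing about performance.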
I guess I'm ambivalent, here. It's definitely surprising behavior.
It might be in the "so old it's not a bug" bucket. However, if fixing this would break any parser, I'm not even sure what such a parser would look like.
I recently wasted some time trying to understand why notFollowedBy . notFollowedBy is not equal to lookAhead (particularly, I had a line of code that said "notFollowedBy eof" which IMHO makes a lot of sense). I think this behavior, while unfortunate, is acceptable, but should at least be documented.
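To make the mismatch concrete, here is a small sketch comparing the double negation with `lookAhead`, plus the `notFollowedBy eof` case. The helper `succeeds` and the names `doubleNeg`, `viaLookAhead`, `comparison` and `eofCase` are made up for this example. With the stock combinator I'd expect `doubleNeg` to succeed on both inputs, `viaLookAhead` to succeed only on "abc", and `eofCase` to be `True` even though the input is empty.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: does the parser succeed on this input?
succeeds :: Parser a -> String -> Bool
succeeds p = either (const False) (const True) . parse p ""

-- Double negation built from the stock combinator.
doubleNeg :: Parser ()
doubleNeg = notFollowedBy (notFollowedBy (char 'a'))

-- What one might expect the double negation to be equivalent to.
viaLookAhead :: Parser ()
viaLookAhead = lookAhead (char 'a') >> return ()

-- (input, doubleNeg succeeds?, viaLookAhead succeeds?)
comparison :: [(String, Bool, Bool)]
comparison =
  [ (i, succeeds doubleNeg i, succeeds viaLookAhead i) | i <- ["abc", "xbc"] ]

-- notFollowedBy eof on empty input, where eof itself succeeds.
eofCase :: Bool
eofCase = succeeds (notFollowedBy eof) ""
```

The inner `notFollowedBy` never consumes input, so the outer one falls into exactly the case described at the top of this issue.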
@tulcod, FYI this is fixed in Megaparsec. See here how it works. The examples (and upcoming tests) include the `notFollowedBy . notFollowedBy = lookAhead` property, and `notFollowedBy eof` works just like you would expect.
This particular combinator can be borrowed by Parsec if it's placed in the Text.Parsec.Prim module, since I've written it as a primitive combinator. The Parsec version would be:
-- | @notFollowedBy p@ only succeeds when parser @p@ fails. This parser
-- does not consume any input and can be used to implement the “longest
-- match” rule.
notFollowedBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy p = ParsecT $ \s@(State input pos _) _ _ eok eerr -> do
  let cok' _ _ _ = eerr $ newErrorUnknown pos
      cerr' _    = eok () s $ newErrorUnknown pos
      eok' _ _ _ = eerr $ newErrorUnknown pos
      eerr' _    = eok () s $ newErrorUnknown pos
  unParser p s cok' cerr' eok' eerr'
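If you want to experiment with the same idea without patching Text.Parsec.Prim, something like the following sketch might work from user code. It relies on the low-level `mkPT` and `runParsecT` functions instead of the `ParsecT` constructor; I'm assuming both are exported from Text.Parsec.Prim in your Parsec version. The name `notFollowedByP` is made up, and this is not the code from either library.

```haskell
import Text.Parsec
import Text.Parsec.Error (newErrorUnknown)
import Text.Parsec.Prim (Consumed (..), Reply (..), State (..), mkPT, runParsecT)

-- Hypothetical user-level variant: run p, then invert its outcome.
-- The original state s is returned on success, so no input is consumed.
notFollowedByP :: Monad m => ParsecT s u m a -> ParsecT s u m ()
notFollowedByP p = mkPT $ \s -> do
  consumed <- runParsecT p s
  -- Whether p consumed input does not matter here; only its reply does.
  reply <- case consumed of
    Consumed r -> r
    Empty r    -> r
  let err = newErrorUnknown (statePos s)
  return . Empty . return $ case reply of
    Ok _ _ _ -> Error err      -- p succeeded, so this parser fails
    Error _  -> Ok () s err    -- p failed, so this parser succeeds
```

As with the version above, the error is just an unknown error at the current position, so messages will be less informative than the stock ones.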
Well, this is a bit more primitive than the Megaparsec version… To match the quality of Megaparsec's error messages, some additional work is required, so error messages will differ from those I posted.