megaparsec icon indicating copy to clipboard operation
megaparsec copied to clipboard

Add a new primitive: 'breakOn'

Open JustinChristensen opened this issue 4 years ago • 7 comments

https://github.com/mrkkrp/megaparsec/blob/87d425759ab5c290c1cb54b1d52c7bffaae1e1ff/Text/Megaparsec/Stream.hs#L291

Would you consider adding a primitive combinator for breakOn? Or is that something you've already discussed before?

http://hackage.haskell.org/package/text-1.2.4.0/docs/Data-Text-Lazy.html#v:breakOn

Something like takeUpto?

If so, I might be able to open a pull request.

JustinChristensen avatar Aug 10 '19 21:08 JustinChristensen

Isn't takeUpto expressible via takeWhile? You'd need to just negate the predicate, right?

More generally, I don't want to add more primitives because we already have quite a few and it's good to keep the number minimal. If we're to add a primitive, the best candidate is still #314 I think, which e.g. attoparsec has.

mrkkrp avatar Aug 10 '19 22:08 mrkkrp

Isn't takeUpto expressible via takeWhile ? You'd need to just negate the predicate, right?

No, look at breakOn more closely. takeUpto might not be the right name for it. Basically, this would add the ability to take characters up to the first instance of some substring.

The way I currently see that I could do this would be to do something like:

mconcat <$> manyTill (anySingle >>= pure . singleton) (void $ string "foo" <|> eof)

Which seems less than ideal to me, considering the asymptotic behavior of Text append.

JustinChristensen avatar Aug 11 '19 00:08 JustinChristensen

I'm afraid that's what you'll have to do. mconcat should in the end do only one traversal of all the chunks of text, so it's as good as you're going to get.

mrkkrp avatar Aug 11 '19 12:08 mrkkrp

Will keep this one open as a feature request for the combinator.

mrkkrp avatar Aug 11 '19 12:08 mrkkrp

@mrkkrp Only do that if you're still on the fence about whether or not to add it. If you don't like the idea, I'm good with closing this.

JustinChristensen avatar Aug 11 '19 15:08 JustinChristensen

There is nothing wrong with the feature request, I like to have a collection of things users want.

mrkkrp avatar Aug 11 '19 16:08 mrkkrp

I was also looking for this primitive, @JustinChristensen . I implemented it and named it anyTill.

anyTill is used to implement breakCap.

jamesdbrock avatar Dec 11 '20 01:12 jamesdbrock

+100 To try to take advantage of takeWhile1P, I'm currently doing:

fmap Text.concat . many $
  choice
    [ takeWhile1P Nothing (/= Text.head end)
    , notFollowedBy end *> (Text.cons <$> anySingle <*> takeWhile1P Nothing (/= Text.head end))
    ]

Alternatively, perhaps we could expose a primitive

liftConsumer :: MonadParsec e s m => (s -> (Maybe e, a, s)) -> m a

For advanced users wanting granular control? In this case,

type MyParser = Parsec Void Text
breakOn :: Text -> MyParser Text
breakOn end = liftConsumer $
  let (pre, post) = Text.breakOn end
   in (Nothing, pre, post)

@mrkkrp would you accept a PR for any of this?

brandonchinn178 avatar Feb 09 '23 22:02 brandonchinn178

Hey @brandonchinn178, I like the idea of liftConsumer, I think it is a nice escape hatch and it would solve the extensibility concern once and for all. Should its type be rather

liftConsumer :: MonadParsec e s m => (s -> (Either (ParseError s e) a, s)) -> m a

though? I imagine if you want to make it really general, it might even require the user to take State s e instead of s and recalculate the offset manually. Otherwise you would need to get length of the entire stream before/after...

Short answer: I'm happy to merge a primitive that allows us to do custom things, but I'm not super sure what it would look like. I think for it to be really powerful, it might also need to be a bit ugly and make its brave users face some Megaparsec internals. It would be nice not to have to also extend Stream for that.

mrkkrp avatar Feb 17 '23 15:02 mrkkrp

Implemented liftConsumer here: https://github.com/mrkkrp/megaparsec/pull/514

This issue can stay up, to track adding breakOn as a first-class primitive or not.

brandonchinn178 avatar Feb 28 '23 04:02 brandonchinn178

@mrkkrp Did we want to keep this issue open, to track if breakOn would be a first-class primitive or not?

brandonchinn178 avatar Jun 19 '23 20:06 brandonchinn178

I'd say that the probability of adding a new primitive has always been extremely low, but with #514 landing it is now 0.

mrkkrp avatar Jun 20 '23 07:06 mrkkrp