megaparsec
megaparsec copied to clipboard
Add a new primitive: 'breakOn'
https://github.com/mrkkrp/megaparsec/blob/87d425759ab5c290c1cb54b1d52c7bffaae1e1ff/Text/Megaparsec/Stream.hs#L291
Would you consider adding a primitive combinator for breakOn
? Or is that something you've already discussed before?
http://hackage.haskell.org/package/text-1.2.4.0/docs/Data-Text-Lazy.html#v:breakOn
Something like takeUpto
?
If so, I might be able to open a pull request.
Isn't takeUpto
expressible via takeWhile
? You'd need to just negate the predicate, right?
More generally, I don't want to add more primitives because we already have quite a few and it's good to keep the number minimal. If we're to add a primitive, the best candidate is still #314 I think, which e.g. attoparsec
has.
Isn't takeUpto expressible via takeWhile ? You'd need to just negate the predicate, right?
No, look at breakOn more closely. takeUpto might not be the right name for it. Basically, this would add the ability to take characters up to the first instance of some substring.
The way I currently see that I could do this would be to do something like:
mconcat <$> manyTill (anySingle >>= pure . singleton) (void $ string "foo" <|> eof)
Which seems less than ideal to me, considering the asymptotic behavior of Text
append.
I'm afraid that's what you'll have to do. mconcat
should in the end do only one traversal of all the chunks of text, so it's as good as you're going to get.
Will keep this one open as a feature request for the combinator.
@mrkkrp Only do that if you're still on the fence about whether or not to add it. If you don't like the idea, I'm good with closing this.
There is nothing wrong with the feature request, I like to have a collection of things users want.
I was also looking for this primitive, @JustinChristensen . I implemented it and named it anyTill
.
anyTill
is used to implement breakCap
.
+100 To try to take advantage of takeWhile1P
, I'm currently doing:
fmap Text.concat . many $
choice
[ takeWhile1P Nothing (/= Text.head end)
, notFollowedBy end *> (Text.cons <$> anySingle <*> takeWhile1P Nothing (/= Text.head end))
]
Alternatively, perhaps we could expose a primitive
liftConsumer :: MonadParsec e s m => (s -> (Maybe e, a, s)) -> m a
For advanced users wanting granular control? In this case,
type MyParser = Parsec Void Text
breakOn :: Text -> MyParser Text
breakOn end = liftConsumer $
let (pre, post) = Text.breakOn end
in (Nothing, pre, post)
@mrkkrp would you accept a PR for any of this?
Hey @brandonchinn178, I like the idea of liftConsumer
, I think it is a nice escape hatch and it would solve the extensibility concern once and for all. Should its type be rather
liftConsumer :: MonadParsec e s m => (s -> (Either (ParseError s e) a, s)) -> m a
though? I imagine if you want to make it really general, it might even require the user to take State s e
instead of s
and recalculate the offset manually. Otherwise you would need to get length of the entire stream before/after...
Short answer: I'm happy to merge a primitive that allows us to do custom things, but I'm not super sure what it would look like. I think for it to be really powerful, it might also need to be a bit ugly and make its brave users face some Megaparsec internals. It would be nice not to have to also extend Stream
for that.
Implemented liftConsumer
here: https://github.com/mrkkrp/megaparsec/pull/514
This issue can stay up, to track adding breakOn
as a first-class primitive or not.
@mrkkrp Did we want to keep this issue open, to track if breakOn
would be a first-class primitive or not?
I'd say that the probability of adding a new primitive has always been extremely low, but with #514 landing it is now 0.