Earley icon indicating copy to clipboard operation
Earley copied to clipboard

feature: constraint validation of production parse result

Open user318 opened this issue 5 years ago • 7 comments

Hello.

I did not find how to implement some sort of constraints on production. For example if I wrote a rule to parse a series of digits and wanted to validate that if falls into a specific range, what would I need to do? Of course I can check the values after the parser ends, but the context is lost in such case like position where error occurs. I see no functions in current interface I can use to do that and also I have not find something similar in the examples. It probably could be something function like:

validate :: (a -> Bool) -> Prod r e t a -> Prod r e t a

or

validate :: (a -> Maybe b) -> Prod r e t a -> Prod r e t b

so when the predicate fails or returns Nothing, the parser fails for that production. And we can use it like:

num <- rule $ validate (<= 1234) $ read <$> some digit

Or may be some sort of extended rule function with validation:

ruleValidate :: (a -> Bool) -> Prod r e t a -> Grammar r (Prod r e t a)
...
num <- ruleValidate (<= 1234) $ read <$> some digit

user318 avatar Apr 15 '20 22:04 user318

Hey!

Yeah, I think you're right that this isn't possible with the current interface. Would you like to contribute this feature?

ollef avatar Apr 16 '20 19:04 ollef

Heh... I might, but I am a beginner in Haskell and your code looks like Chinese to me now. :) And it would take a time to me. If you gave some hints what parts need to be modified, it might be easier to dive in.

user318 avatar Apr 17 '20 12:04 user318

Is that what you need to add "keyword parsing" to a parser? For example, I want to write a little expression parser, but my AST has let..in, along with alphanumeric variable names. I don't want Var "let" to be a valid parse, but I can't see anyway to express that in Earley at the moment.

Edit: I guess the alternative is an explicit tokenisation phase.

ocharles avatar Nov 10 '20 22:11 ocharles

No, I need it for another purpose, but in your case it can be useful too if you have a list of keywords. Examples of what I want to use it for: limiting the range of an integer if I want to read a part of IP-address or a port number. Or to verify that the "IP/mask" has the correct IP with host bits set to zero. Or the range - is correct (number1 <= number2).

user318 avatar Nov 10 '20 22:11 user318

Oh sorry, I was being a bit selfish - I meant to say would I need this requested feature for my purpose! I am trying to use Earley on a stream of Chars and don't want to parse let as Var "let". If I were in a monadic parser setting, I could use guard, but that's not an option as we're limited to Applicative. Your suggestion is like guard :: Bool -> m a, but would be more like filterProd :: (a -> Bool) -> Prod r e t a -> Prod r e t a, and I would be able to define identifier <- rule $ filterProd (notElem [ "let", "in" ]) (many (satisfy isAlphaNum)).

Edit: I thought this was done in https://github.com/ollef/Earley/pull/47, but that's something different. I should stop posting on GitHub near midnight...

ocharles avatar Nov 10 '20 22:11 ocharles

Just an update on this: I moved to an explicit tokeniser instead and no longer need this.

ocharles avatar Nov 24 '20 09:11 ocharles

I wonder if this could be generalized to Filter :: ([a] -> [b]) -> Prod r e t a -> Prod r e t b, where this disambiguation function could filter the list. This would also enable generating a parse tree, rather than a list of all flat parses, i.e. pushing disambiguation towards the leaves. For example disambiguate d = Filter (pure . d)

expipiplus1 avatar May 11 '23 07:05 expipiplus1