Add support for custom number parsers to the Listener

Open nomennescio opened this issue 3 years ago • 2 comments

Currently the Listener parses a line of input by looking up each token as a word in the dictionary and, failing that, trying to convert it to a number. Any non-numerical input that needs custom parsing must therefore be introduced by a parsing word. But some specific input might be more naturally presented not with a parsing word that acts as a prefix, but as a single token that is parsed by a custom string>type "number" parser.

Can we extend the number-parsing mechanism so that custom parsers can be registered (appended to the end of a list)? If the parsers at the beginning of the list fail to parse a token, the next parser in the list is called, and so on. If no parser succeeds, an error is generated as usual.
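
To make the request concrete, here is a minimal sketch of the fallback chain in Python rather than Factor. All names (`token_parsers`, `register_parser`, `parse_token`) are hypothetical and do not correspond to anything in the Factor parser; `parse_int` and `parse_float` merely stand in for the built-in number parser.

```python
from typing import Any, Callable, List, Optional

# A parser returns a value, or None when it does not recognise the token.
# Every name below is illustrative; none of this is Factor's actual API.
Parser = Callable[[str], Optional[Any]]


def parse_int(token: str) -> Optional[int]:
    try:
        return int(token)
    except ValueError:
        return None


def parse_float(token: str) -> Optional[float]:
    try:
        return float(token)
    except ValueError:
        return None


# Stand-ins for the built-in number parser come first in the list.
token_parsers: List[Parser] = [parse_int, parse_float]


def register_parser(parser: Parser) -> None:
    """Append a custom parser; it is consulted only after all earlier ones."""
    token_parsers.append(parser)


def parse_token(token: str) -> Any:
    """Try each parser in order; the first non-None result wins."""
    for parser in token_parsers:
        result = parser(token)
        if result is not None:  # compare to None so that 0 still counts as a success
            return result
    raise ValueError(f"no parser accepts token {token!r}")
```

With this shape, a token that no registered parser accepts still raises an error as before, and a newly registered parser only ever sees tokens that every earlier parser rejected.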

For instance, I'm currently working on a representation of a tuple which is most naturally shown as an annotated number: 5±1 means "5 plus or minus 1", and is actually a tuple, but is naturally treated as a mathematical object. If I could add a custom parser for the "5±1" token, a round trip between representation (custom prettyprinting) and parsing would be possible again (currently it doesn't parse).
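
The round trip I have in mind can be sketched like this, again in Python and only as an illustration. The `Uncertain` class, the integer-only pattern, and `parse_uncertain` are assumptions made for the example, not the actual tuple or its syntax.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Uncertain:
    """Hypothetical stand-in for the tuple: a value with a plus-or-minus error."""
    value: int
    error: int

    def __str__(self) -> str:
        # Custom "prettyprinting": the same form the parser accepts.
        return f"{self.value}±{self.error}"


# Assumed token shape for this sketch: integer, '±', non-negative integer.
_PATTERN = re.compile(r"(-?\d+)±(\d+)")


def parse_uncertain(token: str) -> Optional[Uncertain]:
    """Return an Uncertain for tokens like '5±1', or None for anything else."""
    match = _PATTERN.fullmatch(token)
    if match is None:
        return None
    return Uncertain(int(match.group(1)), int(match.group(2)))
```

Printing the result of `parse_uncertain("5±1")` gives back the token `5±1`, and parsing that printed form yields an equal object, which is the property that is currently missing.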

nomennescio avatar Feb 04 '22 14:02 nomennescio

It’s a good idea, and we’ve talked about having other kinds of literals, such as colors or IP addresses.

There are a few ways we do this with syntax currently:

DICE: 2d4+6 URL"google.com" R/ fo[o]+/ C{ 1 3 }

It’s not quite as short as yours but for the moment, we aren’t ready to extend the scan-datum approach in the parser.

The new parser for 0.100 might support registering token parsers in this manner. I’ll make sure it’s in the list.

We’ve talked about having complex numbers be more elegantly expressed as well in this manner.

mrjbq7 avatar Feb 04 '22 15:02 mrjbq7

The thing to be careful about is managing the order of the parsers, since that order determines the language accepted by such a "parsing" approach. Parsers consulted earlier could aggressively convert tokens into "their" type, effectively bypassing other parsers. I guess some design choices might help with that, but it can't be prevented automatically. Adding custom parsers to the end of the list at least gives the "builtin" parsers an advantage. If parsers are only ever added to the bottom, the existing language is still accepted, i.e. language extensions are allowed, language restrictions are not.

Of course that only holds if you can manage/enforce such an order, which is almost impossible without a central repository. The good thing is that the Factor source tree effectively acts as such a central repository. I'm not sure what a good balance would be between enforcing an order and giving classes freedom to add custom parsers at will (in a potentially unpredictable order). Maybe a solution similar to some of the heuristics used by class linearization could be designed.
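
A small Python sketch of the ordering concern, with hypothetical names throughout (`builtin_number`, `greedy_custom`, `parse_with`). The "greedy" parser is deliberately over-eager: it claims any token that starts with a digit.

```python
from typing import Any, Callable, List, Optional

Parser = Callable[[str], Optional[Any]]


def builtin_number(token: str) -> Optional[int]:
    """Stand-in for the built-in number parser (integers only in this sketch)."""
    try:
        return int(token)
    except ValueError:
        return None


def greedy_custom(token: str) -> Optional[str]:
    """Over-eager custom parser: accepts every token that starts with a digit."""
    return f"custom:{token}" if token[:1].isdigit() else None


def parse_with(parsers: List[Parser], token: str) -> Any:
    """First parser in the list that returns a non-None result wins."""
    for parser in parsers:
        result = parser(token)
        if result is not None:
            return result
    raise ValueError(f"no parser accepts token {token!r}")


appended = [builtin_number, greedy_custom]   # custom parser added at the end
prepended = [greedy_custom, builtin_number]  # custom parser consulted first
```

With `appended`, the token `42` still parses as the integer 42 and only tokens the built-in parser rejects, such as `5±1`, reach the custom parser. With `prepended`, the same `42` is captured by the custom parser, so the previously accepted language has changed.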

nomennescio avatar Feb 04 '22 16:02 nomennescio