math-expressions icon indicating copy to clipboard operation
math-expressions copied to clipboard

Variables should resolve before keywords.

Open widavies opened this issue 5 years ago • 2 comments

Great library, one small issue I'm having though. It seems like you resolve built-in keywords/functions before the variables (from the ContextModel) are resolved. This yields a couple of problems.

If I bind a variable name like score, the library seems to be trying to resolve e and subsequently doesn't properly resolve score (as I presume its then trying to resolve scor?). This happens whenever any variable name contains a built-in keyword. Other examples of invalid variable names:

variable ceiling resin

Basically anything that contains e or a built-in function like sin, ceil, etc.

Also, it should be noted in my use case that variables can be named anything (my users define variables).

So for example, a variable name could be something like qw%d. This is more of a special case, and not really needed, but it would be nice if your code was smart enough to tell that this is probably a variable name and not a wd and d separated by a modulus. Although, if you're resolving variables first, this actually should be a fairly trivial case to handle.

Thanks! I can also take a look at fixing this if you point me in a right direction / have any pointers.

widavies avatar Aug 24 '20 02:08 widavies

Thanks for the report, you're right that this is unintentional. It's is an issue with the parser and somewhat related to #10.

For example, score is currently parsed as e(scor):

$ dart bin/interpreter.dart
[..]
score
> Tokens: [(VAR: scor), (EFUNC: e)]
> RPN: [(VAR: scor), (EFUNC: e)]
> Expression: e(scor)

The issue is in the Lexer#tokenize function in lib/src/parser.dart which processes the input as individual characters first, and prioritises known keywords before considering longer variable or function names. This leads to situations where parts of word tokens are parsed incorrectly.

I think this could be improved fairly easily by simply avoiding splitting up coherent word tokens (e.g. potential variable/function names). The qw%d case is a bit trickier as that contains the modulus operator. I'd be inclined to restrict legal variable names to alphanumeric characters only, and probably also requiring the first character to be alphabetic to avoid confusion with number literals.

Feel free to have a stab at this, and let me know if you have any further questions.

On a side note, I've started rewriting the parser to use a grammar-based approach as described in this comment. The WIP code is on the feature-parser-rewrite branch but it's still missing support for parsing functions, and I'm not sure when I'll get around to finishing that off. I'd be happy to accept a contribution to either the existing or new WIP parser code.

fkleon avatar Aug 24 '20 09:08 fkleon

For now, I may just use a workaround of using a regex replace for all variables instead of using the ContextModel to bind variables. I may try to work on a temporary fix, but I'm rather pressed for time and may just wait for your feature-parser-rewrite. No rush. Thanks!

widavies avatar Aug 24 '20 22:08 widavies