fend
fend copied to clipboard
Rethink order of operations (multiplication/division)
Consider the three operators:
- multiplication
* - division
/ - function application/implicit multiplication/implicit addition (used in
sin pi,3 kg,(2)(4)or1 1/2).
Current these are all parsed mostly left-to-right, with mostly the same precedence (and a bunch of special cases).
These calculations are correct and should continue working as they do now:
1 3/8 inches1/2 kg3pi/2 radians to degrees-sin (pi/2)5 feet 2 inches
These examples don't do what the user might expect:
1 meter / 5 seconds(parsed as(1 meter / 5) * seconds)sin pi/2(parsed as(sin pi)/2)
This calculation can't be parsed at all:
3sin 5(should be equivalent to3 * (sin 5))
It would be nice to come up with a consistent solution for all these expressions, preferably not involving whitespace.
Also note #62
Allowing sin pi/2 and sin pi / 2 feels too ambiguous imo
Number followed by the unit name should have the highest precedence, followed by *, /, and then +, - operations.
@yurivict That would not work with 3/8 inches; it would cause it to be parsed as 3/(8 inches). I think, though, that if we see 3 mm / 4 seconds, this should cause the behavior you described, i.e., check for a unit immediately before the binary operator, and if it's present, then group units more tightly than the binary operator for whatever comes after the binary operator.
The previous approach mostly works well, except it breaks for 3i/2 m, giving m^-1 instead of m because it is grouped as 3i/(2 m) by the rule checking for an Apply before the binop.
I don't think there's any way to resolve this elegantly within the parser, because if we replace i with an actual unit like inches (i.e., 3inches/2 m), then the parser grouping as (3 inches) / (2 m) is actually correct.
I don't know if this means we need to special case certain identifiers like i. Regardless we will need to modify the result formatter to parenthesize the amount in these cases.
I'm glad that the unit tests check the output of the evaluation again, otherwise I wouldn't have caught this!
Looking into this, I think there are (at least) two reasonable approaches to address this:
- Make parsing/lexing whitespace sensitive. With this approach, the user can express intent using the query itself:
3i/2 mwould be interpreted as(3 i / 2) m.3inches/2 mwould be interpreted as(3 inches / 2) m(length^2)3/2 mwould be interpreted as(3 / 2) m.3 / 2mwould be interpreted as3 / (2 m).3 / 2 m,3 inches / 2 mand3 i / 2 mare still ambiguous. Most people would assume that they parse to(3 / 2) m,(3 inches) / (2 m)and(3 i / 2) mrespectively, but there's no way of differentiating this during parsing.
- Make parsing ident sensitive. That is, differentiate whether idents are functions, unitless constants, units, etc. before parsing tokens into an AST. This is similar to @yurivict's proposal.
3 i / 2 mwould be interpreted as(3 i / 2) m.3 inches / 2 mwould be interpreted as(3 inches) / (2 m)(unitless).3 / 2 mwould be interpreted as(3 / 2) m.1 m 3 sin 5 cmcould be parsed as(1 m) + (3 sin 5) cm.
I think the latter would make the most sense as it addresses the ambiguous cases, but it would slightly complicate parsing with an extra step.
Yeah I think that's a good idea and shouldn't be that hard since the environment should be available at that stage.