sympy
parse_latex fails on "d \cdot 2"
Not sure why this is happening:
>>> from sympy.parsing.latex import parse_latex
>>> parse_latex(r"c \cdot 2")
c*2
>>> parse_latex(r"2 \cdot d")
2*d
>>> parse_latex(r"d \cdot 2")
dcdot*2
Any ideas?
I guess dot is special cased somewhere:
In [15]: parse_latex(r"d \cdot 2")
Out[15]: dċ⋅2
In [16]: parse_latex(r"\xdot")
Out[16]: ẋ
In [17]: sympify('xdot')
Out[17]: ẋ
Hmm ok, is there a way to fix this?
>>> from sympy.parsing.latex import parse_latex_lark
>>> parse_latex_lark(r'c \cdot 2')
2*c
This bug does not exist in the Lark backend. I think you either have ANTLR installed, or parse_latex is implicitly using the ANTLR backend.
The ANTLR grammar is probably wrong to have such nonstandard behavior, but we are unsure whether to continue maintaining the ANTLR parser, so I'd suggest using parse_latex_lark instead.
This bug is caused by the DIFFERENTIAL grammar rule: the letter "d" fails to be parsed as a symbol and is instead grouped with \cdot into a single atom.
DIFFERENTIAL: 'd' WS_CHAR*? ([a-zA-Z] | '\\' [a-zA-Z]+);
The parse tree for d \cdot 2 is:
(math (relation (expr (additive (mp (unary (postfix (exp (comp (atom d \cdot)))) (postfix (exp (comp (atom (number 2)))))))))))
instead of being
(math (relation (expr (additive (mp (mp (unary (postfix (exp (comp (atom d)))))) \cdot (mp (unary (postfix (exp (comp (atom (number 2))))))))))))
The problem is that the rule is hard-coded (the '\\' [a-zA-Z]+ part) and does not exclude commands like CDOT, DIV, etc.
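To make the over-matching concrete, here is a rough Python regex translation of the DIFFERENTIAL rule. It is only a sketch: it assumes WS_CHAR covers space, tab, CR and LF, the helper name differential_at is made up for illustration, and Python's re module does not reproduce ANTLR's longest-match lexing in general. It does show the rule swallowing "d \cdot" as one token.

```python
import re

# Rough Python translation of the ANTLR lexer rule
#   DIFFERENTIAL: 'd' WS_CHAR*? ([a-zA-Z] | '\\' [a-zA-Z]+);
# Assumption: WS_CHAR covers space, tab, CR and LF.
DIFFERENTIAL = re.compile(r"d[ \t\r\n]*?(?:[a-zA-Z]|\\[a-zA-Z]+)")


def differential_at(text, pos=0):
    """Return the text the rule would consume starting at pos, or None.

    Hypothetical helper: it only checks this one rule at one position,
    it does not emulate the whole lexer.
    """
    m = DIFFERENTIAL.match(text, pos)
    return m.group(0) if m else None


for source in (r"d \cdot 2", r"c \cdot 2", r"d\theta", "dx"):
    print(repr(source), "->", repr(differential_at(source)))
```

For "d \cdot 2" the rule consumes "d \cdot", because any backslash command is accepted after the "d". For "c \cdot 2" it does not match at all, which is consistent with only the letter d triggering the bug.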
I thought of using semantic predicates to solve this bug. Is there a better solution?
My suggestion is that we define the differential as a rule like 'd' WS_CHAR*? symbol instead of using a regex like that. I think '\\' [a-zA-Z]+ was used as a shallow heuristic to handle cases like \alpha and \beta; however, it can produce a lot of false positives.
However, if that does not work, it is possible to do something like '\\alpha' | '\\beta' | ... | '\\omega'. I don't think there are options other than listing all the Greek letters. (There may be more, because you should also add the capital letters and things like \varepsilon.)
Functions like \sin, \cos, etc. also have to be listed.
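As an illustration of the whitelist idea, here is a Python regex analogue. This is not the actual grammar change: the GREEK list is a hypothetical, incomplete sample (uppercase letters, other variants and function names are left out), and WS_CHAR is again assumed to be space, tab, CR and LF.

```python
import re

# Hypothetical, incomplete whitelist of commands allowed after the "d".
# A real fix would need the full set of lowercase, uppercase and variant
# Greek letter commands.
GREEK = [
    "alpha", "beta", "gamma", "delta", "epsilon", "varepsilon", "zeta",
    "eta", "theta", "iota", "kappa", "lambda", "mu", "nu", "xi", "pi",
    "rho", "sigma", "tau", "upsilon", "phi", "varphi", "chi", "psi", "omega",
]

# Same shape as DIFFERENTIAL, but the backslash branch only accepts
# whitelisted names. The negative lookahead stops a whitelisted name from
# matching as a prefix of a longer command.
STRICT_DIFFERENTIAL = re.compile(
    r"d[ \t\r\n]*?(?:[a-zA-Z]|\\(?:" + "|".join(GREEK) + r")(?![a-zA-Z]))"
)


def strict_differential_at(text, pos=0):
    """Return the text the stricter rule would consume at pos, or None."""
    m = STRICT_DIFFERENTIAL.match(text, pos)
    return m.group(0) if m else None


for source in (r"d \cdot 2", r"d \theta", r"d\varepsilon", "dx"):
    print(repr(source), "->", repr(strict_differential_at(source)))
```

With the whitelist, "d \cdot 2" no longer matches as a differential, while "d \theta", "d\varepsilon" and "dx" still do. The cost is the one noted above: every accepted command has to be listed by hand.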