parse_latex fails on "d \cdot 2"

Open nitinjan06 opened this issue 1 year ago • 3 comments

Not sure why this is happening:

>>> from sympy.parsing.latex import parse_latex
>>> parse_latex(r"c \cdot 2")
c*2
>>> parse_latex(r"2 \cdot d") 
2*d
>>> parse_latex(r"d \cdot 2") 
dcdot*2

Any ideas?

nitinjan06, Feb 11 '24 21:02

I guess dot is special cased somewhere:

In [15]: parse_latex(r"d \cdot 2")
Out[15]: dċ⋅2

In [16]: parse_latex(r"\xdot")
Out[16]: ẋ

In [17]: sympify('xdot')
Out[17]: ẋ

oscarbenjamin, Feb 11 '24 23:02

Hmm ok, is there a way to fix this?

nitinjan06, Feb 13 '24 18:02

>>> from sympy.parsing.latex import parse_latex_lark
>>> parse_latex_lark(r'c \cdot 2')
2*c

This bug does not occur with the Lark backend. I think you either have ANTLR installed or parse_latex is implicitly using the ANTLR backend. The ANTLR grammar is probably wrong to produce such nonstandard behavior, but we are not sure whether we will continue maintaining the ANTLR parser, so I suggest using parse_latex_lark instead.

sylee957, Feb 14 '24 08:02

This bug is caused by the DIFFERENTIAL grammar rule: the letter "d" is not parsed as a symbol on its own; instead, it is grouped with \cdot into a single atom.

DIFFERENTIAL: 'd' WS_CHAR*? ([a-zA-Z] | '\\' [a-zA-Z]+);

The parse tree for d \cdot 2 is

(math (relation (expr (additive (mp (unary (postfix (exp (comp (atom d \cdot)))) (postfix (exp (comp (atom (number 2)))))))))))

instead of being

(math (relation (expr (additive (mp (mp (unary (postfix (exp (comp (atom d)))))) \cdot (mp (unary (postfix (exp (comp (atom (number 2))))))))))))

The problem is that the rule is hard-coded (the '\\' [a-zA-Z]+ part) and does not exclude commands like CDOT, DIV, etc. I thought of using semantic predicates to fix this bug. Is there a better solution?
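To see why only the d-first ordering breaks, the lexer rule can be approximated with a Python regular expression. This is only a rough sketch: Python's re module is not ANTLR's lexer, WS_CHAR is approximated by \s, and the names DIFFERENTIAL_APPROX and differential_prefix are made up for illustration. It checks what the rule would consume when the lexer is positioned at the start of the input.

```python
import re

# Rough Python approximation of the ANTLR lexer rule
#   DIFFERENTIAL: 'd' WS_CHAR*? ([a-zA-Z] | '\\' [a-zA-Z]+);
# Hypothetical name; WS_CHAR is approximated by \s.
DIFFERENTIAL_APPROX = re.compile(r"d\s*?(?:[a-zA-Z]|\\[a-zA-Z]+)")


def differential_prefix(latex):
    """Return the text the approximated rule consumes at position 0, or None."""
    m = DIFFERENTIAL_APPROX.match(latex)
    return m.group(0) if m else None


print(differential_prefix(r"d \cdot 2"))  # the rule swallows "d \cdot" as one token
print(differential_prefix(r"c \cdot 2"))  # None: the input does not start with "d"
print(differential_prefix("d"))           # None: nothing follows the "d"
```

Under this approximation, a trailing d has nothing after it for the rule to absorb, which is consistent with 2 \cdot d parsing correctly.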

mohamedrezk122, Feb 26 '24 22:02

My suggestion is to define the differential as a parser rule like 'd' WS_CHAR*? symbol instead of using a regex like that. I think '\\' [a-zA-Z]+ was used as a shallow heuristic to handle cases like \alpha and \beta, but it produces a lot of false positives. If that does not work, it is possible to do something like '\\alpha' | '\\beta' | ... | '\\omega'. I don't think there are options other than listing all the Greek letters. (There can be more, because you would also need the capital letters and variants like \varepsilon.)
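The enumeration idea can be sketched in Python as a regular expression. This is not the ANTLR grammar itself: the GREEK list is a small illustrative subset, and STRICT_DIFFERENTIAL and strict_prefix are hypothetical names used only to show the effect of listing the allowed commands explicitly.

```python
import re

# Illustrative subset only (assumption); a real grammar would need the full
# list, including capital letters and variants such as \varepsilon.
GREEK = ["alpha", "beta", "gamma", "delta", "epsilon", "varepsilon", "theta", "omega"]

# 'd', optional whitespace, then one of the listed commands or a single letter.
# The negative lookahead keeps a listed name from matching only the prefix
# of a longer command name.
STRICT_DIFFERENTIAL = re.compile(
    r"d\s*?(?:\\(?:" + "|".join(GREEK) + r")(?![a-zA-Z])|[a-zA-Z])"
)


def strict_prefix(latex):
    """Return the text the stricter pattern consumes at position 0, or None."""
    m = STRICT_DIFFERENTIAL.match(latex)
    return m.group(0) if m else None


print(strict_prefix(r"d \alpha"))   # still treated as a differential
print(strict_prefix("d x"))         # still treated as a differential
print(strict_prefix(r"d \cdot 2"))  # None: \cdot is not in the list
```

The trade-off is that any command missing from the list is rejected, so everything that may legitimately follow d has to be enumerated.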

sylee957, Feb 28 '24 08:02

Also, functions like \sin, \cos, etc. have to be listed.

mohamedrezk122, Feb 28 '24 09:02