sympy
sympy copied to clipboard
`_print_Add` is fundamentally wrong
The print_Add
in printing/str.py
is making very strong and incorrect assumptions leading to incorrectly printed string (and c/latex/....) in a lot of cases (and this breaks our code generation in Devito since incorrect sign leads to unstable kernels producing NaNs)
In [15]: from sympy import *
In [16]: a,b,c,d = symbols('a b c d')
In [17]: expr = Add(Add(-a, b, evaluate=False), Add(-c, d, evaluate=False), evaluate=False)
In [18]: expr
Out[18]: -(a + b) - (c + d) # this should be (-a + b) + (-c + d)
Basically, _print_Add
always assume that any term t
that is an Add that starts with a -
is automatically a -(t)
which is wrong since it would be a Mul
. The lines leading to the issues are lines 55-65 Here
t = self._print(term)
if t.startswith('-'):
sign = "-"
t = t[1:]
else:
sign = "+"
if precedence(term) < PREC or isinstance(term, Add):
l.extend([sign, "(%s)" % t])
else:
l.extend([sign, t])
That simply check if the string starts with -
without any regards for the expression. so -a +b
is printed as - (a+b)
since it's an Add and starts with -
.
Nice find! I was wondering where this was being mishandled so badly in another issue/PR. The change is straightforward, I think:
(The docstring was just there to demonstrate that the change worked as advertised.)
Thankls for the quick response. Wouldn't something like
for term in terms:
l.extend([self._print(Mul(*term._as_coeff_Mul())])
be "safer"?
Go ahead and give it a try. Unanticipated effects is the bane of my experience with SymPy. If there is something wrong, tests will fail. The change I recommended was only tested to give the expected result that you have in the OP.
@mloubout Can I work on this issue?
@Ishanned sure go ahead I don't have time this week for it may get back to you next week
I would also like to try and work on this issue @Ishanned
@shivamurali there is PR #24030 openen that needs minor cosmetic changes that will fixx this issue.
@mloubout I was wondering if I could also try to work on the issue
It seems that something broke after the release of sympy 1.10.1
because the example provided by @mloubout works fine under that version:
>>> sympy.__version__
'1.10.1'
>>>
>>> a, b, c, d = symbols('a b c d')
>>> Add(Add(-a, b, evaluate=False), Add(-c, d, evaluate=False), evaluate=False)
-a + b - c + d
It's also worth nothing that the fix suggested by @smichr produces a slightly different representation for the given sample expression:
# On version '1.11.1' + fix provided by @smichr.
>>> Add(Add(-a, b, evaluate=False), Add(-c, d, evaluate=False), evaluate=False)
(-a + b) + (-c + d)
However, the said fix doesn't provide the cleanest representation for expressions like -a - b - c
:
# On version '1.10.1'.
>>> Add(Add(-a, -b, evaluate=False), -c, evaluate=False)
-c - a - b
# On version 1.11.1 + fix provided by @smichr.
>>> Add(Add(-a, -b, evaluate=False), -c, evaluate=False)
-c + (-a - b)
At this point I feel like someone who truly understands sympy/printing/str.py
needs to step in to fix the printing because it is very unreliable (especially after discovering yet another bug, which I highlighted in issue #24108 -- referenced by @asmeurer above).
doesn't provide the cleanest representation for expressions like
-a - b - c
The question is what you actually expect from the printing here. Add(Add(-a, -b), -c)
is not the same as Add(-a, -b, -c)
. Why would you use evaluate=False
to prevent the Add
from flattening if you really wanted it to display like -a - b - c
?
I mean I was merely pointing out the difference between 1.10.1
and 1.11.1 + the suggested fix
. My expectation is to have a reliable string representation, which at the minute doesn't quite exist. (I am saying this mostly because expressions involving multiplications and negations are fairly broken, as highlighted in #24108.)
In my case, I'm relying a lot on the LaTeX module to parse expressions and in that part of the library evaluate=False
is used pretty much everywhere. This is why in all my examples I'm using that flag set to False
.
The thing here is that SymPy tries to print expressions in a "nice way", so unevaluated printing will almost certainly fail in some cases. Just as people will be upset that their expression doesn't print "nicely" for others.
I think the best way is to introduce some (global?) flag that just skips all the fancy sorting and "nice" printing and prints brute force. There are quite a few issues like this where unevaluated printing breaks. Maybe even implement an "UnevaluatedStringPrinter" overriding the most problematic methods (Add and Mul). However, one should note that there is no such thing as Sub and Div in SymPy, so a - b
will always be a + (-1)*b
, so it is fundamentally impossible to distinguish between Add(a, Mul(-1, b, evaluate=False), evaluate=False)
and Add(a, -b, evaluate=False)
.
While I agree with the fact that it won't always be perfect there is a difference between "sometime it won't be pretty" and "it not the correct mathematical expression".
The printer is used for code generation for example so wether or not it's always pretty it needs to always be correct.
it needs to always be correct
Indeed. I'm just saying that many of the ~errors~ fixes introduced recently are to make unevaluated stuff print "better unevaluated" at the expense of correct and nice. I'm doubting that one can have both with the same printer.
I think that the real problem is not so much with the printers but just that SymPy's expression types were not designed to be used in the way that is typically intended by someone who uses evaluate=False
. Essentially pretty much any use of unevaluated expressions goes against the basic design of SymPy. If I was starting things from scratch then there would be a different kind of symbolic expression for the purposes of parsing and printing and it would have things like Div
and Sub
etc and it wouldn't mess around with the order of arguments to an Add
or Mul
and so on.
That being said the fact that the printers do mess around with the order of arguments has always annoyed me. They should really just print the expression as is (in so far as that is possible).
That being said the fact that the printers do mess around with the order of arguments has always annoyed me. They should really just print the expression as is (in so far as that is possible).
It's also a huge performance problem. Printing a large expression can be very slow unless sorting is disabled with order='none'
. It's even worse when it tries to sort using numerical evaluation, which can sometimes just hang the printer (https://github.com/sympy/sympy/issues/10800). I think we should change printer defaults to order='none'
, and completely removing any sort of evalf
from any printer code. It really doesn't matter if pi + E
prints as E + pi
because $e < \pi$. I don't think anyone even expects that in the first place.
The only reason to do sorting in the printers is to print polynomials in a nice lexicographic order. It's not really necessary to try to extend this to every possible non-polynomial expression. Also, currently Add
already sorts its arguments, so I don't know if the printer sorting is even necessary for that use case. There's also the sorting to print symbols in alphabetical order in a monomial, which also should already happen from the Mul
constructor sorting. And either way, these could both be done much more efficiently with a much more restricted sort key.
I think we should change printer defaults to
order='none'
I agree.
In my case, I'm relying a lot on the LaTeX module to parse expressions and in that part of the library evaluate=False is used pretty much everywhere. This is why in all my examples I'm using that flag set to False.
Ideally it wouldn't do this by default https://github.com/sympy/sympy/issues/24116. That's immaterial to this issue, though, because unevaluated expressions should print correctly regardless of where they came from.
If you don't actually want an unevaluated expression, for now, you can use the rebuild
function from sympy.strategies
to force it to evaluate, which will solve these printing woes:
>>> from sympy.strategies.rl import rebuild
>>> expr = parse_latex(r'1 + 1')
>>> expr
1 + 1
>>> rebuild(expr)
2
>>> parse_latex('\\frac {-x + \\sqrt{y}}{2}') # https://github.com/sympy/sympy/issues/24108
-x + sqrt(y)/2
>>> rebuild(parse_latex('\\frac {-x + \\sqrt{y}}{2}'))
-x/2 + sqrt(y)/2
Well, the reason the printers are quite complicated is that we would like to get "nice" expressions, which typically breaks for unevaluated cases. Just look at _print_Add
or _print_Mul
. It is not '+'.join(self._print(a) for a in expr.args)
...