sympy `_print_Add` is fundamentally wrong

The print_Add in printing/str.py is making very strong and incorrect assumptions leading to incorrectly printed string (and c/latex/....) in a lot of cases (and this breaks our code generation in Devito since incorrect sign leads to unstable kernels producing NaNs)

In [15]: from sympy import *

In [16]: a,b,c,d = symbols('a b c d')

In [17]: expr = Add(Add(-a, b, evaluate=False), Add(-c, d,  evaluate=False),  evaluate=False)

In [18]: expr
Out[18]: -(a + b) - (c + d) # this should be (-a + b) + (-c + d)

Basically, _print_Add always assume that any term t that is an Add that starts with a - is automatically a -(t) which is wrong since it would be a Mul . The lines leading to the issues are lines 55-65 Here

            t = self._print(term)
            if t.startswith('-'):
                sign = "-"
                t = t[1:]
            else:
                sign = "+"
            if precedence(term) < PREC or isinstance(term, Add):
                l.extend([sign, "(%s)" % t])
            else:
                l.extend([sign, t])

That simply check if the string starts with - without any regards for the expression. so -a +b is printed as - (a+b) since it's an Add and starts with -.

Sep 06 '22 12:09 mloubout

Nice find! I was wondering where this was being mishandled so badly in another issue/PR. The change is straightforward, I think:

(The docstring was just there to demonstrate that the change worked as advertised.)

Sep 06 '22 15:09 smichr

Thankls for the quick response. Wouldn't something like

for term in terms:
   l.extend([self._print(Mul(*term._as_coeff_Mul())])

be "safer"?

Sep 06 '22 15:09 mloubout

Go ahead and give it a try. Unanticipated effects is the bane of my experience with SymPy. If there is something wrong, tests will fail. The change I recommended was only tested to give the expected result that you have in the OP.

Sep 07 '22 02:09 smichr

@mloubout Can I work on this issue?

Sep 08 '22 15:09 Ishanned

@Ishanned sure go ahead I don't have time this week for it may get back to you next week

Sep 08 '22 15:09 mloubout

I would also like to try and work on this issue @Ishanned

Sep 29 '22 22:09 shivamurali

@shivamurali there is PR #24030 openen that needs minor cosmetic changes that will fixx this issue.

Sep 29 '22 22:09 mloubout

@mloubout I was wondering if I could also try to work on the issue

Sep 29 '22 22:09 joshuacappelli

It seems that something broke after the release of sympy 1.10.1 because the example provided by @mloubout works fine under that version:

>>> sympy.__version__
'1.10.1'
>>> 
>>> a, b, c, d = symbols('a b c d')
>>> Add(Add(-a, b, evaluate=False), Add(-c, d, evaluate=False), evaluate=False)
-a + b - c + d

It's also worth nothing that the fix suggested by @smichr produces a slightly different representation for the given sample expression:

# On version '1.11.1' + fix provided by @smichr.
>>> Add(Add(-a, b, evaluate=False), Add(-c, d, evaluate=False), evaluate=False)
(-a + b) + (-c + d)

However, the said fix doesn't provide the cleanest representation for expressions like -a - b - c:

# On version '1.10.1'.
>>> Add(Add(-a, -b, evaluate=False), -c, evaluate=False)
-c - a - b

# On version 1.11.1 + fix provided by @smichr.
>>> Add(Add(-a, -b, evaluate=False), -c, evaluate=False)
-c + (-a - b)

At this point I feel like someone who truly understands sympy/printing/str.py needs to step in to fix the printing because it is very unreliable (especially after discovering yet another bug, which I highlighted in issue #24108 -- referenced by @asmeurer above).

Oct 04 '22 16:10 costika1234

doesn't provide the cleanest representation for expressions like -a - b - c

The question is what you actually expect from the printing here. Add(Add(-a, -b), -c) is not the same as Add(-a, -b, -c). Why would you use evaluate=False to prevent the Add from flattening if you really wanted it to display like -a - b - c?

Oct 04 '22 16:10 oscarbenjamin

I mean I was merely pointing out the difference between 1.10.1 and 1.11.1 + the suggested fix. My expectation is to have a reliable string representation, which at the minute doesn't quite exist. (I am saying this mostly because expressions involving multiplications and negations are fairly broken, as highlighted in #24108.)

In my case, I'm relying a lot on the LaTeX module to parse expressions and in that part of the library evaluate=False is used pretty much everywhere. This is why in all my examples I'm using that flag set to False.

Oct 04 '22 17:10 costika1234

The thing here is that SymPy tries to print expressions in a "nice way", so unevaluated printing will almost certainly fail in some cases. Just as people will be upset that their expression doesn't print "nicely" for others.

I think the best way is to introduce some (global?) flag that just skips all the fancy sorting and "nice" printing and prints brute force. There are quite a few issues like this where unevaluated printing breaks. Maybe even implement an "UnevaluatedStringPrinter" overriding the most problematic methods (Add and Mul). However, one should note that there is no such thing as Sub and Div in SymPy, so a - b will always be a + (-1)*b, so it is fundamentally impossible to distinguish between Add(a, Mul(-1, b, evaluate=False), evaluate=False) and Add(a, -b, evaluate=False).

Oct 05 '22 11:10 oscargus

While I agree with the fact that it won't always be perfect there is a difference between "sometime it won't be pretty" and "it not the correct mathematical expression".

The printer is used for code generation for example so wether or not it's always pretty it needs to always be correct.

Oct 05 '22 11:10 mloubout

it needs to always be correct

Indeed. I'm just saying that many of the ~errors~ fixes introduced recently are to make unevaluated stuff print "better unevaluated" at the expense of correct and nice. I'm doubting that one can have both with the same printer.

Oct 05 '22 15:10 oscargus

I think that the real problem is not so much with the printers but just that SymPy's expression types were not designed to be used in the way that is typically intended by someone who uses evaluate=False. Essentially pretty much any use of unevaluated expressions goes against the basic design of SymPy. If I was starting things from scratch then there would be a different kind of symbolic expression for the purposes of parsing and printing and it would have things like Div and Sub etc and it wouldn't mess around with the order of arguments to an Add or Mul and so on.

That being said the fact that the printers do mess around with the order of arguments has always annoyed me. They should really just print the expression as is (in so far as that is possible).

Oct 05 '22 15:10 oscarbenjamin

That being said the fact that the printers do mess around with the order of arguments has always annoyed me. They should really just print the expression as is (in so far as that is possible).

It's also a huge performance problem. Printing a large expression can be very slow unless sorting is disabled with order='none'. It's even worse when it tries to sort using numerical evaluation, which can sometimes just hang the printer (https://github.com/sympy/sympy/issues/10800). I think we should change printer defaults to order='none', and completely removing any sort of evalf from any printer code. It really doesn't matter if pi + E prints as E + pi because $e < \pi$. I don't think anyone even expects that in the first place.

The only reason to do sorting in the printers is to print polynomials in a nice lexicographic order. It's not really necessary to try to extend this to every possible non-polynomial expression. Also, currently Add already sorts its arguments, so I don't know if the printer sorting is even necessary for that use case. There's also the sorting to print symbols in alphabetical order in a monomial, which also should already happen from the Mul constructor sorting. And either way, these could both be done much more efficiently with a much more restricted sort key.

Oct 05 '22 22:10 asmeurer

I think we should change printer defaults to order='none'

I agree.

Oct 05 '22 22:10 oscarbenjamin

In my case, I'm relying a lot on the LaTeX module to parse expressions and in that part of the library evaluate=False is used pretty much everywhere. This is why in all my examples I'm using that flag set to False.

Ideally it wouldn't do this by default https://github.com/sympy/sympy/issues/24116. That's immaterial to this issue, though, because unevaluated expressions should print correctly regardless of where they came from.

If you don't actually want an unevaluated expression, for now, you can use the rebuild function from sympy.strategies to force it to evaluate, which will solve these printing woes:

>>> from sympy.strategies.rl import rebuild
>>> expr = parse_latex(r'1 + 1')
>>> expr
1 + 1
>>> rebuild(expr)
2
>>> parse_latex('\\frac {-x + \\sqrt{y}}{2}') # https://github.com/sympy/sympy/issues/24108
-x + sqrt(y)/2
>>> rebuild(parse_latex('\\frac {-x + \\sqrt{y}}{2}'))
-x/2 + sqrt(y)/2

Oct 05 '22 22:10 asmeurer

Well, the reason the printers are quite complicated is that we would like to get "nice" expressions, which typically breaks for unevaluated cases. Just look at _print_Add or _print_Mul. It is not '+'.join(self._print(a) for a in expr.args)...

Oct 06 '22 07:10 oscargus

sympy sympy copied to clipboard

`_print_Add` is fundamentally wrong

sympy
sympy copied to clipboard