sympy icon indicating copy to clipboard operation
sympy copied to clipboard

Make printing of FiniteSet with sub FiniteSets consistent.

Open gutow opened this issue 2 years ago • 33 comments

References to other Issues or PRs

None

Brief description of what is fixed or changed

Printing behavior of FiniteSets appears to be inconsistent. Current behavior (Notice that FiniteSets that do not contain other sets are not wrapped in FiniteSet()):

>>> a, b, c, d = symbols('a b c d')
>>> expr1 = a/(c+d)
>>> expr2 = (a+b)/c -d
>>> expr3 = a/d + b/c
>>> tst1 = FiniteSet(expr1, 1.0)
>>> tst1
{1.0, a/(c + d)}
>>> tst2 = FiniteSet(expr2, expr3, tst1)
>>> tst2
FiniteSet(-d + (a + b)/c, a/d + b/c, {1.0, a/(c + d)})

So a FiniteSet is only specified in the output if it wraps other FiniteSets. Otherwise they are printed to look like standard Python sets. This means that copy & paste of the output for tst1 does not generate a valid sympy expression, but copy and paste of the output for tst2 does.

Behavior implemented by this pull request:

>>> a, b, c, d = symbols('a b c d')
>>> expr1 = a/(c+d)
>>> expr2 = (a+b)/c -d
>>> expr3 = a/d + b/c
>>> tst1 = FiniteSet(expr1, 1.0)
>>> tst1
FiniteSet(1.0, a/(c + d))
>>> tst2 = FiniteSet(expr2, expr3, tst1)
>>> tst2
FiniteSet(-d + (a + b)/c, a/d + b/c, {1.0, a/(c + d)})

After this copy & paste of the output string for tst1 and tst2 regenerates equivalent sympy expressions.

Other comments

Release Notes

  • printing
    • Changed printing of FiniteSet in str.py so that the exterior instance of FiniteSet always prints as FiniteSet(...) rather than some as Python sets (e.g. {...}).
    • Changed printing of Sets in stry.py to maintain {...} as the representation of FinteSets within Sets.
    • Changed test_str.py to check for consistent printing of FiniteSets.
    • Doctests changed many places to account for exterior instances of FiniteSet being printed as FiniteSet(...) rather than python sets.

gutow avatar Jan 22 '23 19:01 gutow

:white_check_mark:

Hi, I am the SymPy bot. I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • printing
    • Changed printing of FiniteSet in str.py so that the exterior instance of FiniteSet always prints as FiniteSet(...) rather than some as Python sets (e.g. {...}). (#24574 by @gutow)

    • Changed printing of Sets in stry.py to maintain {...} as the representation of FinteSets within Sets. (#24574 by @gutow)

    • Changed test_str.py to check for consistent printing of FiniteSets. (#24574 by @gutow)

    • Doctests changed many places to account for exterior instances of FiniteSet being printed as FiniteSet(...) rather than python sets. (#24574 by @gutow)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.13.

Click here to see the pull request description that was parsed.
<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->
None

#### Brief description of what is fixed or changed
Printing behavior of FiniteSets appears to be inconsistent.
Current behavior (Notice that FiniteSets that do not contain other sets are not wrapped in `FiniteSet()`): 
```
>>> a, b, c, d = symbols('a b c d')
>>> expr1 = a/(c+d)
>>> expr2 = (a+b)/c -d
>>> expr3 = a/d + b/c
>>> tst1 = FiniteSet(expr1, 1.0)
>>> tst1
{1.0, a/(c + d)}
>>> tst2 = FiniteSet(expr2, expr3, tst1)
>>> tst2
FiniteSet(-d + (a + b)/c, a/d + b/c, {1.0, a/(c + d)})
```
So a FiniteSet is only specified in the output if it wraps other FiniteSets. Otherwise they are printed to look like standard
Python sets. This means that copy & paste of the output for tst1 does not generate a valid sympy expression, but copy and paste of the output for tst2 does.

Behavior implemented by this pull request:
```
>>> a, b, c, d = symbols('a b c d')
>>> expr1 = a/(c+d)
>>> expr2 = (a+b)/c -d
>>> expr3 = a/d + b/c
>>> tst1 = FiniteSet(expr1, 1.0)
>>> tst1
FiniteSet(1.0, a/(c + d))
>>> tst2 = FiniteSet(expr2, expr3, tst1)
>>> tst2
FiniteSet(-d + (a + b)/c, a/d + b/c, {1.0, a/(c + d)})
```
After this copy & paste of the output string for tst1 and tst2 regenerates equivalent sympy expressions.

#### Other comments

#### Release Notes

<!-- Write the release notes for this release below between the BEGIN and END
statements. The basic format is a bulleted list with the name of the subpackage
and the release note for this PR. For example:


-->
<!-- BEGIN RELEASE NOTES -->
* printing
  * Changed printing of FiniteSet in str.py so that the exterior instance of FiniteSet always prints as `FiniteSet(...)` rather than some as Python sets (e.g. `{...}`).
  * Changed printing of Sets in stry.py to maintain `{...}` as the representation of FinteSets within Sets.
  * Changed test_str.py to check for consistent printing of FiniteSets.
  * Doctests changed many places to account for exterior instances of FiniteSet being printed as `FiniteSet(...)` rather than python sets. 

<!-- END RELEASE NOTES -->

sympy-bot avatar Jan 22 '23 19:01 sympy-bot

Doctests showing up in quite a few packages that depend on this inconsistent behavior. These can be fixed as well. However it appears this minor correction for consistency may impact a lot of things. Will this require a deprecation?

gutow avatar Jan 22 '23 19:01 gutow

It is difficult to do deprecation of these sort of things because it affects default options. I don't think that sympy had followed repr to be evaluate to same object, especially from things like

>>> x**Rational(1, 3)
x**(1/3)
>>> x**(1/3)
x**0.333333333333333

which affects copy-paste behavior.

sylee957 avatar Jan 22 '23 19:01 sylee957

@sylee957 Yep, sympy does not follow the python convention of repr being code equivalent and str being human readable. Most of the time str is the code representation. In this particular case the string representation is not consistent when FiniteSets are nested. I suspect it was originally to make it easier to read, but leads to problems. In addition to the cut&paste issue, a key one is that pretty printing does not automatically work on nested Finite sets. That is how I found the inconsistency.

I have not seen failures in the core tests, only the doctests. I am working my way through the doctest failures now. So far I have only found issues that impact what the output looks like. I am hoping to be lucky and not find anywhere that an internal sympy operation expects the python set instead of a FiniteSet. If that is the case I think this little change can be implemented by just fixing all the doctests.

gutow avatar Jan 22 '23 20:01 gutow

I am going to see if I can at least get the pretty printing to work better by changing repr.py instead.

gutow avatar Jan 22 '23 20:01 gutow

Modifying repr.py is not useful. Only used if a call is made to srepr(). And it already gives a good complete representation. The problem appears to be that sympy collapses __str__ and __repr__.

gutow avatar Jan 22 '23 21:01 gutow

Below verified:

I have not seen failures in the core tests, only the doctests. I am working my way through the doctest failures now. So far I have only found issues that impact what the output looks like. I am hoping to be lucky and not find anywhere that an internal sympy operation expects the python set instead of a FiniteSet. If that is the case I think this little change can be implemented by just fixing all the doctests.

No simpler solution found. I will fix the doctests to complete this pull request.

NOTE: It really does appear that somebody at some point did not want FiniteSet explicit in nested sets. Should we poll the community?

gutow avatar Jan 22 '23 21:01 gutow

@oscargus You were the last person to commit changes to sympy.printing.str._print_FiniteSet. Could I please get your thoughts on this pull request? I am running into problems with python code that uses sympy because sympy provides the same strings for __repr__() and __str__(). As noted, above what is returned is not consistent and code compatible. I am interested in this for use cases in Jupyter where I want to show the user simultaneously the default pretty printed result and the code compatible version of the result.

gutow avatar Jan 22 '23 22:01 gutow

Copy-printing needs to be understood in the context of sympify so e.g. Integer(1) prints as 1 which will be a Python int but sympify(1) gives Integer(1) which is the SymPy symbolic expression. The same thing happens when sympifying a set:

In [6]: a = FiniteSet(1, 2)

In [7]: a
Out[7]: {1, 2}

In [8]: srepr(a)
Out[8]: 'FiniteSet(Integer(1), Integer(2))'

In [9]: sympify({1, 2})
Out[9]: {1, 2}

In [10]: srepr(sympify({1, 2}))
Out[10]: 'FiniteSet(Integer(1), Integer(2))'

oscarbenjamin avatar Jan 22 '23 22:01 oscarbenjamin

I think the reasoning here was that Python's set cannot contain other sets, so nested sets have to be printed in the more verbose way. But I don't see the benefit of not using {} notation otherwise. It's much simpler, and it will be automatically converted to a FiniteSet whenever it is combined with a SymPy object.

asmeurer avatar Jan 23 '23 04:01 asmeurer

I think that frozenset which should be actual FiniteSet that python is using, is already represented verbose though.

sylee957 avatar Jan 23 '23 06:01 sylee957

But again, it is hard to decide default options because we can't let someone other comes in the future and let their own preference merged in

sylee957 avatar Jan 23 '23 06:01 sylee957

So, I understand the logic to be that wrapping a set of FiniteSets represented by {...} in FiniteSet(...) for display is simply a way of making the whole thing a sympy object so anything inside it will be sympified. That seems to be a solution that generally works.

However, as noted above the current implementation outputs a simple FiniteSet as a sympy set represented by {...}. This would have to be manually sympified to behave as the object it represents (See tst1 in the description of the pull request).

A solution that would maintain most of the current behavior would be to display a FiniteSet as FiniteSet{...} always for the outer FiniteSet, but use the compact {...} and definitely easier to read notation for FiniteSets within a FiniteSet. Would this be an acceptable smaller change?

gutow avatar Jan 24 '23 02:01 gutow

A solution that would maintain most of the current behavior would be to display a FiniteSet as FiniteSet{...} always for the outer FiniteSet, but use the compact {...} and definitely easier to read notation for FiniteSets within a FiniteSet. Would this be an acceptable smaller change?

Isn't this what it already does? It uses {} unless it contains a set in which case it uses FiniteSet.

asmeurer avatar Jan 24 '23 08:01 asmeurer

A solution that would maintain most of the current behavior would be to display a FiniteSet as FiniteSet(...) always for the outer FiniteSet, but use the compact {...} and definitely easier to read notation for FiniteSets within a FiniteSet. Would this be an acceptable smaller change?

Isn't this what it already does? It uses {} unless it contains a set in which case it uses FiniteSet.

No, the following is the behavior I am encountering:

>>> a, b, c, d = symbols('a b c d')
>>> expr1 = a/(c+d)
>>> expr2 = (a+b)/c -d
>>> expr3 = a/d + b/c
>>> tst1 = FiniteSet(expr1, 1.0)
>>> tst1
{1.0, a/(c + d)}

gutow avatar Jan 24 '23 21:01 gutow

No, the following is the behavior I am encountering:

I'm not following. This is a set that doesn't contain any sets, so it is printed with {}.

It would help if you could show the current behavior and your suggested behavior. Your examples are a little hard to follow because they contain a lot of extraneous information that doesn't seem to be relevant. This is the current behavior,

>>> FiniteSet(1, 2) # Doesn't contain a set
{1, 2}
>>> FiniteSet(1, FiniteSet(1, 2)) # Outer set contains a set. Inner set does not
FiniteSet(1, {1, 2})
>>> FiniteSet(1, FiniteSet(1, 2), FiniteSet(1, FiniteSet(1, 2))) # Any set that directly contains a FiniteSet is printed with 'FiniteSet' and {} otherwise
FiniteSet(1, {1, 2}, FiniteSet(1, {1, 2}))

This seems to me to be reasonable. It prints with the terser {} when it can, and the more verbose FiniteSet when and only when it has to.

Just to be clear, the problem is that Python's {} set syntax cannot recursively contain itself, since it creates a mutable set object which cannot itself be contained in a set:

>>> {1, {2, 3}}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'

asmeurer avatar Jan 24 '23 23:01 asmeurer

I am objecting to the fact that FiniteSet(...) no mater what the contents are is not equivalent programmatically to {...}. If I copy & paste the output of the first example above, I do not get back the type FiniteSet.

>>> FiniteSet(1,2)
{1, 2}
>>> type({1, 2})
<class 'set'>
>>> type(FiniteSet(1,2))
<class 'sympy.sets.sets.FiniteSet'>

In contrast, the output of a FiniteSet that contains others can be copy & pasted to yield an object of the same type without the added step of sympifying.

I think the behavior for both cases should be to wrap the object with FiniteSet on the outside. I believe that is more consistent with how sympy handles other cases. Keeping the more compact notation works consistently in the second case because once you have a sympy object everything it contains gets sympified by default. The behavior I propose is:

>>> FiniteSet(1,2)
FiniteSet(1, 2)
>>> FiniteSet(1, FiniteSet(1, 2))
FiniteSet(1, {1, 2})

I do not like the less compact notation of FiniteSet(...), but I think that because there are some behavior differences it is better to clearly distinguish between python sets and sympy FiniteSets.

gutow avatar Jan 25 '23 02:01 gutow

This is how it works with any built-in type that gets converted to a SymPy type:

>>> type(1)
<class 'int'>
>>> type(1.0)
<class 'float'>
>>> print(Integer(1))
1
>>> print(Float(1.0))
1.00000000000000

SymPy even prints rationals in a way that doesn't copy-paste unless you pass them as a string to sympify:

>>> Rational(1, 3)
1/3
>>> 1/3
0.3333333333333333

sets don't actually suffer from this problem (*), in the sense that you don't even have to convert it to a string first before passing it to sympify (or more likely, just using it in an operation with another SymPy object). Plus the basic operators on set already work the same as on FiniteSet. You're unlikely to run into issues with them unless you go out of your way to do something that can't work on a FiniteSet. The only issue is that it doesn't contain any of the methods of Basic, but you can always normalize anything to a Basic subtype by calling sympify on it.

(*) there is an exception which we might want to special-case (see https://github.com/sympy/sympy/pull/20033 and related issues):

>>> FiniteSet(1, 1.0)
{1.0, 1}
>>> {1, 1.0}
{1}

asmeurer avatar Jan 25 '23 04:01 asmeurer

@asmeurer IMHO all of the examples you provide are issues for interactive users. Either __str__(expr) should work for copy & paste or it should mostly not, which would follow Python standards. If __str__(expr) generally does not work for copy & paste then sympy should follow the Python standard and have __repr__() be different from __str__() and produce valid code representations. My understanding of the sympy documentation is that __str__() is expected to produce valid sympy expressions. And most of the time it does. So, I think the community should consider fixing cases where it does not.

An alternative, would be to make changes to the documentation and the code as suggested below:

  • documentation change to: "__str__() will mostly produce valid sympy code, but with MANY exceptions to enhance readability." It appears that actually reflects the current situation.
  • change code so that __repr__() returns valid code. This is most easily achieved by redirecting it to srepr(). However, that generates excessively verbose code with things such as + unnecessarily replaced with add().

This is my opinion as someone who uses sympy interactively and as a teaching tool, with the goal of facilitating mathematics for my students. Tools with little 'gotchas' like this end up causing problems that can rapidly outweigh the advantages for students. One place where this becomes a serious issue is when copying & pasting the preferred solution from a set of solutions produced by solve() or solveset(). If one selects the solutions programmatically things work fine, but most students prefer to copy & paste because it is generally faster for them.

gutow avatar Jan 25 '23 14:01 gutow

There should be some way to override the printer by init_printing options if it needs some workaround I also have similar experience though, even when I want to check out some issues which involves huge expression in printed form

I'm not sure if there is something between sstr and sstrrepr for verbosity. I'm also not very happy with 'sympify' parsing because I like declaring symbols explicitly outside anyway, and with sympify I'm not sure if the symbols inside the expression are declared somewhere, while copy-pasting lets the python interpreter lets me correct these with errors. though it can be more general problem than here.

sylee957 avatar Jan 26 '23 12:01 sylee957

The problem with repr is that Python uses it in a lot of places. For example, it's used on list elements in str(list). It's also used as the default output in the interactive interpreter. This means that we can't really set __repr__ = srepr because it would be way too verbose for these common use-cases. TBH the places where Python uses str vs. the places where it uses repr don't really match the supposed "human readable" vs. "machine readable" distinction.

The output of str should be copy-pastable (if it isn't it's a bug), except:

  • Sometimes it results in a non-SymPy type (like str(Integer(1)) == "1").
  • Symbol names are assumed to already be defined. It also assumes symbol names are valid Python variables.

The issue here is about the first item, that FiniteSet prints something that gives a set instead of a FiniteSet. Can you give an example of how this produces issues with your students? Are they copy-pasting the set and then trying to call some sympy.Set method on it?

asmeurer avatar Jan 31 '23 17:01 asmeurer

Are they copy-pasting the set and then trying to call some sympy.Set method on it?

That is one example of the kinds of issues seen. Here's another one that only shows up in GUI interactive environments like Jupyter. I see inconsistent pretty-printing. In the first image (1) below copying and pasting the str() of a FiniteSet that does not contain other FiniteSets does not yield a pretty-printed result. While the second case (2) when the FiniteSet contains other FiniteSets copy and pasting the str() yields correct pretty printing.

(1) image (2) image

This is a simplified example of what happens. The usual way we have encountered this is when FiniteSets are the result of something like solveset() or another operation. It also becomes an issue when copying sets from within results.

I am not going to try to address the issue of copying a FiniteSet from within a Finite set in this issue. I am only going to wrap external FiniteSets in FiniteSet(...). So these two cases behave consistently. I am close to done with the test cases, changes to the printing that does this and making sure that FiniteSets inside other sets are still displayed as {...}. I hope to remove the draft status of this pull-request before the end of the week.

gutow avatar Feb 01 '23 01:02 gutow

Benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1 means a speed up and greater than 1 means a slowdown. Green lines beginning with + are slowdowns (the PR is slower then master or master is slower than the previous release). Red lines beginning with - are speedups.

Significantly changed benchmark results (PR vs master)


Significantly changed benchmark results (master vs previous release)


Full benchmark results can be found as artifacts in GitHub Actions (click on checks at the top of the PR).

github-actions[bot] avatar Feb 04 '23 03:02 github-actions[bot]

@sylee957 @asmeurer @oscarbenjamin, if you are willing to consider this attempt to make FiniteSet printing more consistent, it is ready for review. Regards, Jonathan

gutow avatar Feb 08 '23 01:02 gutow

The checks seem to have hung in this pull-request. Is there a way to relaunch them?

gutow avatar May 18 '23 01:05 gutow

@oscarbenjamin Thank you. I did not realize that I could just close and reopen. In retrospect that should have been obvious.

Do we need to open a pull request to fix this error as it does not involve any authors associated with the current pull request?

Run bin/mailmap_check.py
This author is in the AUTHORS file but not .mailmap:
Sayan Mitra <[email protected]>
This author is in the AUTHORS file but not .mailmap:
Aman Kumar Shukla <[email protected]>
This author is in the AUTHORS file but not .mailmap:
Zoufiné Lauer-Baré <[email protected]>
This author is in the AUTHORS file but not .mailmap:
Charles Harris <[email protected]>
This author is in the AUTHORS file but not .mailmap:
Tejaswini Sanapathi <[email protected]>
This author is in the AUTHORS file but not .mailmap:
Devansh <[email protected]>
This author is in the AUTHORS file but not .mailmap:
Aaron Gokaslan <[email protected]>

gutow avatar May 18 '23 14:05 gutow

Do we need to open a pull request to fix this error as it does not involve any authors associated with the current pull request?

This happens because you have an old PR that predates when the AUTHORS file was last updated. I think either merging or rebasing with current master will fix the problem.

More generally perhaps the check should allow for there to be unreocgnised names in the AUTHORS file. Usually though if there are unrecognised names then it is because someone has incorrectly edited the AUTHORS file rather than the .mailmap file.

It is perhaps also possible to improve this: https://github.com/sympy/sympy/blob/fde616dae12f576885711d736039afe776540488/bin/mailmap_check.py#L235-L243

oscarbenjamin avatar May 18 '23 15:05 oscarbenjamin

I'm still not really convinced on this. The printing thing is an unfortunate limitation of the Jupyter printing, but it's easy to fix by calling init_printing().

asmeurer avatar May 18 '23 18:05 asmeurer

This is definitely a request I made based on how I prefer things to work, because it makes the software easier to use for novices.

I am advocating for this change so that the behavior of outputs of FiniteSet is consistent in interactive use. The current behavior sometimes produces code that can be copy-pasted and sometime not. It sometimes produces pretty printed output and sometimes not. I believe this change guarantees that the output can by copy-pasted and will reliably pretty print. I do not see why users should have to manually tweak things for that to occur consistently. I am sure there are many other places where this is also an issue. I believe they should be addressed as they are found. As this only impacts interactive outputs, I do not think it introduces any compatibility issues.

This change makes no difference for code embedding Sympy as coders should be using the objects directly.

Based on this my opinion is that the change should be made. That said, if others find the change problematic then it should be avoided. My read of the discussion so far is more along the lines of "that's just the way it is." If there is not something this change would break, I request that it be made.

Thanks.

gutow avatar May 18 '23 18:05 gutow

Why is the release note check failing?

sylee957 avatar May 18 '23 19:05 sylee957