sympy icon indicating copy to clipboard operation
sympy copied to clipboard

let decompose_power handle base reduction

Open smichr opened this issue 2 years ago • 14 comments

References to other Issues or PRs

#5149

Brief description of what is fixed or changed

By decomposing a Rational base into a perfect power, cases where two generators were formerly identified by Poly will now give one:

>>> Poly(4**x+2**x)
Poly((2**x)**2 + (2**x), 2**x, domain='ZZ')  <-- formerly two generators 2**x and 4**x

Other comments

While Poly calls decompose_power_rat, poly does not, so the latter needed editing, too.

It was nice to see the continuous random variable test simplify from

2*531441**(-k)*sqrt(k)*theta*(3*3**(12*k) - 2*531441**k) /(sqrt(k**2 + 18)*sqrt(theta**2 + 18))

to

2*sqrt(k)*theta/(sqrt(k**2 + 18)*sqrt(theta**2 + 18))

When the 531441**(-k)*(3*3**(12*k) - 2*531441**k) is recognized as 1

Release Notes

  • core
    • exprtools.decompose_power identifies perfect power in Rational bases so decompose_power(4**x)->(2**x, 2)
  • polys
    • Poly and poly should both make powers with Rational bases canonical, e.g. 2**(2*x) instead of 4**x

smichr avatar Aug 15 '22 23:08 smichr

:white_check_mark:

Hi, I am the SymPy bot. I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • core

    • exprtools.decompose_power identifies perfect power in Rational bases so decompose_power(4**x)->(2**x, 2) (#23936 by @smichr)
  • polys

    • Poly and poly should both make powers with Rational bases canonical, e.g. 2**(2*x) instead of 4**x (#23936 by @smichr)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.13.

Click here to see the pull request description that was parsed.
<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->

#5149 #19776

#### Brief description of what is fixed or changed
By decomposing a Rational base into a perfect power, cases where two generators were formerly identified by Poly will now give one:

```python
>>> Poly(4**x+2**x)
Poly((2**x)**2 + (2**x), 2**x, domain='ZZ')  <-- formerly two generators 2**x and 4**x
```
#### Other comments

While `Poly` calls `decompose_power_rat`, `poly` does not, so the latter needed editing, too.

It was nice to see the continuous random variable test simplify from
```python
2*531441**(-k)*sqrt(k)*theta*(3*3**(12*k) - 2*531441**k) /(sqrt(k**2 + 18)*sqrt(theta**2 + 18))
```
to
```python
2*sqrt(k)*theta/(sqrt(k**2 + 18)*sqrt(theta**2 + 18))
```
When the `531441**(-k)*(3*3**(12*k) - 2*531441**k)` is recognized as 1
#### Release Notes

<!-- Write the release notes for this release below between the BEGIN and END
statements. The basic format is a bulleted list with the name of the subpackage
and the release note for this PR. For example:

* solvers
  * Added a new solver for logarithmic equations.

* functions
  * Fixed a bug with log of integers.

or if no release note(s) should be included use:

NO ENTRY

See https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more
information on how to write release notes. The bot will check your release
notes automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
* core
  * `exprtools.decompose_power` identifies perfect power in Rational bases so `decompose_power(4**x)->(2**x, 2)`
* polys
  * `Poly` and `poly` should both make powers with Rational bases canonical, e.g. `2**(2*x)` instead of `4**x`
<!-- END RELEASE NOTES -->

sympy-bot avatar Aug 15 '22 23:08 sympy-bot

failing: File "/simplify/tests/test_hyperexpand.py", line 687, in test_prudnikov_misc: assert can_do([a], [a - S.Half, 2*a + 1])

smichr avatar Aug 18 '22 12:08 smichr

Any objections here?

smichr avatar Aug 19 '22 16:08 smichr

I need to think about this a bit but it seems good. It also extends the capability of factor:

In [1]: factor(4**x - 1) # master
Out[1]: 
 x    
4  - 1

In [3]: factor(4**x - 1) # PR
Out[3]: 
⎛ x    ⎞ ⎛ x    ⎞
⎝2  - 1⎠⋅⎝2  + 1⎠

This I'm less sure about:

In [3]: factor(4**x + 1)
Out[3]: 
 2⋅x    
2    + 1

Maybe that's okay since factor generally puts expressions into a particular canonical form (e.g. expands all powers etc).

Presumably other polynomial functions like gcd, cancel, quo etc are affected:

In [1]: gcd(2**x - 1, 4**x - 1) # master
Out[1]: 1

In [12]: gcd(2**x - 1, 4**x - 1) # PR
Out[12]: 
 x    
2  - 1

oscarbenjamin avatar Aug 19 '22 21:08 oscarbenjamin

I think this is the "sweet spot" to put this handling. It's a very targeted transformation and, as you note, any routine that uses this to put expressions in a generator-canonical form will benefit. Please commit when ready.

smichr avatar Aug 19 '22 22:08 smichr

I think there should be more exploration of how this changes things and more tests for the different functions that would be affected.

It would probably be good to go further than this and handle things like 2**(x/2). I don't propose to make that change here (the same problem exists for sqrt(x)) but since the code will likely be changed in future it's important that there are tests for any functionality that this enables.

Another place affected is construct_domain:

In [8]: construct_domain([4**x, 2**x]) # master
Out[8]: (EX, [EX(4**x), EX(2**x)])

In [1]: construct_domain([4**x, 2**x]) # PR
Out[1]: (ZZ[2**x], [(2**x)**2, (2**x)])

oscarbenjamin avatar Aug 19 '22 22:08 oscarbenjamin

handle things like 2**(x/2)

please elaborate. Poly(4**(x/2)) should become Poly(2**x, 2**x)

smichr avatar Aug 19 '22 22:08 smichr

handle things like 2**(x/2)

please elaborate. Poly(4**(x/2)) should become Poly(2**x, 2**x)

More like this:

In [4]: Poly(2**x + 4**x)
Out[4]: Poly((2**x)**2 + (2**x), 2**x, domain='ZZ')

In [5]: Poly(2**x + 2**(x/2))
Out[5]: Poly((2**(x/2)) + (2**x), 2**(x/2), 2**x, domain='ZZ')

In [6]: Poly(x + x**2)
Out[6]: Poly(x**2 + x, x, domain='ZZ')

In [7]: Poly(sqrt(x) + x)
Out[7]: Poly(x + (sqrt(x)), x, sqrt(x), domain='ZZ')

This leads to:

In [9]: e = (sqrt(x) - 1)*(sqrt(x) - 2)

In [10]: e
Out[10]: (√x - 2)⋅(√x - 1)

In [11]: expand(e)
Out[11]: -3⋅√x + x + 2

In [12]: factor(_)
Out[12]: -3⋅√x + x + 2

Similarly with this PR:

In [13]: e = (2**x - 1)*(2**x - 2)

In [14]: expand(e)
Out[14]: 
 2⋅x      x    
2    - 3⋅2  + 2

In [15]: factor(_)
Out[15]: 
⎛ x    ⎞ ⎛ x    ⎞
⎝2  - 2⎠⋅⎝2  - 1⎠

In [16]: e = (2**(x/2) - 1)*(2**(x/2) - 2)

In [17]: expand(e)
Out[17]: 
     x         
     ─         
     2    x    
- 3⋅2  + 2  + 2

In [18]: factor(_)
Out[18]: 
     x         
     ─         
     2    x    
- 3⋅2  + 2  + 2

I haven't thought through in detail what's the best way to handle this. As I said though it's probably best not to try to include that in this PR. Here all that's needed is good test coverage for the effects of the change already made.

oscarbenjamin avatar Aug 19 '22 23:08 oscarbenjamin

I just looked at where this function is called. It's very low level:

  • decompose_power
    • as_terms
      • as_ordered_terms
        • traversal
        • epathtools
        • printer
        • formal
        • manual_integrate
        • sort_key
    • _parallel_dict_from_expr_*_gens
      • *dict_from_expr
        • minpoly
        • Monomial

Searching in the repo via github, in the 9 pages of results I can't find where Monomial is ever called...hmm. I'm not sure about writing tests for anything higher than Poly -- I can't imagine that we will soon stop using Poly for low level manipulation of expression. I can try follow your example and just write some for the expression-wrangling functions that we have. But all they will be reflecting is that Poly tracks its generator(s) -- so factoring 2**x + 4**x is no different than factoring y + y**2.

smichr avatar Aug 19 '22 23:08 smichr

More like this:

Oh, ok. I'm familiar with that. In solve we look for generators that are powers of others. That is something that can only be determined in context of all generators. There is no way that decompose_power knows in what context its generator is being used. The benefits to Poly from this PR are a consequence of having the Rational base powers have a canonical base. I can look into this for a future PR.

smichr avatar Aug 19 '22 23:08 smichr

Anything that changes how Poly chooses its generators or how construct_domain decides on a computable ring will affect all high-level functions that use Poly for their internal work. A change like that needs a lot of tests.

A shortlist of functions to consider: Poly, poly, as_poly, div, rem, quo, exquo, invert, gcd, primitive, content, sqf, factor, cancel, construct_domain, apart.

oscarbenjamin avatar Aug 20 '22 00:08 oscarbenjamin

A change like that needs a lot of tests.

I'm going to write the tests, but my point is that there is no difference to Poly-related functions whether the generators are x, sin(x), or 2**x. So targetting Poly, poly (which was caught be a test already) and as_poly seem to be the sufficient targets for tests unless any of those other functions you listed uses a method other than these 3 to construct the primitives with which the work is done. I will look into that.

smichr avatar Aug 20 '22 00:08 smichr

breadcrumb: this should pass but doesn't -- more work needs to be done with poly

def test_rational_base_generator():
    for a, b in [(2**x, 4**x), (x, x**2)]:
        p = Poly(a + b)
        assert poly(a + b) == p
        assert (a + b).as_poly() == p
        assert p.gens == (a,)
        # give gen a
        assert Poly(a + b, a) == p
        assert poly(a + b, a) == p
        assert (a + b).as_poly(a) == p
        # give gen b
        raises(PolynomialError, lambda: Poly(a + b, b))
        raises(PolynomialError, lambda: poly(a + b, b))
        assert (a + b).as_poly(b) is None

smichr avatar Aug 20 '22 15:08 smichr

Benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1 means a speed up and greater than 1 means a slowdown. Green lines beginning with + are slowdowns (the PR is slower then master or master is slower than the previous release). Red lines beginning with - are speedups.

Significantly changed benchmark results (PR vs master)


Significantly changed benchmark results (master vs previous release)

       before           after         ratio
     [41d90958]       [a380cfaa]
     <sympy-1.11.1^0>                 
-         968±2μs        621±0.9μs     0.64  solve.TimeSparseSystem.time_linear_eq_to_matrix(10)
-     2.78±0.01ms         1.17±0ms     0.42  solve.TimeSparseSystem.time_linear_eq_to_matrix(20)
-     5.52±0.01ms         1.72±0ms     0.31  solve.TimeSparseSystem.time_linear_eq_to_matrix(30)

Full benchmark results can be found as artifacts in GitHub Actions (click on checks at the top of the PR).

github-actions[bot] avatar Sep 06 '22 15:09 github-actions[bot]

I am not aware of anything more that is needed for this. The test suite does not give much feedback with this change -- only one test was changed for the better. Unless someone can show that this is broken, I will commit and let the larger community's use-cases drive further development.

smichr avatar Jan 05 '23 20:01 smichr

This fails:

In [10]: ZZ[2**x].from_sympy(4**x)
---------------------------------------------------------------------------
CoercionFailed: expected an integer, got 4**x
...
ValueError: expected an expression convertible to a polynomial in Polynomial ring in 2**x over ZZ with lex order, got 4**x

This breaks a key invariant which is that the domain returned by construct_domain should always be able to convert the expressions it was constructed for:

In [20]: vals = [2**x, 4**x]

In [21]: dom, domvals = construct_domain(vals)

In [22]: assert domvals == [dom.from_sympy(v) for v in vals]
---------------------------------------------------------------------------
CoercionFailed 

I'm also unsure about turning 4**x into 2**(2*x) everywhere:

In [27]: cancel(4**x)
Out[27]: 
 2⋅x
2 

Maybe that's a good thing but I'm not sure.

oscarbenjamin avatar Jan 05 '23 22:01 oscarbenjamin

This breaks a key invariant

I am guessing that I will have to dig into from_sympy to fix this.

smichr avatar Jan 05 '23 23:01 smichr

I'm also unsure about turning 4**x into 2**(2*x) everywhere:

Possibly having to_sympy/as_expr automatically convert this would help. I'm not sure whether 4**x or 2**(2*x) should be preferred in general. It seems unfortunate though to change this if there isn't any reason to do so.

oscarbenjamin avatar Jan 05 '23 23:01 oscarbenjamin

if there isn't any reason to do so.

The motivation to do so is to make Rational-based powers behave the same as symbolic bases:

>>> [len(Poly((x+x**2).simplify()).gens) for x in (sin(x), 2**x)]  # simplify to make (2**x)**2 -> 4**x
[1, 2]

This has implications for (as you noted) factoring; it also affects solving equations and removes the onus of looking for these redundant generators from solve to Poly.

Maybe this would be better as a type of simplification: tbd(2**x + 4**x) -> 2**x + 2**(2*x)?

smichr avatar Jan 06 '23 01:01 smichr

The motivation to do so is to make Rational-based powers behave the same as symbolic bases:

If that's the goal then it's also possible to go further and factorise other things e.g.

In [1]: 2**x*3**x
Out[1]: 
 x
6 

In [2]: factor(2**x*3**x - 2**x)
Out[2]: 
   x    x
- 2  + 6 

There's definitely a question of how far we should go here because fully factorising integers can be an expensive operation. Another possibility is that this could be a Poly option or at least that it could be possible to specify these as generators:

In [3]: factor(2**x*3**x - 2**x, [2**x, 3**x])
Out[3]: 
   x    x
- 2  + 6 

In [4]: Poly(2**x*3**x - 2**x, [2**x, 3**x])
Out[4]: Poly(-(2**x) + 6**x, 2**x, 3**x, domain='ZZ[6**x]')

oscarbenjamin avatar Jan 06 '23 11:01 oscarbenjamin

Yes, the factorization bothers me, too. Perhaps the better thing to do is attempt a unification of generators that are powers in Poly. Finding gcd is easy compared to factorization.

smichr avatar Jan 06 '23 19:01 smichr

Maybe this is similar to this problem: https://github.com/sympy/sympy/issues/22852#issuecomment-1012640723

oscarbenjamin avatar Jan 06 '23 20:01 oscarbenjamin