sympy icon indicating copy to clipboard operation
sympy copied to clipboard

Improve sympy startup time by pre-generating assumption rules

Open eendebakpt opened this issue 3 years ago • 7 comments

References to other Issues or PRs

Fixes #23848

Brief description of what is fixed or changed

The assumption rules in sympy.core.assumptions have been pre-generated and stored. This improves the import time from

Note: the first run (warm up) was not included in the average + std dev
All runs (including warm up):
[0.2799325890000546, 0.2792433899999196, 0.2808860589998403, 0.27852462000009837, 0.2772021640003004, 0.27793970099992293, 0.27890890300022875, 0.27670108499978596, 0.2830468789998122, 0.2824971650002226, 0.27993182499994873, 0.27861428000005617, 0.2989063120003266, 0.2775266109997574, 0.29364538099980564, 0.2771660369999154, 0.286309306000021, 0.2790996429998813, 0.2776118500000848, 0.2799282989999483, 0.28268679099983274, 0.286931328999799, 0.27870470200014097, 0.2795189109997409, 0.28090279800017015, 0.2771274270003232, 0.2784352640001089, 0.2790888449999329, 0.28215103899992755, 0.28110537000020486, 0.2792655839998588, 0.28015478599991184, 0.2742751440000575, 0.277146034999987, 0.2830514439997387, 0.30372588199998063, 0.278697158000341, 0.2799442360001194, 0.2820860720003111, 0.28028372000017043, 0.27603550999992876, 0.27661651399967013, 0.27680658300005234, 0.27522678399964207, 0.2784867990003477, 0.2751314780002758, 0.2808772350003892, 0.2758764670002165, 0.2807688479997523, 0.2850936009999714, 0.2838939059997756]
Number of tests: 50
The speed of "import sympy" is: 0.280676 +- 0.005417

to

Note: the first run (warm up) was not included in the average + std dev
All runs (including warm up):
[0.26176198099983594, 0.2641488809999828, 0.25052918699975635, 0.2563463709998359, 0.25801725200017245, 0.25252887999977247, 0.24990126400007284, 0.249705110999912, 0.2548714300000938, 0.25649742299992795, 0.24887773199998264, 0.2522605530002693, 0.2585813169998801, 0.2490264909997677, 0.2559804580000673, 0.25061231300014697, 0.2539327670001512, 0.25147581700002775, 0.24917899499996565, 0.25249531100007516, 0.25062299900037033, 0.25119203399981416, 0.25984018200006176, 0.25059449900027175, 0.2599466599999687, 0.24821165699995618, 0.25074421799990887, 0.2574191149997205, 0.2514301539999906, 0.2553798280000592, 0.2530995380002423, 0.24977733599962448, 0.24670270900014657, 0.2564127300001928, 0.26333457200007615, 0.24589979699976539, 0.2507734949999758, 0.2475807599998916, 0.2506884849999551, 0.24823069700005362, 0.2492625009999756, 0.25011981900024693, 0.2483615269998154, 0.25432030199999645, 0.24687275099995531, 0.2516041240000959, 0.255909456000154, 0.25308567299998685, 0.24905589000036343, 0.24761925300026633, 0.25222413200026494]
Number of tests: 50
The speed of "import sympy" is: 0.252426 +- 0.004169

Other comments

A test has been added to compare the pre-generated assumptions with freshly generated assumptions.

This assumes the assumptions are system independent, this has to be checked.

Release Notes

  • core
    • Import startup time by using pre-generated assumptions.

eendebakpt avatar Jul 28 '22 18:07 eendebakpt

:white_check_mark:

Hi, I am the SymPy bot (v167). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • core

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.12.

Click here to see the pull request description that was parsed.
<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->

Fixes #23848

#### Brief description of what is fixed or changed

The assumption rules in `sympy.core.assumptions` have been pre-generated and stored. This improves the import time from
```
Note: the first run (warm up) was not included in the average + std dev
All runs (including warm up):
[0.2799325890000546, 0.2792433899999196, 0.2808860589998403, 0.27852462000009837, 0.2772021640003004, 0.27793970099992293, 0.27890890300022875, 0.27670108499978596, 0.2830468789998122, 0.2824971650002226, 0.27993182499994873, 0.27861428000005617, 0.2989063120003266, 0.2775266109997574, 0.29364538099980564, 0.2771660369999154, 0.286309306000021, 0.2790996429998813, 0.2776118500000848, 0.2799282989999483, 0.28268679099983274, 0.286931328999799, 0.27870470200014097, 0.2795189109997409, 0.28090279800017015, 0.2771274270003232, 0.2784352640001089, 0.2790888449999329, 0.28215103899992755, 0.28110537000020486, 0.2792655839998588, 0.28015478599991184, 0.2742751440000575, 0.277146034999987, 0.2830514439997387, 0.30372588199998063, 0.278697158000341, 0.2799442360001194, 0.2820860720003111, 0.28028372000017043, 0.27603550999992876, 0.27661651399967013, 0.27680658300005234, 0.27522678399964207, 0.2784867990003477, 0.2751314780002758, 0.2808772350003892, 0.2758764670002165, 0.2807688479997523, 0.2850936009999714, 0.2838939059997756]
Number of tests: 50
The speed of "import sympy" is: 0.280676 +- 0.005417
```
to
```
Note: the first run (warm up) was not included in the average + std dev
All runs (including warm up):
[0.26176198099983594, 0.2641488809999828, 0.25052918699975635, 0.2563463709998359, 0.25801725200017245, 0.25252887999977247, 0.24990126400007284, 0.249705110999912, 0.2548714300000938, 0.25649742299992795, 0.24887773199998264, 0.2522605530002693, 0.2585813169998801, 0.2490264909997677, 0.2559804580000673, 0.25061231300014697, 0.2539327670001512, 0.25147581700002775, 0.24917899499996565, 0.25249531100007516, 0.25062299900037033, 0.25119203399981416, 0.25984018200006176, 0.25059449900027175, 0.2599466599999687, 0.24821165699995618, 0.25074421799990887, 0.2574191149997205, 0.2514301539999906, 0.2553798280000592, 0.2530995380002423, 0.24977733599962448, 0.24670270900014657, 0.2564127300001928, 0.26333457200007615, 0.24589979699976539, 0.2507734949999758, 0.2475807599998916, 0.2506884849999551, 0.24823069700005362, 0.2492625009999756, 0.25011981900024693, 0.2483615269998154, 0.25432030199999645, 0.24687275099995531, 0.2516041240000959, 0.255909456000154, 0.25308567299998685, 0.24905589000036343, 0.24761925300026633, 0.25222413200026494]
Number of tests: 50
The speed of "import sympy" is: 0.252426 +- 0.004169
```

#### Other comments

A test has been added to compare the pre-generated assumptions with freshly generated assumptions.

This assumes the assumptions are **system independent**, this has to be checked.

#### Release Notes

<!-- Write the release notes for this release below between the BEGIN and END
statements. The basic format is a bulleted list with the name of the subpackage
and the release note for this PR. For example:

* solvers
  * Added a new solver for logarithmic equations.

* functions
  * Fixed a bug with log of integers.

or if no release note(s) should be included use:

NO ENTRY

See https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more
information on how to write release notes. The bot will check your release
notes automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
* core
   * Import startup time by using pre-generated assumptions.
<!-- END RELEASE NOTES -->

Update

The release notes on the wiki have been updated.

sympy-bot avatar Jul 28 '22 18:07 sympy-bot

Maybe we can merge this with similar code for the new assumptions (see bin/ask_generated.py).

asmeurer avatar Jul 29 '22 00:07 asmeurer

🟠

Hi, I am the SymPy bot (v167). I've noticed that some of your commits add or delete files. Since this is sometimes done unintentionally, I wanted to alert you about it.

This is an experimental feature of SymPy Bot. If you have any feedback on it, please comment at https://github.com/sympy/sympy-bot/issues/75.

The following commits add new files:

  • d2299c490fe484b83ebd75731996c784c670f816:
    • sympy/core/assumptions_generated.py

If these files were added/deleted on purpose, you can ignore this message.

sympy-bot avatar Jul 29 '22 19:07 sympy-bot

@asmeurer I merged the generation with bin/ask_update.py.

It turns out that the generated rules are not system independent. The logic is the same, but the ordering of the rules is different. This makes a check on whether the pre-generated rules are equal to generated rules complex. (in particular the _assume_rules.beta_triggers will be hard to compare).

I tracked down the reason for this, it turns out that Logic.fromstring('complex & !algebraic') can result in And(complex, Not(algebraic)) or in And(Not(algebraic), complex), depending on the system. The difference can be traced back to args = sorted(set(cls.flatten(bargs)), key=hash) in the class AndOr_Base (I guess the hash of the And is system dependent).

I added some sorting to the Prover to resolve this, but I am not sure this is the best approach.

eendebakpt avatar Jul 29 '22 20:07 eendebakpt

Sorting sounds good. I'm assuming that's done when we pregenerate the rules and not at runtime.

asmeurer avatar Jul 29 '22 21:07 asmeurer

Benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1 means a speed up and greater than 1 means a slowdown. Green lines beginning with + are slowdowns (the PR is slower then master or master is slower than the previous release). Red lines beginning with - are speedups.

Significantly changed benchmark results (PR vs master)


Significantly changed benchmark results (master vs previous release)

       before           after         ratio
     [26f7bdbe]       [829cc6c8]
     <sympy-1.11^0>                 
-         956±3μs        621±0.5μs     0.65  solve.TimeSparseSystem.time_linear_eq_to_matrix(10)
-     2.78±0.01ms         1.16±0ms     0.42  solve.TimeSparseSystem.time_linear_eq_to_matrix(20)
-     5.56±0.03ms         1.71±0ms     0.31  solve.TimeSparseSystem.time_linear_eq_to_matrix(30)

Full benchmark results can be found as artifacts in GitHub Actions (click on checks at the top of the PR).

github-actions[bot] avatar Jul 29 '22 21:07 github-actions[bot]

Sorting sounds good. I'm assuming that's done when we pregenerate the rules and not at runtime.

Yes, only in pre-generation. The sorting is applied to the order of arguments in And and Or.

eendebakpt avatar Jul 30 '22 12:07 eendebakpt

I was thinking something like this:

from collections import defaultdict


beta_rules = _pre_calculated_assumptions['beta_rules']
defined_facts = _pre_calculated_assumptions['defined_facts']
full_implications = _pre_calculated_assumptions['full_implications']
prereq = _pre_calculated_assumptions['prereq']
beta_triggers = _pre_calculated_assumptions['beta_triggers']


def defined_facts_lines():
    yield f'defined_facts = ['
    for fact in defined_facts:
        yield f'    {fact!r},'
    yield f'] # defined_facts'


def full_implications_lines():
    full_implications_dict = defaultdict(dict)
    for (fact, value), implications in full_implications.items():
        full_implications_dict[fact][value] = implications

    yield 'full_implications = {'
    for fact in defined_facts:
        implications = full_implications_dict[fact]
        for value in (True, False):
            yield f'    # Implications of {fact} = {value}:'
            yield f'    (({fact!r}, {value!r}), ('
            for implied in sorted(implications.get(value, ())):
                yield f'        {implied!r},'
            yield f'        ),'
            yield f''
    yield '} # full_implications'


def prereq_lines():
    yield 'prereq = {'
    yield ''
    for fact in sorted(prereq):
        yield f'    # facts that could determine the value of {fact}'
        yield f'    {fact!r}: {{'
        for pfact in sorted(prereq[fact]):
            yield f'        {pfact!r},'
        yield '    },'
        yield ''
    yield '} # prereq'


def beta_rules_lines():
    reverse_implications = defaultdict(list)
    for pre, implied in beta_rules:
        reverse_implications[implied].append(pre)

    yield 'beta_rules = ['
    yield ''
    for implied in sorted(reverse_implications):
        fact, value = implied
        yield f'    # Rules implying {fact} = {value}'
        for pre in reverse_implications[implied]:
            ruleno = beta_rules.index((pre, implied))
            setstr = ", ".join(map(str, sorted(pre)))
            yield f'    ({{{setstr}}},'
            yield f'        {implied!r}), # rule number {ruleno}'
        yield ''
    yield '] # beta_rules'


def beta_triggers_lines():
    yield 'beta_triggers = {'
    for query in sorted(beta_triggers):
        fact, value = query
        yield f'    {query!r}: {beta_triggers[query]!r},'
    yield '} # beta_triggers'


def all_lines():
    yield from defined_facts_lines()
    yield ''
    yield ''
    yield from full_implications_lines()
    yield ''
    yield ''
    yield from prereq_lines()
    yield ''
    yield ''
    yield from beta_rules_lines()
    yield ''
    yield ''
    yield from beta_triggers_lines()


def print_rules():
    for line in all_lines():
        print(line)


print_rules()

I just realised that this messes up the ordering of beta_rules though so I guess beta_triggers needs to be renumbered or something.

oscarbenjamin avatar Aug 30 '22 14:08 oscarbenjamin

I was thinking something like this:

from collections import defaultdict

...
...

def print_rules():
    for line in all_lines():
        print(line)


print_rules()

I just realised that this messes up the ordering of beta_rules though so I guess beta_triggers needs to be renumbered or something.

Ok, the example above not only changes the format but in addition adds some informative comments. The indices in beta_triggers are related to the ordering of beta_rules so changing that is a bit more work. Is there are change that changing the ordering of the rules will change behavior or performance of sympy? If so, then I would like to keep the original ordering and leave the ordering of the rules to another PR.

@oscarbenjamin

eendebakpt avatar Sep 08 '22 13:09 eendebakpt

Is there are change that changing the ordering of the rules will change behavior or performance of sympy?

It shouldn't affect performance. It only matters for behaviour that the ordering is internally consistent and I think we'd find out pretty quickly if it wasn't.

I haven't tested this but I think it should work:

def beta_rules_lines():
    reverse_implications = defaultdict(list)
    for n, (pre, implied) in enumerate(beta_rules):
        reverse_implications[implied].append((pre, n))

    yield 'beta_rules = ['
    yield ''
    m = 0
    indices = {}
    for implied in sorted(reverse_implications):
        fact, value = implied
        yield f'    # Rules implying {fact} = {value}'
        for pre, n in reverse_implications[implied]:
            indices[n] = m
            m += 1
            ruleno = beta_rules.index((pre, implied))
            setstr = ", ".join(map(str, sorted(pre)))
            yield f'    ({{{setstr}}},'
            yield f'        {implied!r}), # rule number {ruleno}'
        yield ''
    yield '] # beta_rules'

    yield 'beta_triggers = {'
    for query in sorted(beta_triggers):
        fact, value = query
        triggers = [indices[n] for n in beta_triggers[query]]
        yield f'    {query!r}: {triggers!r},'
    yield '} # beta_triggers'

oscarbenjamin avatar Sep 08 '22 14:09 oscarbenjamin

Looks good. Thanks

oscarbenjamin avatar Sep 13 '22 23:09 oscarbenjamin