Refactor pytest fixtures - Part 1
Technical changes
Refactor the fixtures so that:
- Fixtures can be shared across tests more easily
- By defining fixtures instead of importing shared objects, we avoid errors due to the mutability of those objects (see the sketch after this list)
- The tests are more readable
- Tests can be parallelized more easily, so test runtime could be reduced by using the pytest-xdist plugin
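For illustration, here is a minimal sketch of the idea; the names are hypothetical and only stand in for the real fixtures in conftest.py:

# conftest.py (sketch): each test gets its own fresh instance,
# so mutations in one test cannot leak into another.
import pytest


class FakeTaxBenefitSystem:
    """Hypothetical stand-in for the real, expensive-to-build, mutable object."""

    def __init__(self):
        self.variables = {}


@pytest.fixture
def tax_benefit_system():
    return FakeTaxBenefitSystem()


# test_variables.py (sketch)
def test_add_variable(tax_benefit_system):
    tax_benefit_system.variables["salary"] = object()  # this mutation stays local to the test
    assert "salary" in tax_benefit_system.variables

Once the tests no longer share mutable module-level objects, running them in parallel with pytest-xdist (pytest -n auto) becomes straightforward.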
Thank you @cesco-fran! It's great to have a conftest.py file and to standardize / share the fixtures. It makes the tests easier to read from one file to another 👍 . Besides, if I understand it correctly, pytest runs the conftest.py while collecting the tests, so it should reduce the test duration. But we seem to increase it. 🤔
What did I do to check test duration?
I used the time -p command to compare pytest execution between the master branch and this PR (after cherry-picking its commits). To measure the average time over multiple calls I used the avg_time_alt function described here, and here is what I found:
master branch:
$ avg_time_alt 100 pytest
real: 11.62690
user: 10.53990
sys : 1.10350
user + sys = 11.6434 seconds
refactor-pytest-fixtures commits:
$ avg_time_alt 100 pytest
real: 12.45240
user: 11.44300
sys : 1.19660
user + sys = 12.6396 seconds
This increases the time by more than 8% ( 100*(0.9935 / 11.6434) where 0.9935 = 12.6396 - 11.6434 ). As the test duration is already a pain point, I would start by looking for the part that costs us this increase. 🤔
Thank you @sandcha for your review and feedback! I think the speed of the old code came at the expense of reliability: some objects were imported and reused across different tests, making the tests less predictable and thus more prone to bugs. Once we agree that the refactor this PR proposes does not introduce any inconsistency or bugs, the fixture layout will let us work on optimization, via scope or parallelization. The point is that if we want an object to live for the whole test session we need to be explicit about that via scope, given the following tradeoff: the more isolated a test is, the slower it will be (since fixtures have to be rebuilt from scratch), but the more reliable it will be.
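To illustrate the tradeoff, a minimal sketch of what broad scoping costs in reliability (the fixture and the "reform" are hypothetical, just a mutable stand-in):

import pytest


@pytest.fixture(scope="module")  # built once per module: faster, but shared between tests
def shared_state():
    return {"reforms": []}


def test_apply_reform(shared_state):
    shared_state["reforms"].append("hypothetical_reform")  # mutates the shared object
    assert len(shared_state["reforms"]) == 1


def test_without_reform(shared_state):
    # The mutation above leaks into this test: it fails whenever test_apply_reform
    # ran first, which is exactly the unpredictability that function-scoped
    # fixtures avoid at the cost of rebuilding the object.
    assert shared_state["reforms"] == []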
Hello! I agree with both points:
- Isolation is not ideal, which prevents further refactoring (mutability of objects).
- 8% is a bit expensive in terms of performance.
My guess is that we're losing performance on the most expensive operations, like rebuilding the tax-benefit system.
Slowest tests for master:
# pytest --durations=20
0.73s call tests/core/test_yaml.py::test_shell_script
0.70s call tests/core/test_yaml.py::test_shell_script_with_reform
0.69s call tests/core/test_yaml.py::test_shell_script_with_extension
0.52s call tests/core/test_yaml.py::test_failing_shell_script
0.16s call tests/core/test_yaml.py::test_run_tests_from_directory_fail
0.10s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_dynamic_variable
0.09s call tests/core/test_yaml.py::test_name_filter
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_has_car_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_new_tax_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_basic_income_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_social_security_contribution_parameter
0.08s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_social_security_contribution_variable
0.08s call tests/core/test_yaml.py::test_success
0.07s call tests/core/test_yaml.py::test_run_tests_from_directory
0.07s call tests/core/test_yaml.py::test_with_reform
0.07s call tests/core/test_yaml.py::test_with_extension
0.07s call tests/core/test_yaml.py::test_fail
0.07s call tests/core/test_yaml.py::test_relative_error_margin_fail
0.07s call tests/core/test_yaml.py::test_with_anchors
0.07s call tests/core/test_yaml.py::test_absolute_error_margin_fail
Slowest tests for refactor-pytest-fixtures:
# pytest --durations=20
0.72s call tests/core/test_yaml.py::test_shell_script
0.72s call tests/core/test_yaml.py::test_shell_script_with_reform
0.68s call tests/core/test_yaml.py::test_shell_script_with_extension
0.52s call tests/core/test_yaml.py::test_failing_shell_script
0.15s call tests/core/test_yaml.py::test_name_filter
0.13s call tests/core/test_yaml.py::test_run_tests_from_directory_fail
0.10s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_dynamic_variable
0.10s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_social_security_contribution_parameter
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_has_car_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_social_security_contribution_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_new_tax_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_basic_income_variable
0.09s call tests/core/test_yaml.py::test_fail
0.09s call tests/core/test_yaml.py::test_relative_error_margin_success
0.09s call tests/core/test_yaml.py::test_success
0.08s call tests/core/test_yaml.py::test_with_reform
0.08s call tests/core/test_yaml.py::test_with_extension
0.08s call tests/core/test_yaml.py::test_run_tests_from_directory
0.08s call tests/core/test_yaml.py::test_absolute_error_margin_fail
0.08s call tests/core/test_yaml.py::test_relative_error_margin_fail
There's a general overhead.
Profile for master:
# pytest --profile
4807454 function calls (4663716 primitive calls) in 7.150 seconds
Ordered by: cumulative time
List reduced from 3724 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
565/508 0.007 0.000 7.149 0.014 runner.py:83(pytest_runtest_protocol)
9662/5588 0.056 0.000 6.975 0.001 hooks.py:272(__call__)
11450/5588 0.009 0.000 6.928 0.001 manager.py:90(_hookexec)
11450/5588 0.024 0.000 6.923 0.001 manager.py:84(<lambda>)
11450/5588 0.134 0.000 6.908 0.001 callers.py:157(_multicall)
565/508 0.007 0.000 6.869 0.014 runner.py:90(runtestprotocol)
1695/1524 0.013 0.000 6.859 0.005 runner.py:183(call_and_report)
1695/1524 0.009 0.000 6.252 0.004 runner.py:204(call_runtest_hook)
1747/1524 0.010 0.000 6.228 0.004 runner.py:237(from_call)
1695/1524 0.005 0.000 6.215 0.004 runner.py:217(<lambda>)
565/508 0.003 0.000 4.661 0.009 runner.py:126(pytest_runtest_call)
508 0.002 0.000 4.649 0.009 python.py:1475(runtest)
508 0.005 0.000 4.607 0.009 python.py:175(pytest_pyfunc_call)
4 0.000 0.000 2.556 0.639 subprocess.py:326(check_call)
4 0.001 0.000 2.555 0.639 subprocess.py:309(call)
8 0.000 0.000 2.541 0.318 subprocess.py:979(wait)
8 0.000 0.000 2.541 0.318 subprocess.py:1586(_wait)
4 0.000 0.000 2.540 0.635 subprocess.py:1573(_try_wait)
4 2.540 0.635 2.540 0.635 {built-in method posix.waitpid}
12 0.000 0.000 1.388 0.116 test_yaml.py:21(run_yaml_test)
Profile for refactor-pytest-fixtures:
# pytest --profile
6185028 function calls (5985560 primitive calls) in 10.037 seconds
Ordered by: cumulative time
List reduced from 3754 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
530/473 0.008 0.000 10.036 0.021 runner.py:83(pytest_runtest_protocol)
10049/5203 0.063 0.000 9.841 0.002 hooks.py:272(__call__)
11873/5203 0.010 0.000 9.793 0.002 manager.py:90(_hookexec)
11873/5203 0.027 0.000 9.788 0.002 manager.py:84(<lambda>)
11873/5203 0.149 0.000 9.773 0.002 callers.py:157(_multicall)
530/473 0.008 0.000 9.754 0.021 runner.py:90(runtestprotocol)
1590/1419 0.015 0.000 9.744 0.007 runner.py:183(call_and_report)
1590/1419 0.011 0.000 9.001 0.006 runner.py:204(call_runtest_hook)
1642/1419 0.010 0.000 8.971 0.006 runner.py:237(from_call)
1590/1419 0.005 0.000 8.958 0.006 runner.py:217(<lambda>)
530/473 0.006 0.000 5.349 0.011 runner.py:126(pytest_runtest_call)
473 0.002 0.000 5.333 0.011 python.py:1475(runtest)
473 0.005 0.000 5.288 0.011 python.py:175(pytest_pyfunc_call)
530 0.002 0.000 3.077 0.006 runner.py:121(pytest_runtest_setup)
530 0.004 0.000 3.060 0.006 runner.py:359(prepare)
473 0.002 0.000 3.046 0.006 python.py:1479(setup)
473 0.001 0.000 3.043 0.006 fixtures.py:278(fillfixtures)
473 0.002 0.000 3.042 0.006 fixtures.py:472(_fillfixtures)
659 0.001 0.000 3.037 0.005 fixtures.py:479(getfixturevalue)
1412/659 0.004 0.000 3.036 0.005 fixtures.py:489(_get_active_fixturedef)
Number of function calls increased by 28% = 100 * (6185028 - 4807454) / 4807454
Collection is actually cheaper:
[master]# pytest --collect-only
Collected 508 items in 1.45s
vs
[refactor-pytest-fixtures]# pytest --collect-only
Collected 473 items in 1.40s
As your stats confirm, the slowness seems to come from the number of calls, which, as you mention, makes the difference when heavy objects like the tbs have to be rebuilt more often. I think this slowness is a price worth paying if we want to make the tests more reliable and trade some testing time for developer time. Once the logic is clear, we could start to see what improvements we can make by playing with fixture scoping.
I think the drop in performance is an acceptable payoff if we can move to clearer, more independent, and atomic tests. This only impacts the core tests, which are still reasonably fast IMHO. I guess we can recover this performance penalty by transforming unnecessary integration tests into unit ones?
Yes, moving to "clearer, more independent, and atomic tests" is exactly one of the aims of this PR. To cope with the slowness, I agree that other strategies should be considered; understanding which tests are the most critical and which are not, as you suggested, is an important one. The less critical ones do not need to be deleted, just marked so that they only run on special occasions.
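For example, with pytest markers (the slow marker is hypothetical; nothing in this PR defines it, and it would have to be registered under markers = in setup.cfg or pytest.ini):

import pytest


@pytest.mark.slow  # hypothetical marker for less critical, long-running tests
def test_some_expensive_integration():  # hypothetical test name
    ...

A default run could then skip them with pytest -m "not slow", and they could still be run explicitly (for instance in CI) with pytest -m slow.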
After taking a closer look, the most expensive tests are the CLI's (openfisca test) and run_tests. The former could be better tested at a higher level, for example with click, instead of using subprocess (which would make sense if the CLI weren't ours). run_tests looks trickier.
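To make the suggestion concrete, here is what an in-process CLI test looks like with click's test runner. The command below is a hypothetical stand-in (the openfisca CLI is not necessarily click-based), so this is only a sketch of the approach:

import click
from click.testing import CliRunner


@click.command()
@click.argument("path")
def fake_openfisca_test(path):
    """Hypothetical stand-in for the openfisca test command."""
    click.echo("testing " + path)


def test_cli_in_process():
    # Runs the command in-process: no fork/exec and no interpreter start-up,
    # unlike the subprocess-based shell-script tests in the durations report.
    result = CliRunner().invoke(fake_openfisca_test, ["tests/core"])
    assert result.exit_code == 0
    assert "testing tests/core" in result.output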
I'm personally OK with this PR as it'll make refactoring easier IMHO.
Any thoughts on this, @sandcha?
While I was reviewing #984 I realized it could make sense to have tbs objects with different scopes. In my PR I make the tbs function-scoped: since the tbs is mutable, that is safer. #984 makes the opposite choice and makes it globally scoped. So I thought a solution could be to have different tbs fixtures, each explicitly referring to its scope in its name, so that when people use one they are aware of what kind of tbs they are dealing with, and if it is global they do not mess with it. This way we should be able to improve performance while keeping the tests reliable and easy to extend and understand.
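A minimal sketch of that convention; the fixture names and the helper are hypothetical, not what #984 or this PR actually define:

import pytest


def build_tax_benefit_system():
    # Hypothetical stand-in for the real, expensive construction.
    return {"variables": {}}


@pytest.fixture  # function scope: rebuilt for every test, safe to mutate
def isolated_tax_benefit_system():
    return build_tax_benefit_system()


@pytest.fixture(scope="session")  # one instance for the whole run: fast, but treat it as read-only
def session_tax_benefit_system():
    return build_tax_benefit_system()

A test that adds a variable or applies a reform would request isolated_tax_benefit_system; a purely read-only test would request session_tax_benefit_system.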
so I thought a solution could be to have different tbs fixtures, each explicitly referring to its scope in its name, so that when people use one they are aware of what kind of tbs they are dealing with, and if it is global they do not mess with it
Thanks @cesco-fran - I agree this is a safe option which is easy to understand and extend. I am happy to follow this convention in #984 too. :smile: :+1:
Sounds great @cesco-fran :)