Refactor pytest fixtures - Part 1
Technical changes
Refactor the fixtures so that:
- Fixtures can be shared across tests more easily
- By defining fixtures instead of importing shared objects, we avoid errors due to the mutability of those objects (see the sketch after this list)
- The tests are more readable
- Tests can be parallelized more easily, so test runtime could be reduced by using the pytest-xdist plugin
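For illustration, here is a minimal sketch of the idea; the names are hypothetical and only stand in for the real fixtures in conftest.py:

# conftest.py (sketch): each test gets its own fresh instance,
# so mutations in one test cannot leak into another.
import pytest


class FakeTaxBenefitSystem:
    """Hypothetical stand-in for the real, expensive-to-build, mutable object."""

    def __init__(self):
        self.variables = {}


@pytest.fixture
def tax_benefit_system():
    return FakeTaxBenefitSystem()


# test_variables.py (sketch)
def test_add_variable(tax_benefit_system):
    tax_benefit_system.variables["salary"] = object()  # this mutation stays local to the test
    assert "salary" in tax_benefit_system.variables

Once the tests no longer share mutable module-level objects, running them in parallel with pytest-xdist (pytest -n auto) becomes straightforward.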
Thank you @cesco-fran! It's great to have a conftest.py file and to standardize / share the fixtures. It makes the tests easier to read from one file to another 👍 . Besides, if I understand it correctly, pytest runs the conftest.py while collecting the tests, so it should reduce the test duration. But we seem to increase it. 🤔
What did I do to check test duration?
I used the time -p command to compare pytest execution between the master branch and this PR (after cherry-picking its commits). To measure the average time over multiple calls I used the avg_time_alt function described here, and here is what I found:
master branch:
$ avg_time_alt 100 pytest
real: 11.62690
user: 10.53990
sys : 1.10350
user + sys = 11.6434 seconds
refactor-pytest-fixtures commits:
$ avg_time_alt 100 pytest
real: 12.45240
user: 11.44300
sys : 1.19660
user + sys = 12.6396 seconds
This increases the time by more than 8% ( 100*(0.9935 / 11.6434) where 0.9935 = 12.6396 - 11.6434 ). As the test duration is already a pain point, I would start by looking for the part that costs us this increase. 🤔
Thank you @sandcha for your review and feedback! I think the speed of the old code came at the expense of reliability: some objects were imported and reused across different tests, making the tests less predictable and thus more prone to bugs. Once we agree that the refactor this PR proposes does not introduce any inconsistency or bugs, the fixture layout will let us work on optimization, via scope or parallelization. The point is that if we want an object to live for the whole test session we need to be explicit about that via scope, given the following tradeoff: the more isolated a test is, the slower it will be (since fixtures have to be rebuilt from scratch), but the more reliable it will be.
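To illustrate the tradeoff, a minimal sketch of what broad scoping costs in reliability (the fixture and the "reform" are hypothetical, just a mutable stand-in):

import pytest


@pytest.fixture(scope="module")  # built once per module: faster, but shared between tests
def shared_state():
    return {"reforms": []}


def test_apply_reform(shared_state):
    shared_state["reforms"].append("hypothetical_reform")  # mutates the shared object
    assert len(shared_state["reforms"]) == 1


def test_without_reform(shared_state):
    # The mutation above leaks into this test: it fails whenever test_apply_reform
    # ran first, which is exactly the unpredictability that function-scoped
    # fixtures avoid at the cost of rebuilding the object.
    assert shared_state["reforms"] == []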
Hello! I agree with both points:
- Isolation is not ideal, which prevents further refactoring (mutability of objects).
- 8% is a bit expensive in terms of performance.
My guess is that we're losing performance on the most expensive operations, like rebuilding the tax-benefit system.
Slowest tests for master:
# pytest --durations=20
0.73s call tests/core/test_yaml.py::test_shell_script
0.70s call tests/core/test_yaml.py::test_shell_script_with_reform
0.69s call tests/core/test_yaml.py::test_shell_script_with_extension
0.52s call tests/core/test_yaml.py::test_failing_shell_script
0.16s call tests/core/test_yaml.py::test_run_tests_from_directory_fail
0.10s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_dynamic_variable
0.09s call tests/core/test_yaml.py::test_name_filter
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_has_car_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_new_tax_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_basic_income_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_social_security_contribution_parameter
0.08s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_social_security_contribution_variable
0.08s call tests/core/test_yaml.py::test_success
0.07s call tests/core/test_yaml.py::test_run_tests_from_directory
0.07s call tests/core/test_yaml.py::test_with_reform
0.07s call tests/core/test_yaml.py::test_with_extension
0.07s call tests/core/test_yaml.py::test_fail
0.07s call tests/core/test_yaml.py::test_relative_error_margin_fail
0.07s call tests/core/test_yaml.py::test_with_anchors
0.07s call tests/core/test_yaml.py::test_absolute_error_margin_fail
Slowest tests for refactor-pytest-fixtures:
# pytest --durations=20
0.72s call tests/core/test_yaml.py::test_shell_script
0.72s call tests/core/test_yaml.py::test_shell_script_with_reform
0.68s call tests/core/test_yaml.py::test_shell_script_with_extension
0.52s call tests/core/test_yaml.py::test_failing_shell_script
0.15s call tests/core/test_yaml.py::test_name_filter
0.13s call tests/core/test_yaml.py::test_run_tests_from_directory_fail
0.10s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_dynamic_variable
0.10s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_social_security_contribution_parameter
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_has_car_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_social_security_contribution_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_new_tax_variable
0.09s setup tests/web_api/case_with_reform/test_reforms.py::test_return_code_of_basic_income_variable
0.09s call tests/core/test_yaml.py::test_fail
0.09s call tests/core/test_yaml.py::test_relative_error_margin_success
0.09s call tests/core/test_yaml.py::test_success
0.08s call tests/core/test_yaml.py::test_with_reform
0.08s call tests/core/test_yaml.py::test_with_extension
0.08s call tests/core/test_yaml.py::test_run_tests_from_directory
0.08s call tests/core/test_yaml.py::test_absolute_error_margin_fail
0.08s call tests/core/test_yaml.py::test_relative_error_margin_fail
There's a general overhead.
Profile for master:
# pytest --profile
4807454 function calls (4663716 primitive calls) in 7.150 seconds
Ordered by: cumulative time
List reduced from 3724 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
565/508 0.007 0.000 7.149 0.014 runner.py:83(pytest_runtest_protocol)
9662/5588 0.056 0.000 6.975 0.001 hooks.py:272(__call__)
11450/5588 0.009 0.000 6.928 0.001 manager.py:90(_hookexec)
11450/5588 0.024 0.000 6.923 0.001 manager.py:84(<lambda>)
11450/5588 0.134 0.000 6.908 0.001 callers.py:157(_multicall)
565/508 0.007 0.000 6.869 0.014 runner.py:90(runtestprotocol)
1695/1524 0.013 0.000 6.859 0.005 runner.py:183(call_and_report)
1695/1524 0.009 0.000 6.252 0.004 runner.py:204(call_runtest_hook)
1747/1524 0.010 0.000 6.228 0.004 runner.py:237(from_call)
1695/1524 0.005 0.000 6.215 0.004 runner.py:217(<lambda>)
565/508 0.003 0.000 4.661 0.009 runner.py:126(pytest_runtest_call)
508 0.002 0.000 4.649 0.009 python.py:1475(runtest)
508 0.005 0.000 4.607 0.009 python.py:175(pytest_pyfunc_call)
4 0.000 0.000 2.556 0.639 subprocess.py:326(check_call)
4 0.001 0.000 2.555 0.639 subprocess.py:309(call)
8 0.000 0.000 2.541 0.318 subprocess.py:979(wait)
8 0.000 0.000 2.541 0.318 subprocess.py:1586(_wait)
4 0.000 0.000 2.540 0.635 subprocess.py:1573(_try_wait)
4 2.540 0.635 2.540 0.635 {built-in method posix.waitpid}
12 0.000 0.000 1.388 0.116 test_yaml.py:21(run_yaml_test)
Profile for refactor-pytest-fixtures:
# pytest --profile
6185028 function calls (5985560 primitive calls) in 10.037 seconds
Ordered by: cumulative time
List reduced from 3754 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
530/473 0.008 0.000 10.036 0.021 runner.py:83(pytest_runtest_protocol)
10049/5203 0.063 0.000 9.841 0.002 hooks.py:272(__call__)
11873/5203 0.010 0.000 9.793 0.002 manager.py:90(_hookexec)
11873/5203 0.027 0.000 9.788 0.002 manager.py:84(<lambda>)
11873/5203 0.149 0.000 9.773 0.002 callers.py:157(_multicall)
530/473 0.008 0.000 9.754 0.021 runner.py:90(runtestprotocol)
1590/1419 0.015 0.000 9.744 0.007 runner.py:183(call_and_report)
1590/1419 0.011 0.000 9.001 0.006 runner.py:204(call_runtest_hook)
1642/1419 0.010 0.000 8.971 0.006 runner.py:237(from_call)
1590/1419 0.005 0.000 8.958 0.006 runner.py:217(<lambda>)
530/473 0.006 0.000 5.349 0.011 runner.py:126(pytest_runtest_call)
473 0.002 0.000 5.333 0.011 python.py:1475(runtest)
473 0.005 0.000 5.288 0.011 python.py:175(pytest_pyfunc_call)
530 0.002 0.000 3.077 0.006 runner.py:121(pytest_runtest_setup)
530 0.004 0.000 3.060 0.006 runner.py:359(prepare)
473 0.002 0.000 3.046 0.006 python.py:1479(setup)
473 0.001 0.000 3.043 0.006 fixtures.py:278(fillfixtures)
473 0.002 0.000 3.042 0.006 fixtures.py:472(_fillfixtures)
659 0.001 0.000 3.037 0.005 fixtures.py:479(getfixturevalue)
1412/659 0.004 0.000 3.036 0.005 fixtures.py:489(_get_active_fixturedef)
Number of function calls increased by 28% = 100 * (6185028 - 4807454) / 4807454
Collection is actually cheaper:
[master]# pytest --collect-only
Collected 508 items in 1.45s
vs
[refactor-pytest-fixtures]# pytest --collect-only
Collected 473 items in 1.40s
As your stats confirm, the slowness seems to come from the number of calls, which, as you mention, makes the difference when heavy objects like the tbs have to be rebuilt more often. I think this slowness is a price worth paying if we want to make the tests more reliable and trade some testing time for developer time. Once the logic is clear, we could start to see what improvements we can make by playing with fixture scoping.
I think the drop in performance is an acceptable payoff if we can move to clearer, more independent, and atomic tests. This only impacts the core tests, which are still reasonably fast IMHO. I guess we can recover this performance penalty by transforming unnecessary integration tests into unit ones?
Yes, moving to "clearer, more independent, and atomic tests" is exactly one of the aims of this PR. To cope with the slowness, I agree that other strategies should be considered; understanding which tests are the most critical and which are not, as you suggested, is an important one. The less critical ones do not need to be deleted, just marked so that they only run on special occasions.
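For example, with pytest markers (the slow marker is hypothetical; nothing in this PR defines it, and it would have to be registered under markers = in setup.cfg or pytest.ini):

import pytest


@pytest.mark.slow  # hypothetical marker for less critical, long-running tests
def test_some_expensive_integration():  # hypothetical test name
    ...

A default run could then skip them with pytest -m "not slow", and they could still be run explicitly (for instance in CI) with pytest -m slow.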
After taking a closer look, the most expensive tests are the CLI's (openfisca test) and run_tests. The former could be better tested at a higher level, for example with click, instead of using subprocess (which would make sense if the CLI weren't ours). run_tests looks trickier.
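To make the suggestion concrete, here is what an in-process CLI test looks like with click's test runner. The command below is a hypothetical stand-in (the openfisca CLI is not necessarily click-based), so this is only a sketch of the approach:

import click
from click.testing import CliRunner


@click.command()
@click.argument("path")
def fake_openfisca_test(path):
    """Hypothetical stand-in for the openfisca test command."""
    click.echo("testing " + path)


def test_cli_in_process():
    # Runs the command in-process: no fork/exec and no interpreter start-up,
    # unlike the subprocess-based shell-script tests in the durations report.
    result = CliRunner().invoke(fake_openfisca_test, ["tests/core"])
    assert result.exit_code == 0
    assert "testing tests/core" in result.output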
I'm personally OK with this PR as it'll make refactoring easier IMHO.
Any thoughts on this, @sandcha?
While I was reviewing #984 I realized it could make sense to have tbs objects with different scopes. In my PR I make the tbs function-scoped: since the tbs is mutable, that is safer. #984 makes the opposite choice and makes it globally scoped. So I thought a solution could be to have different tbs fixtures, each explicitly referring to its scope in its name, so that when people use one they are aware of what kind of tbs they are dealing with, and if it is global they do not mess with it. This way we should be able to improve performance while keeping the tests reliable and easy to extend and understand.
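A minimal sketch of that convention; the fixture names and the helper are hypothetical, not what #984 or this PR actually define:

import pytest


def build_tax_benefit_system():
    # Hypothetical stand-in for the real, expensive construction.
    return {"variables": {}}


@pytest.fixture  # function scope: rebuilt for every test, safe to mutate
def isolated_tax_benefit_system():
    return build_tax_benefit_system()


@pytest.fixture(scope="session")  # one instance for the whole run: fast, but treat it as read-only
def session_tax_benefit_system():
    return build_tax_benefit_system()

A test that adds a variable or applies a reform would request isolated_tax_benefit_system; a purely read-only test would request session_tax_benefit_system.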
so I thought a solution could be to have different tbs fixtures, each explicitly referring to its scope in its name, so that when people use one they are aware of what kind of tbs they are dealing with, and if it is global they do not mess with it
Thanks @cesco-fran - I agree this is a safe option which is easy to understand and extend. I am happy to follow this convention in #984 too. :smile: :+1:
Sounds great @cesco-fran :)