Give an example of parametrizing on input size
I'm using pytest-benchmark for the first time to benchmark bidict: https://github.com/jab/bidict/blob/0.12.0-dev/tests/test_benchmark.py
I'm wondering how different input sizes affect my benchmarks (it would be cool to be able to generate a graph showing that a particular function is, e.g., quadratic with respect to input size).
This seems like a common use case people might have when benchmarking their code, but I don't see any examples of how to do this in the README, so I'm wondering if I'm missing something. If not, would it be valuable to give an example or two? If I can figure out how to do this, I'd be happy to work up a PR adding an example to your docs if there is interest.
And if you happen to have any other benchmarking advice from looking at what I'm doing above, it'd be much appreciated.
Thanks!
At a quick glance, you'd parametrize on specific inputs.
Some examples:
- parametrize on backend: https://github.com/ionelmc/python-lazy-object-proxy/blob/master/tests/test_lazy_object_proxy.py#L1831 (results: https://travis-ci.org/ionelmc/python-lazy-object-proxy/jobs/93851093#L1508)
- parametrize on backend (but parametrized fixture): https://github.com/ionelmc/python-hunter/blob/master/tests/test_hunter.py#L840
- another example: https://bitbucket.org/antocuni/capnpy/src/97ec7c8fe435ee10d5224aea482ca6d0ec4195c8/capnpy/benchmarks/test_benchmarks.py?at=master&fileviewer=file-view-default
Thanks so much for those examples! Look forward to taking a closer look soon.
Now that I look more at your test code, what's the purpose of using the groups?
Another thing: your tests may be a bit tricky. Because running the benchmark causes side effects, you would need to use "pedantic mode" to precisely test performance at a specific object size. E.g. after 10000 runs, setitem may perform way slower because the object has way more items.
```python
import pytest


@pytest.fixture(params=[1000, 100000])  # this would be the initial size
def data(request):
    return {object(): object() for _ in range(request.param)}


@pytest.fixture(params=[dict, invdict], ids=lambda t: t.__name__)
def backend(request):
    return request.param


def test_setitem(benchmark, data, backend):
    key, value = object(), object()

    def runner(obj):
        obj[key] = value

    def setup():
        # `setup` must return an (args, kwargs) pair for `runner`
        return (backend(data),), {}

    benchmark.pedantic(runner, setup=setup, iterations=100)
```
Also see http://hackebrot.github.io/pytest-tricks/create_tests_via_parametrization/
So what's the deal with the pedantic mode? See http://pytest-benchmark.readthedocs.org/en/v3.0.0/pedantic.html, but note that it's dangerous: you need to manually find the ideal number of iterations. If it's too low you will get a high IQR/stddev.
I'm not really sure you need the pedantic mode, do you expect the class to perform differently depending on data size?
Wow, thanks for taking a closer look, this looks super helpful! I'm about to get pulled away to a conference for the next 10 days, but psyched to dig into this next chance I get. Really appreciate your fast and thorough responses!
Hi @ionelmc, I took a closer look and this was definitely helpful, thank you. Several improvements based on your tips in my latest version: https://github.com/jab/bidict/blob/0.12.0-dev/tests/test_benchmark.py
There are still some unanswered questions (e.g. how to get separate benchmark groups per input size per test; in the meantime just using a single input size), which I've explained in the comments there. Please let me know if what I'm trying to do still isn't clear after reading that.
And here are some benchmark results from running these on my machine:
```
----------------------------------------------------------- benchmark 'get_key_by_val': 2 tests -----------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-1024] 183.7601 (1.0) 2,651.3099 (1.0) 230.1725 (1.0) 74.6707 (1.0) 186.8601 (1.0) 59.9198 (1.0) 7426;7099 53778 100
test_get_key_by_val[bidict-1024] 482.7001 (2.63) 23,708.3004 (8.94) 614.5220 (2.67) 281.2807 (3.77) 497.3990 (2.66) 158.8996 (2.65) 26699;26715 189790 10
-----------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------- benchmark 'init': 2 tests ----------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-1024] 112.6600 (1.0) 431.6380 (1.0) 141.9397 (1.0) 40.1109 (1.0) 117.2015 (1.0) 37.8770 (1.0) 1036;879 7272 1
test_init[bidict-1024] 384.7930 (3.42) 11,095.5590 (25.71) 573.1356 (4.04) 755.2874 (18.83) 450.0470 (3.84) 143.3765 (3.79) 11;102 957 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------- benchmark 'setitem': 2 tests ---------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-1024] 1.7640 (1.0) 1.7640 (1.0) 1.7640 (1.0) 0.0000 (1.0) 1.7640 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-1024] 8.4050 (4.76) 8.4050 (4.76) 8.4050 (4.76) 0.0000 (1.0) 8.4050 (4.76) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------
```
And here are some results from running these on Travis (scroll to the bottom): https://travis-ci.org/jab/bidict/jobs/113642948
(Speaking of which, is there any way to see the output more clearly on Travis? Doesn't seem to be a way to make lines wider, and the wrapping of long lines makes it hard to read the benchmark results.)
Thanks again for all your help!
On Fri, Mar 4, 2016 at 2:26 PM, jab wrote:

> is there any way to see the output more clearly on Travis? Doesn't seem to be a way to make lines wider, and the wrapping of long lines makes it hard to read the benchmark results.
You could show only the more interesting columns, e.g. `--benchmark-columns=min,stddev`.
On Fri, Mar 4, 2016 at 2:26 PM, jab wrote:

> how to get separate benchmark groups per input size per test; in the meantime just using a single input size
In the master branch (which I hope to release soon) there is the ability to group by a specific parameter (e.g. `--benchmark-group-by=param:size`).
Alternatively you can use the group hook: http://pytest-benchmark.readthedocs.org/en/v3.0.0/hooks.html#pytest_benchmark.hookspec.pytest_benchmark_group_stats
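For reference, a rough sketch of what such a hook in a `conftest.py` might look like, adapted from the docs example. The "name-and-size" label, the `group_key` helper, and the dict-style access to each benchmark's `name`/`param` are illustrative assumptions, not verified API details:

```python
# Hypothetical conftest.py sketch: group benchmarks by (test name, size param).
from collections import defaultdict

import pytest


def group_key(bench):
    # Group by the bare test name plus the parametrized value,
    # e.g. ("test_setitem", "bidict-1000").
    return (bench["name"].partition("[")[0], bench["param"])


@pytest.mark.hookwrapper
def pytest_benchmark_group_stats(config, benchmarks, group_by):
    outcome = yield
    # Only kick in for a custom grouping name, enabled with
    # --benchmark-group-by=name-and-size (label is made up here).
    if group_by == "name-and-size":
        result = defaultdict(list)
        for bench in benchmarks:
            result[group_key(bench)].append(bench)
        outcome.force_result(sorted(result.items()))
```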
> You could show only the more interesting columns, e.g. `--benchmark-columns=min,stddev`.

When I try that I get `py.test: error: unrecognized arguments: --benchmark-columns=min,stddev` and I don't see it documented at http://pytest-benchmark.readthedocs.org/en/stable/usage.html#commandline-options -- is that a new option that hasn't yet made it into a release?
On Sat, Mar 5, 2016 at 4:04 PM, jab wrote:

> is that a new option that hasn't yet made it into a release?
Yes, it's in the master only.
> Alternatively you can use the group hook: http://pytest-benchmark.readthedocs.org/en/v3.0.0/hooks.html#pytest_benchmark.hookspec.pytest_benchmark_group_stats

I wrote something based on the example in the docs but it's never getting called:

```
File "/Library/Python/2.7/site-packages/pytest_benchmark/plugin.py", line 986, in pytest_benchmark_group_stats
    raise NotImplementedError("Unsupported grouping %r." % group_by)
NotImplementedError: Unsupported grouping 'special'.
```
Is @pytest.mark.hookwrapper failing to register the pytest_benchmark_group_stats() I defined?
I'm running off latest release (3.0.0) not master.
Where did you define the hook?
— Reply to this email directly or view it on GitHub https://github.com/ionelmc/pytest-benchmark/issues/44#issuecomment-192659904 .
Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro
In https://github.com/jab/bidict/blob/0.12.0-dev/tests/test_benchmark.py right after the imports.
In case you didn't see this other question in https://github.com/jab/bidict/blob/0.12.0-dev/tests/test_benchmark.py#L104, I'll paste here:
```python
# TODO: iterations=100 causes: ValueError: Can't use more than 1 `iterations` with a `setup` function.
# benchmark.pedantic(setitem, setup=setup, iterations=100)
benchmark.pedantic(setitem, setup=setup)
```
It needs to be in a conftest.py file
Thanks @ionelmc, that did the trick. Benchmarks are now properly getting grouped by (test, input size):
```
----------------------------------------------------- benchmark 'test_get_key_by_val[11]': 2 tests ------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-11] 169.2772 (1.0) 3,588.1996 (1.0) 229.0304 (1.0) 75.6794 (1.0) 188.3507 (1.0) 59.6046 (1.0) 7380;7296 40330 100
test_get_key_by_val[bidict-11] 432.1337 (2.55) 10,564.9233 (2.94) 544.4666 (2.38) 196.3595 (2.59) 491.7383 (2.61) 134.1105 (2.25) 9261;8621 76261 16
-----------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------- benchmark 'test_get_key_by_val[110]': 2 tests -----------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-110] 169.2772 (1.0) 2,918.2434 (1.0) 214.3679 (1.0) 67.3869 (1.0) 181.1981 (1.0) 59.6046 (1.0) 5056;4692 38480 100
test_get_key_by_val[bidict-110] 441.0744 (2.61) 6,794.9295 (2.33) 531.4409 (2.48) 168.9434 (2.51) 452.9953 (2.50) 95.3674 (1.60) 11852;12138 99865 20
-----------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------- benchmark 'test_get_key_by_val[767]': 2 tests -----------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-767] 178.8139 (1.0) 2,598.7625 (1.0) 211.7568 (1.0) 63.7221 (1.0) 181.1981 (1.0) 50.0679 (3.57) 4973;4321 39946 100
test_get_key_by_val[bidict-767] 406.7140 (2.27) 7,068.4096 (2.72) 518.4312 (2.45) 162.3665 (2.55) 476.8372 (2.63) 14.0246 (1.0) 9461;39030 99865 17
-----------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------- benchmark 'test_get_key_by_val[5171]': 2 tests -----------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-5171] 169.2772 (1.0) 1,990.7951 (1.0) 209.9508 (1.0) 61.8881 (1.0) 181.1981 (1.0) 30.9944 (1.0) 6677;7031 55189 100
test_get_key_by_val[bidict-5171] 400.9767 (2.37) 24,817.2066 (12.47) 532.0872 (2.53) 204.3100 (3.30) 455.1627 (2.51) 130.0465 (4.20) 10657;10505 99865 22
-----------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------- benchmark 'test_get_key_by_val[56902]': 2 tests ----------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-56902] 169.2772 (1.0) 2,539.1579 (1.0) 199.6674 (1.0) 50.8129 (1.0) 181.1981 (1.0) 11.9209 (1.0) 3622;7758 45591 100
test_get_key_by_val[bidict-56902] 429.1534 (2.54) 4,466.3747 (1.76) 544.7555 (2.73) 171.8878 (3.38) 468.8899 (2.59) 135.1039 (11.33) 7092;6542 52429 30
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[11]': 2 tests ------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-11] 1.4901 (1.0) 25.4512 (1.0) 1.9713 (1.0) 0.6869 (1.0) 1.7285 (1.0) 0.4768 (1.0) 12968;12389 99865 4
test_init[bidict-11] 12.8746 (8.64) 665.9031 (26.16) 17.4738 (8.86) 14.2990 (20.82) 14.0667 (8.14) 3.0994 (6.50) 75;704 4199 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[110]': 2 tests -----------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-110] 10.9673 (1.0) 115.1562 (1.0) 13.8944 (1.0) 4.4848 (1.0) 11.9209 (1.0) 3.0994 (1.0) 4436;4143 33289 1
test_init[bidict-110] 43.8690 (4.00) 12,042.0456 (104.57) 59.3539 (4.27) 134.3110 (29.95) 49.1142 (4.12) 13.8283 (4.46) 109;1037 8406 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[767]': 2 tests -----------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-767] 71.7640 (1.0) 364.0652 (1.0) 81.0220 (1.0) 19.4860 (1.0) 72.9561 (1.0) 1.1921 (1.0) 871;1871 9259 1
test_init[bidict-767] 231.9813 (3.23) 3,443.0027 (9.46) 304.3127 (3.76) 184.6691 (9.48) 248.9090 (3.41) 75.3403 (63.20) 100;311 2619 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[5171]': 2 tests ----------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-5171] 550.0317 (1.0) 1,300.0965 (1.0) 685.9525 (1.0) 182.3577 (1.0) 590.0860 (1.0) 185.9665 (1.0) 160;136 1147 1
test_init[bidict-5171] 1,523.9716 (2.77) 22,585.1536 (17.37) 2,341.5089 (3.41) 1,601.9414 (8.78) 1,896.3814 (3.21) 624.8951 (3.36) 4;39 458 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[56902]': 2 tests ---------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-56902] 6.3610 (1.0) 10.1950 (1.0) 6.8830 (1.0) 0.7374 (1.0) 6.5750 (1.0) 0.4320 (1.0) 9;9 94 1
test_init[bidict-56902] 23.5901 (3.71) 35.4819 (3.48) 27.3424 (3.97) 3.3213 (4.50) 26.1536 (3.98) 5.3980 (12.50) 10;0 30 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[11]': 2 tests ----------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-11] 953.6743 (1.0) 953.6743 (1.0) 953.6743 (1.0) 0.0000 (1.0) 953.6743 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-11] 5,960.4645 (6.25) 5,960.4645 (6.25) 5,960.4645 (6.25) 0.0000 (1.0) 5,960.4645 (6.25) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[110]': 2 tests ---------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-110] 953.6743 (1.0) 953.6743 (1.0) 953.6743 (1.0) 0.0000 (1.0) 953.6743 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-110] 5,006.7902 (5.25) 5,006.7902 (5.25) 5,006.7902 (5.25) 0.0000 (1.0) 5,006.7902 (5.25) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[767]': 2 tests ---------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-767] 2.1458 (1.0) 2.1458 (1.0) 2.1458 (1.0) 0.0000 (1.0) 2.1458 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-767] 5.0068 (2.33) 5.0068 (2.33) 5.0068 (2.33) 0.0000 (1.0) 5.0068 (2.33) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[5171]': 2 tests --------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-5171] 953.6743 (1.0) 953.6743 (1.0) 953.6743 (1.0) 0.0000 (1.0) 953.6743 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-5171] 10,013.5803 (10.50) 10,013.5803 (10.50) 10,013.5803 (10.50) 0.0000 (1.0) 10,013.5803 (10.50) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[56902]': 2 tests -------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-56902] 5.0068 (1.0) 5.0068 (1.0) 5.0068 (1.0) 0.0000 (1.0) 5.0068 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-56902] 11.9209 (2.38) 11.9209 (2.38) 11.9209 (2.38) 0.0000 (1.0) 11.9209 (2.38) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------
```
What's going on with iterations though?
If you're wondering why the iterations aren't the same for all the tests, it's the calibration picking a "right" number of iterations. See: http://pytest-benchmark.readthedocs.org/en/latest/calibration.html
Thanks @ionelmc, I'd seen that and it's a nice explanation. I'm curious to understand it a bit deeper (e.g. why some of my tests require only 1 iteration where others require 30 or 100), but I understand if that's outside the scope of what you want to document.
Very fast functions require multiple runs (the "iterations") to get an accurate measurement. The number of iterations gets picked automatically depending on how precise your timer is. Note that the default timer is platform specific.
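To get a feel for this, you can probe the timer's resolution yourself. A small illustrative sketch (not part of pytest-benchmark; it assumes `time.perf_counter` as the timer, which may differ from your platform's default):

```python
import time


def timer_resolution(timer=time.perf_counter, samples=100):
    """Smallest observed nonzero delta between consecutive timer reads."""
    deltas = []
    for _ in range(samples):
        t0 = timer()
        t1 = timer()
        while t1 <= t0:  # spin until the clock visibly advances
            t1 = timer()
        deltas.append(t1 - t0)
    return min(deltas)
```

A function whose runtime is close to this resolution has to be run many times per measurement (the iterations) before a single timing means anything.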
I'd be happy if you could explain what's not clear in the documentation regarding the iterations and calibration concepts.