Give an example of parametrizing on input size
I'm using pytest-benchmark for the first time to benchmark bidict: https://github.com/jab/bidict/blob/0.12.0-dev/tests/test_benchmark.py
I'm wondering how different input sizes affect my benchmarks (it would be cool to be able to generate a graph showing that a particular function is, e.g., quadratic with respect to input size).
This seems like a common use case people might have when benchmarking their code, but I don't see any examples of how to do this in the README, so I'm wondering if I'm missing something. If not, would it be valuable to give an example or two? If I can figure out how to do this, I'd be happy to work up a PR adding an example to your docs if there is interest.
And if you happen to have any other benchmarking advice from looking at what I'm doing above, it'd be much appreciated.
Thanks!
At a quick glance, you'd parametrize on specific inputs.
Some examples:
- parametrize on backend: https://github.com/ionelmc/python-lazy-object-proxy/blob/master/tests/test_lazy_object_proxy.py#L1831 (results: https://travis-ci.org/ionelmc/python-lazy-object-proxy/jobs/93851093#L1508)
- parametrize on backend (but parametrized fixture): https://github.com/ionelmc/python-hunter/blob/master/tests/test_hunter.py#L840
- another example: https://bitbucket.org/antocuni/capnpy/src/97ec7c8fe435ee10d5224aea482ca6d0ec4195c8/capnpy/benchmarks/test_benchmarks.py?at=master&fileviewer=file-view-default
Thanks so much for those examples! Look forward to taking a closer look soon.
Now that I look more at your test code, what's the purpose of using the groups?
Another thing: your tests may be a bit tricky. Because running the benchmark causes side effects, you would need to use "pedantic mode" to precisely test performance at a specific object size. E.g. after 10000 runs, setitem may perform way slower because the object has way more items.
```python
import pytest


@pytest.fixture(params=[1000, 100000])  # this would be the initial size
def data(request):
    return {object(): object() for _ in range(request.param)}


@pytest.fixture(params=[dict, invdict], ids=lambda t: t.__name__)
def backend(request):
    return request.param


def test_setitem(benchmark, data, backend):
    key, value = object(), object()

    def runner(obj):
        obj[key] = value

    def setup():
        # `setup` must return an (args, kwargs) pair for `runner`
        return (backend(data),), {}

    benchmark.pedantic(runner, setup=setup, iterations=100)
```
Also see http://hackebrot.github.io/pytest-tricks/create_tests_via_parametrization/
So what's the deal with the pedantic mode? See http://pytest-benchmark.readthedocs.org/en/v3.0.0/pedantic.html, but note that it's dangerous: you need to manually find the ideal number of iterations. If it's too low you will get a high IQR/stddev.
I'm not really sure you need the pedantic mode, do you expect the class to perform differently depending on data size?
Wow, thanks for taking a closer look, this looks super helpful! I'm about to get pulled away to a conference for the next 10 days, but psyched to dig into this next chance I get. Really appreciate your fast and thorough responses!
Hi @ionelmc, I took a closer look and this was definitely helpful, thank you. Several improvements based on your tips in my latest version: https://github.com/jab/bidict/blob/0.12.0-dev/tests/test_benchmark.py
There are still some unanswered questions (e.g. how to get separate benchmark groups per input size per test; in the meantime just using a single input size), which I've explained in the comments there. Please let me know if what I'm trying to do still isn't clear after reading that.
And here are some benchmark results from running these on my machine:
```
----------------------------------------------------------- benchmark 'get_key_by_val': 2 tests -----------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-1024] 183.7601 (1.0) 2,651.3099 (1.0) 230.1725 (1.0) 74.6707 (1.0) 186.8601 (1.0) 59.9198 (1.0) 7426;7099 53778 100
test_get_key_by_val[bidict-1024] 482.7001 (2.63) 23,708.3004 (8.94) 614.5220 (2.67) 281.2807 (3.77) 497.3990 (2.66) 158.8996 (2.65) 26699;26715 189790 10
-----------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------- benchmark 'init': 2 tests ----------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-1024] 112.6600 (1.0) 431.6380 (1.0) 141.9397 (1.0) 40.1109 (1.0) 117.2015 (1.0) 37.8770 (1.0) 1036;879 7272 1
test_init[bidict-1024] 384.7930 (3.42) 11,095.5590 (25.71) 573.1356 (4.04) 755.2874 (18.83) 450.0470 (3.84) 143.3765 (3.79) 11;102 957 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------- benchmark 'setitem': 2 tests ---------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-1024] 1.7640 (1.0) 1.7640 (1.0) 1.7640 (1.0) 0.0000 (1.0) 1.7640 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-1024] 8.4050 (4.76) 8.4050 (4.76) 8.4050 (4.76) 0.0000 (1.0) 8.4050 (4.76) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------
```
And here are some results from running these on Travis (scroll to the bottom): https://travis-ci.org/jab/bidict/jobs/113642948
(Speaking of which, is there any way to see the output more clearly on Travis? Doesn't seem to be a way to make lines wider, and the wrapping of long lines makes it hard to read the benchmark results.)
Thanks again for all your help!
On Fri, Mar 4, 2016 at 2:26 PM, jab wrote:

> is there any way to see the output more clearly on Travis? Doesn't seem to be a way to make lines wider, and the wrapping of long lines makes it hard to read the benchmark results.
You could show only the more interesting columns, e.g. `--benchmark-columns=min,stddev`.
On Fri, Mar 4, 2016 at 2:26 PM, jab wrote:

> how to get separate benchmark groups per input size per test; in the meantime just using a single input size
In the master branch (which I hope to release soon) there is the ability to group by a specific parameter (e.g. `--benchmark-group-by=param:size`).
Alternatively you can use the group hook: http://pytest-benchmark.readthedocs.org/en/v3.0.0/hooks.html#pytest_benchmark.hookspec.pytest_benchmark_group_stats
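For reference, a rough sketch of what such a hook in a `conftest.py` might look like, adapted from the docs example. The "name-and-size" label, the `group_key` helper, and the dict-style access to each benchmark's `name`/`param` are illustrative assumptions, not verified API details:

```python
# Hypothetical conftest.py sketch: group benchmarks by (test name, size param).
from collections import defaultdict

import pytest


def group_key(bench):
    # Group by the bare test name plus the parametrized value,
    # e.g. ("test_setitem", "bidict-1000").
    return (bench["name"].partition("[")[0], bench["param"])


@pytest.mark.hookwrapper
def pytest_benchmark_group_stats(config, benchmarks, group_by):
    outcome = yield
    # Only kick in for a custom grouping name, enabled with
    # --benchmark-group-by=name-and-size (label is made up here).
    if group_by == "name-and-size":
        result = defaultdict(list)
        for bench in benchmarks:
            result[group_key(bench)].append(bench)
        outcome.force_result(sorted(result.items()))
```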
> You could show only the more interesting columns, e.g. `--benchmark-columns=min,stddev`.

When I try that I get `py.test: error: unrecognized arguments: --benchmark-columns=min,stddev` and I don't see it documented at http://pytest-benchmark.readthedocs.org/en/stable/usage.html#commandline-options -- is that a new option that hasn't yet made it into a release?
On Sat, Mar 5, 2016 at 4:04 PM, jab wrote:

> is that a new option that hasn't yet made it into a release?
Yes, it's in the master only.
> Alternatively you can use the group hook: http://pytest-benchmark.readthedocs.org/en/v3.0.0/hooks.html#pytest_benchmark.hookspec.pytest_benchmark_group_stats

I wrote something based on the example in the docs but it's never getting called:

```
File "/Library/Python/2.7/site-packages/pytest_benchmark/plugin.py", line 986, in pytest_benchmark_group_stats
    raise NotImplementedError("Unsupported grouping %r." % group_by)
NotImplementedError: Unsupported grouping 'special'.
```
Is @pytest.mark.hookwrapper failing to register the pytest_benchmark_group_stats() I defined?
I'm running off latest release (3.0.0) not master.
Where did you define the hook?
— Reply to this email directly or view it on GitHub https://github.com/ionelmc/pytest-benchmark/issues/44#issuecomment-192659904 .
Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro
In https://github.com/jab/bidict/blob/0.12.0-dev/tests/test_benchmark.py right after the imports.
In case you didn't see this other question in https://github.com/jab/bidict/blob/0.12.0-dev/tests/test_benchmark.py#L104, I'll paste here:
```python
# TODO: iterations=100 causes: ValueError: Can't use more than 1 `iterations` with a `setup` function.
# benchmark.pedantic(setitem, setup=setup, iterations=100)
benchmark.pedantic(setitem, setup=setup)
```
It needs to be in a conftest.py file
Thanks @ionelmc, that did the trick. Benchmarks are now properly getting grouped by (test, input size):
```
----------------------------------------------------- benchmark 'test_get_key_by_val[11]': 2 tests ------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-11] 169.2772 (1.0) 3,588.1996 (1.0) 229.0304 (1.0) 75.6794 (1.0) 188.3507 (1.0) 59.6046 (1.0) 7380;7296 40330 100
test_get_key_by_val[bidict-11] 432.1337 (2.55) 10,564.9233 (2.94) 544.4666 (2.38) 196.3595 (2.59) 491.7383 (2.61) 134.1105 (2.25) 9261;8621 76261 16
-----------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------- benchmark 'test_get_key_by_val[110]': 2 tests -----------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-110] 169.2772 (1.0) 2,918.2434 (1.0) 214.3679 (1.0) 67.3869 (1.0) 181.1981 (1.0) 59.6046 (1.0) 5056;4692 38480 100
test_get_key_by_val[bidict-110] 441.0744 (2.61) 6,794.9295 (2.33) 531.4409 (2.48) 168.9434 (2.51) 452.9953 (2.50) 95.3674 (1.60) 11852;12138 99865 20
-----------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------- benchmark 'test_get_key_by_val[767]': 2 tests -----------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-767] 178.8139 (1.0) 2,598.7625 (1.0) 211.7568 (1.0) 63.7221 (1.0) 181.1981 (1.0) 50.0679 (3.57) 4973;4321 39946 100
test_get_key_by_val[bidict-767] 406.7140 (2.27) 7,068.4096 (2.72) 518.4312 (2.45) 162.3665 (2.55) 476.8372 (2.63) 14.0246 (1.0) 9461;39030 99865 17
-----------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------- benchmark 'test_get_key_by_val[5171]': 2 tests -----------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-5171] 169.2772 (1.0) 1,990.7951 (1.0) 209.9508 (1.0) 61.8881 (1.0) 181.1981 (1.0) 30.9944 (1.0) 6677;7031 55189 100
test_get_key_by_val[bidict-5171] 400.9767 (2.37) 24,817.2066 (12.47) 532.0872 (2.53) 204.3100 (3.30) 455.1627 (2.51) 130.0465 (4.20) 10657;10505 99865 22
-----------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------- benchmark 'test_get_key_by_val[56902]': 2 tests ----------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_get_key_by_val[2idict-56902] 169.2772 (1.0) 2,539.1579 (1.0) 199.6674 (1.0) 50.8129 (1.0) 181.1981 (1.0) 11.9209 (1.0) 3622;7758 45591 100
test_get_key_by_val[bidict-56902] 429.1534 (2.54) 4,466.3747 (1.76) 544.7555 (2.73) 171.8878 (3.38) 468.8899 (2.59) 135.1039 (11.33) 7092;6542 52429 30
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[11]': 2 tests ------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-11] 1.4901 (1.0) 25.4512 (1.0) 1.9713 (1.0) 0.6869 (1.0) 1.7285 (1.0) 0.4768 (1.0) 12968;12389 99865 4
test_init[bidict-11] 12.8746 (8.64) 665.9031 (26.16) 17.4738 (8.86) 14.2990 (20.82) 14.0667 (8.14) 3.0994 (6.50) 75;704 4199 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[110]': 2 tests -----------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-110] 10.9673 (1.0) 115.1562 (1.0) 13.8944 (1.0) 4.4848 (1.0) 11.9209 (1.0) 3.0994 (1.0) 4436;4143 33289 1
test_init[bidict-110] 43.8690 (4.00) 12,042.0456 (104.57) 59.3539 (4.27) 134.3110 (29.95) 49.1142 (4.12) 13.8283 (4.46) 109;1037 8406 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[767]': 2 tests -----------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-767] 71.7640 (1.0) 364.0652 (1.0) 81.0220 (1.0) 19.4860 (1.0) 72.9561 (1.0) 1.1921 (1.0) 871;1871 9259 1
test_init[bidict-767] 231.9813 (3.23) 3,443.0027 (9.46) 304.3127 (3.76) 184.6691 (9.48) 248.9090 (3.41) 75.3403 (63.20) 100;311 2619 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[5171]': 2 tests ----------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-5171] 550.0317 (1.0) 1,300.0965 (1.0) 685.9525 (1.0) 182.3577 (1.0) 590.0860 (1.0) 185.9665 (1.0) 160;136 1147 1
test_init[bidict-5171] 1,523.9716 (2.77) 22,585.1536 (17.37) 2,341.5089 (3.41) 1,601.9414 (8.78) 1,896.3814 (3.21) 624.8951 (3.36) 4;39 458 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------- benchmark 'test_init[56902]': 2 tests ---------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_init[2idict-56902] 6.3610 (1.0) 10.1950 (1.0) 6.8830 (1.0) 0.7374 (1.0) 6.5750 (1.0) 0.4320 (1.0) 9;9 94 1
test_init[bidict-56902] 23.5901 (3.71) 35.4819 (3.48) 27.3424 (3.97) 3.3213 (4.50) 26.1536 (3.98) 5.3980 (12.50) 10;0 30 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[11]': 2 tests ----------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-11] 953.6743 (1.0) 953.6743 (1.0) 953.6743 (1.0) 0.0000 (1.0) 953.6743 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-11] 5,960.4645 (6.25) 5,960.4645 (6.25) 5,960.4645 (6.25) 0.0000 (1.0) 5,960.4645 (6.25) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[110]': 2 tests ---------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-110] 953.6743 (1.0) 953.6743 (1.0) 953.6743 (1.0) 0.0000 (1.0) 953.6743 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-110] 5,006.7902 (5.25) 5,006.7902 (5.25) 5,006.7902 (5.25) 0.0000 (1.0) 5,006.7902 (5.25) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[767]': 2 tests ---------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-767] 2.1458 (1.0) 2.1458 (1.0) 2.1458 (1.0) 0.0000 (1.0) 2.1458 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-767] 5.0068 (2.33) 5.0068 (2.33) 5.0068 (2.33) 0.0000 (1.0) 5.0068 (2.33) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[5171]': 2 tests --------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-5171] 953.6743 (1.0) 953.6743 (1.0) 953.6743 (1.0) 0.0000 (1.0) 953.6743 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-5171] 10,013.5803 (10.50) 10,013.5803 (10.50) 10,013.5803 (10.50) 0.0000 (1.0) 10,013.5803 (10.50) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------- benchmark 'test_setitem[56902]': 2 tests -------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers(*) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------
test_setitem[2idict-56902] 5.0068 (1.0) 5.0068 (1.0) 5.0068 (1.0) 0.0000 (1.0) 5.0068 (1.0) 0.0000 (1.0) 0;0 1 1
test_setitem[bidict-56902] 11.9209 (2.38) 11.9209 (2.38) 11.9209 (2.38) 0.0000 (1.0) 11.9209 (2.38) 0.0000 (1.0) 0;0 1 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------
```
What's going on with iterations though?
If you're wondering why the iterations aren't the same for all the tests, it's the calibration picking a "right" number of iterations. See: http://pytest-benchmark.readthedocs.org/en/latest/calibration.html
Thanks @ionelmc, I'd seen that and it's a nice explanation. I'm curious to understand it a bit deeper (e.g. why some of my tests require only 1 iteration where others require 30 or 100), but I understand if that's outside the scope of what you want to document.
Very fast functions require multiple runs (the "iterations") to get an accurate measurement. The number of iterations gets picked automatically depending on how precise your timer is. Note that the default timer is platform specific.
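To get a feel for this, you can probe the timer's resolution yourself. A small illustrative sketch (not part of pytest-benchmark; it assumes `time.perf_counter` as the timer, which may differ from your platform's default):

```python
import time


def timer_resolution(timer=time.perf_counter, samples=100):
    """Smallest observed nonzero delta between consecutive timer reads."""
    deltas = []
    for _ in range(samples):
        t0 = timer()
        t1 = timer()
        while t1 <= t0:  # spin until the clock visibly advances
            t1 = timer()
        deltas.append(t1 - t0)
    return min(deltas)
```

A function whose runtime is close to this resolution has to be run many times per measurement (the iterations) before a single timing means anything.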
I'd be happy if you could explain what's not clear in the documentation regarding the iterations and calibration concepts.