pytest-benchmark
Make it possible to perform more than one benchmark per pytest test
Say, I have a test function like this:
@pytest.mark.benchmark(group="write_100_files_1K_serial")
def test_bench_write_100_files_1K_serial(temp_path, benchmark1, benchmark2):
    benchmark1.name = "trio"
    benchmark1(trio.run, bench_trio_write_100_files_1K_serial, temp_path)
    benchmark2.name = "datastore"
    benchmark2(trio.run, bench_fsds_write_100_files_1K_serial, temp_path)
    assert benchmark2.stats.stats.median < (2 * benchmark1.stats.stats.median)
Since both of these benchmark calls are I/O bound (or they should be anyway… different story), I cannot compare them to fixed values. Instead, I'd like to compare the relative slow-down/speed-up of my piece of code to some reference code – that is what the test assert does.
And while the above code actually works fine, it only does so because of some private API usage (it does work flawlessly, however!):
import pytest
import pytest_benchmark.plugin

@pytest.fixture(scope="function")
def benchmark1(request):
    return pytest_benchmark.plugin.benchmark.__pytest_wrapped__.obj(request)

@pytest.fixture(scope="function")
def benchmark2(request):
    return pytest_benchmark.plugin.benchmark.__pytest_wrapped__.obj(request)
See also https://github.com/pytest-dev/pytest/issues/2703 for the pytest-side limitation. The “official solution” recommended by pytest is to make fixtures factory functions. Would this be something you would be comfortable exposing as part of this library?
Well, I guess we could have a make_benchmark or benchmark_setup (pytest-django style) fixture ...
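For illustration only, a minimal sketch of what such a factory fixture could look like, reusing the same private-API trick from the example above (make_benchmark is just the name floated in this thread, not an existing fixture, and the wrapped-fixture access remains unsupported API):

import pytest
import pytest_benchmark.plugin

@pytest.fixture
def make_benchmark(request):
    # Each call builds a fresh benchmark fixture bound to the current test.
    def _make():
        return pytest_benchmark.plugin.benchmark.__pytest_wrapped__.obj(request)
    return _make

def test_bench_write_100_files_1K_serial(temp_path, make_benchmark):
    bench_trio = make_benchmark()
    bench_trio.name = "trio"
    bench_trio(trio.run, bench_trio_write_100_files_1K_serial, temp_path)

    bench_fsds = make_benchmark()
    bench_fsds.name = "datastore"
    bench_fsds(trio.run, bench_fsds_write_100_files_1K_serial, temp_path)

    assert bench_fsds.stats.stats.median < 2 * bench_trio.stats.stats.median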
I still don't get your use case. You only need this to compare and assert the relative results of 2 benchmarks?
@ionelmc I might have a use case for this. I'm rewriting an API and I'd like to compare its performance with the previous API to make sure the new one is not slower. I'm doing this with fixtures at the moment, but maybe calling the benchmark function twice and checking the times might be better :)
@patrick91 perhaps you could use one of the hooks (e.g. pytest_benchmark_update_json) to make some assertions on the results? Or perhaps pytest_benchmark_group_stats if you compare against past data?
I doubt the plugin could have a nicer way to deal with your use case, as there are so many ways of looking at and working with the data. I mean, that's why the plugin has options to output JSON in the first place.
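For instance, a rough sketch of the pytest_benchmark_update_json route, placed in conftest.py. This assumes the hook signature documented by the plugin and the layout of its saved JSON (one entry per benchmark with a "stats" dict); the benchmark names and the 2x threshold are made up for illustration:

def pytest_benchmark_update_json(config, benchmarks, output_json):
    # Only runs when JSON output is enabled (e.g. --benchmark-json=results.json).
    medians = {b["name"]: b["stats"]["median"] for b in output_json["benchmarks"]}
    baseline = medians.get("test_bench_trio")        # hypothetical benchmark names
    candidate = medians.get("test_bench_datastore")
    if baseline is not None and candidate is not None:
        assert candidate < 2 * baseline, "datastore regressed past 2x the trio baseline"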
Hi @ionelmc, I have a use case for this. It is a long-running test with multiple stages that I would like to benchmark individually. Due to the current behavior, getting the necessary data points means running the test multiple times, benchmarking only one stage at a time. This can significantly increase the overall testing time, since setup and teardown have to be repeated for each run. My initial thought is that pedantic mode could be expanded to accept whatever additional arguments are required to facilitate this. Thoughts?
EDIT... what if target could take a list, e.g.:
def test_the_thing(benchmark):
    def setup(): ...
    def stage1(args): ...
    def stage2(args): ...
    trigger_external_async_process()  # Call not included in benchmark
    benchmark.pedantic(target=[stage1, stage2], setup=setup, rounds=1, ...)
    ...
I really love pytest-benchmark, but I am also in a situation where my use case requires multiple benchmarks per test case in order to avoid unreasonable setup/teardown time.
I am benchmarking some software that involves setting up and tearing down the database, and my tests are parametrized by the number of sample rows in the database so that I can measure and plot the scaling of the code and compare it with the expected big-O scaling. The database gets populated with random data, but it is expensive to repeatedly set up and tear down the database. What I would like to do is put the benchmark inside a for-loop that adds more random data to the database on each iteration.
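Roughly, the desired pattern would be something like the sketch below (add_random_rows and run_query are hypothetical helpers); it is exactly what the plugin currently rejects, since the benchmark fixture can only be run once per test:

def test_query_scaling(benchmark, db):
    # Desired usage, not currently supported: one expensive database setup,
    # then several timed measurements as the data volume grows.
    for n_rows in (1_000, 10_000, 100_000):
        add_random_rows(db, n_rows)   # grow the same database in place
        benchmark(run_query, db)      # second call fails: one benchmark per test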
my use case requires multiple benchmarks per test case in order to avoid unreasonable setup/teardown time.
Could you alternatively solve this by reusing a fixture (e.g. module scope)?
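A minimal sketch of that idea, assuming the expensive setup can be amortized by a module-scoped fixture (create_test_db, grow_to and run_query are hypothetical helpers); each parametrized test then gets its own benchmark fixture, so the one-benchmark-per-test rule is respected:

import pytest

@pytest.fixture(scope="module")
def shared_db():
    db = create_test_db()   # expensive setup, done once for the whole module
    yield db
    db.drop()               # teardown once, after every test in the module ran

@pytest.mark.parametrize("n_rows", [1_000, 10_000, 100_000])
def test_query_scaling(benchmark, shared_db, n_rows):
    # Grow the shared database up to n_rows before timing; this relies on the
    # parametrized cases running in increasing order of n_rows.
    grow_to(shared_db, n_rows)
    benchmark(run_query, shared_db)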