
Make it possible to perform more than one benchmark per pytest test

Open ntninja opened this issue 4 years ago • 6 comments

Say, I have a test function like this:

@pytest.mark.benchmark(group="write_100_files_1K_serial")
def test_bench_write_100_files_1K_serial(temp_path, benchmark1, benchmark2):
	benchmark1.name = "trio"
	benchmark1(trio.run, bench_trio_write_100_files_1K_serial, temp_path)
	
	benchmark2.name = "datastore"
	benchmark2(trio.run, bench_fsds_write_100_files_1K_serial, temp_path)
	
	assert benchmark2.stats.stats.median < (2 * benchmark1.stats.stats.median)

Since both of these benchmark calls are I/O bound (or they should be anyway… different story), I cannot compare them to fixed values. Instead, I'd like to compare the relative slow-down/speed-up of my piece of code to some reference code – that is what the test assert does.

And while the above code actually works fine, it only does so because of some private-API usage (it does work flawlessly, however!):

import pytest
import pytest_benchmark.plugin

# Both fixtures unwrap the plugin's own `benchmark` fixture function and call
# it directly, so the test receives two independent benchmark objects.
@pytest.fixture(scope="function")
def benchmark1(request):
	return pytest_benchmark.plugin.benchmark.__pytest_wrapped__.obj(request)

@pytest.fixture(scope="function")
def benchmark2(request):
	return pytest_benchmark.plugin.benchmark.__pytest_wrapped__.obj(request)

See also https://github.com/pytest-dev/pytest/issues/2703 for the pytest-side limitation here. The “official solution” recommended by pytest is to turn such fixtures into factory functions. Would this be something you would be comfortable exposing as part of this library?
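A rough sketch of what that factory pattern could look like here (make_benchmark is just a placeholder name, and it still relies on the same private unwrapping as the workaround above):

import pytest
import pytest_benchmark.plugin

@pytest.fixture(scope="function")
def make_benchmark(request):
    # Factory fixture: each call returns a fresh, independent benchmark object,
    # obtained via the same private unwrapping as in the workaround above.
    def _make(name=None):
        bench = pytest_benchmark.plugin.benchmark.__pytest_wrapped__.obj(request)
        if name is not None:
            bench.name = name
        return bench
    return _make

def test_bench_write_100_files_1K_serial(temp_path, make_benchmark):
    bench_trio = make_benchmark("trio")
    bench_trio(trio.run, bench_trio_write_100_files_1K_serial, temp_path)

    bench_fsds = make_benchmark("datastore")
    bench_fsds(trio.run, bench_fsds_write_100_files_1K_serial, temp_path)

    assert bench_fsds.stats.stats.median < (2 * bench_trio.stats.stats.median)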

ntninja avatar Apr 10 '20 01:04 ntninja

Well, I guess we could have a make_benchmark or benchmark_setup (pytest-django style) fixture ...

I still don't get your use case. You only need this to compare and assert the relative results of 2 benchmarks?

ionelmc avatar May 10 '20 14:05 ionelmc

@ionelmc I might have a use case for this: I'm rewriting an API and I'd like to compare its performance with the previous API to make sure the new one is not slower. I'm doing this with fixtures at the moment, but maybe calling the benchmark function twice and checking the times would be better :)

patrick91 avatar Jul 15 '20 09:07 patrick91

@patrick91 perhaps you could use one of the hooks (e.g. pytest_benchmark_update_json) to make some assertions on the results?

Or perhaps pytest_benchmark_group_stats if you compare to past data?

I doubt the plugin could offer a nicer way to deal with your use case, as there are so many ways of looking at and doing things with the data. I mean, that's why the plugin has options to output JSON in the first place.
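A rough sketch of what that could look like in a conftest.py — only runs when JSON output is generated (e.g. with --benchmark-json), and it assumes the two measurements live in separate tests whose names contain "trio" and "datastore" as in the example at the top of this issue; the exact JSON layout may vary between versions:

# conftest.py -- sketch only; assumes benchmark names containing "trio" and
# "datastore" and the usual layout of the plugin's JSON output.
def pytest_benchmark_update_json(config, benchmarks, output_json):
    medians = {b["name"]: b["stats"]["median"] for b in output_json["benchmarks"]}
    trio = next(v for name, v in medians.items() if "trio" in name)
    datastore = next(v for name, v in medians.items() if "datastore" in name)
    assert datastore < 2 * trio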

ionelmc avatar Nov 02 '20 09:11 ionelmc

Hi @ionelmc, I have a use case for this. It is a long-running test with multiple stages that I would like to benchmark individually. Due to the current behavior, getting the necessary data points means running the test multiple times, benchmarking only one stage at a time. This significantly increases the overall testing time, since every run has to repeat the setup and teardown. My initial thought is that pedantic mode could be expanded with whatever additional arguments are needed to support this. Thoughts?

EDIT... what if target could take a list, e.g.:

def test_the_thing(fixture):
  def setup(): ...
  def stage1(args): ...
  def stage2(args): ...
  trigger_external_async_process()  # Call not included in benchmark
  benchmark.pedantic(target=[stage1, stage2], setup=setup, rounds=1, ...)
...
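(For comparison, a rough approximation that works today, giving each stage its own pedantic run — make_benchmark here is the hypothetical factory fixture sketched earlier in this thread, not an existing API:)

def test_the_thing(fixture, make_benchmark):
    def setup(): ...
    def stage1(): ...
    def stage2(): ...
    # One independent benchmark object per stage, each measured once.
    bench1 = make_benchmark("stage1")
    bench1.pedantic(stage1, setup=setup, rounds=1)
    bench2 = make_benchmark("stage2")
    bench2.pedantic(stage2, setup=setup, rounds=1)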

sarahbx avatar Apr 02 '21 10:04 sarahbx

I really love pytest-benchmark, but I am also in a situation where my use case requires multiple benchmarks per test case in order to avoid unreasonable setup/teardown time.

I am benchmarking some software that involves setting up and tearing down the database, and my tests are parametrized by the number of sample rows in the database so that I can measure and plot the scaling of the code and compare it with the expected big-O scaling. The database gets populated with random data, but it is expensive to repeatedly set up and tear down the database. What I would like to do is put the benchmark inside a for-loop that adds more random data to the database on each iteration.
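Roughly the shape I have in mind (a sketch only — insert_random_rows and run_query are stand-in helpers, and make_benchmark is the hypothetical factory fixture discussed earlier in this thread):

def test_query_scaling(database, make_benchmark):
    total = 0
    # Grow the same database in place instead of rebuilding it per size.
    for extra_rows in (1_000, 9_000, 90_000):
        insert_random_rows(database, extra_rows)
        total += extra_rows
        bench = make_benchmark(f"rows={total}")
        bench(run_query, database)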

lpsinger avatar Nov 30 '21 18:11 lpsinger

my use case requires multiple benchmarks per test case in order to avoid unreasonable setup/teardown time.

Could you alternatively solve this by reusing a fixture (e.g. module scope)?
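Something like this, as a minimal sketch (create_database, populate_with_random_data and drop_database are stand-ins for the expensive setup described above):

import pytest

@pytest.fixture(scope="module")
def populated_db():
    db = create_database()           # hypothetical: expensive one-time setup
    populate_with_random_data(db)    # hypothetical
    yield db                         # shared by every test in the module
    drop_database(db)                # hypothetical teardown, once per module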

cafhach avatar Jul 05 '22 06:07 cafhach