Tracking success rate of benchmarked functions
I have a use case for tracking the performance and success rate of non-deterministic functions.
The following function serves to outline the scenario:
```python
import random
import time


def foo():
    # base_time and error_rate are parameters of the scenario under test
    time.sleep(base_time + abs(random.gauss(0, 0.01)))
    if random.random() < error_rate:
        raise RuntimeError
```
I have played around and arrived at the following helper:
```python
from functools import wraps


def benchmark_pedantic_with_count(benchmark, function, *args, **kwargs):
    successes = []

    @wraps(function)
    def wrapper(*args, **kwargs):
        # Swallow exceptions so the benchmark keeps running, and record
        # whether each call succeeded.
        try:
            result = function(*args, **kwargs)
            successes.append(True)
            return result
        except Exception:
            successes.append(False)

    benchmark.pedantic(wrapper, *args, **kwargs)
    benchmark.extra_info['success_count'] = sum(successes)

    # Expose the success rate as an extra 'succ' stats field so it can
    # travel alongside the built-in statistics.
    new_stats_fields = list(benchmark.stats.stats.fields)
    new_stats_fields.append('succ')
    benchmark.stats.stats.fields = new_stats_fields
    benchmark.stats.stats.succ = sum(successes) / len(successes)
```
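For context, a quick sketch of how the helper is used from a test (the `rounds`/`iterations` values are just illustrative and are forwarded to `benchmark.pedantic`):

```python
def test_foo(benchmark):
    # `benchmark` is the pytest-benchmark fixture; extra keyword arguments
    # are passed through to benchmark.pedantic.
    benchmark_pedantic_with_count(benchmark, foo, rounds=20, iterations=1)
```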
To actually get the new `succ` column displayed, I also had to:

- Add `succ` to `pytest_benchmark.utils.ALLOWED_COLUMNS`.
- Overwrite `pytest_benchmark.table.display` so it shows `succ`.

(How exactly to achieve those two things is left as an exercise for the reader; a rough sketch of the first one follows.)
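For the first step, something along these lines in `conftest.py` is roughly what I mean. This is an untested sketch: it assumes `ALLOWED_COLUMNS` is a plain mutable list and that patching it here takes effect before the column option is validated; the `display` override is longer and omitted.

```python
# conftest.py -- rough, untested sketch of the first step only.
# Assumes ALLOWED_COLUMNS is a mutable list and that patching it here
# happens before pytest-benchmark validates --benchmark-columns.
import pytest_benchmark.utils

if "succ" not in pytest_benchmark.utils.ALLOWED_COLUMNS:
    pytest_benchmark.utils.ALLOWED_COLUMNS.append("succ")
```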
While this does work, I am unsure whether my solution could be upstreamed easily.
How should I go about it if I want my solution to be merged into pytest-benchmark?
Alternate and related approaches:

- Add an argument to `benchmark.pedantic` that makes it continue on exceptions and exposes the list of exceptions caught (like `[None, None, RuntimeError, None, RuntimeError]`); a hypothetical sketch follows after this list.
- Add an argument to `benchmark.pedantic` that changes the return type to a list of all results, then set up the benchmarked function so that it catches the relevant exceptions and returns whatever I want.
- Allow `extra_info` keys in the terminal table. This would be great!
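To make the first alternative concrete, here is a purely hypothetical sketch; neither `continue_on_exceptions` nor `exceptions_caught` exists in pytest-benchmark today, they are just the names I would imagine:

```python
def test_foo(benchmark):
    # Hypothetical flag: keep timing rounds even when foo() raises.
    benchmark.pedantic(foo, rounds=5, continue_on_exceptions=True)

    # Hypothetical attribute: the exception caught per round (None for a
    # successful round), e.g. [None, None, RuntimeError, None, RuntimeError].
    caught = benchmark.exceptions_caught
    benchmark.extra_info["success_rate"] = caught.count(None) / len(caught)
```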
Is there currently a way to omit failed runs from the timing statistics? With non-determinism and a recorded success rate, it might be desirable to include only successful runs in the statistics.