
TST: Add pytest-codspeed and benchmarking suite

Open nstarman opened this issue 9 months ago • 10 comments

Would be great for testing performance optimizations.

nstarman · Apr 14 '25 21:04

Do you mean adding tests with pytest-benchmark? I'm tinkering with something similar over in optx - specifically I'd like to support benchmarking our solvers on optimisation problems. We could probably have a unified approach to benchmarking compile times too.

Personally I landed on this due to the familiar format, and the option for performance comparison across versions.

Apologies if this is too far off-topic :)

johannahaffner · Apr 14 '25 21:04

Re pytest-benchmark, also good! Though pytest-codspeed is essentially a drop-in replacement for pytest-benchmark and hooks into the nice https://codspeed.io service (free for open source).

nstarman · Apr 14 '25 21:04

I like this idea as well. I've thought about suggesting speed regression tests to diffrax before (since that's come up in my own work, and for others in the issues), but unifying things to check for speed in general sounds like a good idea.

lockwo · Apr 14 '25 21:04

IIUC codspeed is a service for recording benchmark results, and is otherwise exactly the same as pytest-benchmark? If so, this all sounds reasonable to me.

patrick-kidger · Apr 14 '25 21:04

In general, pytest-codspeed is a drop-in replacement. It does lack a few features supported by pytest-benchmark, but I haven't really run into them. In exchange it offers more stable benchmarks and nice visualizations, GitHub hooks, etc. They are also adding other cool features I haven't yet taken advantage of. The most challenging thing when benchmarking JAX is separately testing JIT compilation vs execution of the compiled function. See https://github.com/GalacticDynamics/unxt/blob/v1.4.0/tests/benchmark/test_quaxed.py for one possible approach.
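
For concreteness, a minimal sketch of that separation, using the pytest-benchmark-style `benchmark` fixture; the function `f` and input size here are just placeholders, not what I'd actually propose benchmarking:

```python
import jax
import jax.numpy as jnp


def f(x):
    return jnp.sum(jnp.sin(x) ** 2)


x = jnp.arange(1024.0)


def test_compile_time(benchmark):
    # AOT lower + compile, so only compilation is timed (no execution).
    benchmark(lambda: jax.jit(f).lower(x).compile())


def test_runtime(benchmark):
    # Compile once, outside the timed region.
    compiled = jax.jit(f).lower(x).compile()
    # Block on the result so we time actual execution, not async dispatch.
    benchmark(lambda: compiled(x).block_until_ready())
```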

nstarman · Apr 14 '25 22:04

Yup, using AOT compilation as a measure of compilation time makes sense to me, though it is not necessarily the same as JIT compilation time (?).

With codspeed, can custom metrics be added to the reported/saved results? To benchmark solvers, this would be useful - and could include things such as the number of steps taken, as well as how far off we are of the expected result.

johannahaffner · Apr 15 '25 15:04

> With codspeed, can custom metrics be added to the reported/saved results? To benchmark solvers, this would be useful - and could include things such as the number of steps taken, as well as how far off we are of the expected result.

Is this maybe what you are thinking of? There's this pattern:

```python
from statistics import mean, median


def test_mean_and_median_performance(benchmark):
    # Precompute some data useful for the benchmark, but which should not be
    # included in the benchmarked time.
    data = [1, 2, 3, 4, 5]

    # The `@benchmark` decorator automatically calls the decorated function
    # and measures its execution.
    @benchmark
    def bench():
        mean(data)
        median(data)
```

So after benchmarking you can test correctness. However, I'm not sure that's better than keeping a separation of concerns: different tests for different things.
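
As a sketch of what I mean, assuming the fixture passes through the decorated function's return value (as pytest-benchmark does):

```python
from statistics import mean


def test_mean_performance_and_correctness(benchmark):
    data = [1, 2, 3, 4, 5]

    # The fixture calls the decorated function and returns its return value,
    # so `result` ends up bound to the output of the benchmarked call.
    @benchmark
    def result():
        return mean(data)

    assert result == 3
```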

nstarman · Apr 15 '25 16:04

I mean this feature: https://pytest-benchmark.readthedocs.io/en/latest/usage.html#extra-info

So the function being benchmarked can return something (such as a diffrax or optimistix Solution), fields of which we might like to save.
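With pytest-benchmark itself, that looks something like the toy below (whether pytest-codspeed supports `extra_info` is exactly the open question):

```python
from statistics import mean


def test_mean_with_extra_info(benchmark):
    data = [1, 2, 3, 4, 5]
    # `benchmark(fn, *args)` passes through fn's return value; in the real
    # case this would be e.g. a diffrax or optimistix Solution.
    result = benchmark(mean, data)
    # Attach custom metrics to the saved benchmark record (pytest-benchmark feature).
    benchmark.extra_info["result"] = result
    benchmark.extra_info["n_points"] = len(data)
```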

johannahaffner · Apr 15 '25 18:04

Oh, that I'm not sure about in pytest-codspeed.

nstarman · Apr 15 '25 21:04

They can probably coexist happily; I doubt we'd want to run a large benchmarking suite in CI anyway! But some speed tests on a small set of representative problems would be great to include.

johannahaffner · Apr 15 '25 21:04