add benchmarks using pytest-benchmark and codspeed
since #3554 was an unpopular direction I'm going instead with codspeed + pytest-benchmark. Opening as a draft because I haven't looked into how codspeed works at all, but I'd like people to weigh in on whether these initial benchmarks make sense. Naturally we can add more specific ones later, but I figured just some bulk array read / write workloads would be a good start.
@zarr-developers/steering-council I don't have permission to register this repo with codspeed. I submitted a request to register it, could someone approve it?
done
does anyone have opinions about benchmarks? feel free to suggest something concrete. Otherwise, I think we should take this as-is and handle additional benchmarks (like partial shard reads/writes) in a subsequent PR
CodSpeed Performance Report
Congrats! CodSpeed is installed 🎉
🆕 30 new benchmarks were detected.
You will start to see performance impacts in the reports once the benchmarks are run from your default branch.
Detected benchmarks
- test_write_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-gzip](WallTime): 1.9 s
- test_write_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=None)-gzip](WallTime): 888.4 ms
- test_write_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-None](WallTime): 1.4 s
- test_write_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=None)-None](WallTime): 486.1 ms
- test_write_array[memory-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-None](WallTime): 9.5 s
- test_write_array[local-Layout(shape=(1000000,), chunks=(1000,), shards=None)-None](WallTime): 982.2 ms
- test_write_array[local-Layout(shape=(1000000,), chunks=(1000,), shards=None)-gzip](WallTime): 1.4 s
- test_write_array[local-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-gzip](WallTime): 2.8 s
- test_write_array[local-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-None](WallTime): 2.4 s
- test_read_array[local-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-None](WallTime): 3.3 s
- test_slice_indexing[(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-memory](WallTime): 223.8 ms
- test_read_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=None)-None](WallTime): 303.6 ms
- test_read_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=None)-gzip](WallTime): 552.7 ms
- test_slice_indexing[(slice(None, 10, None), slice(None, 10, None), slice(None, 10, None))-memory](WallTime): 795 µs
- test_read_array[local-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-gzip](WallTime): 5.7 s
- test_read_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-gzip](WallTime): 1.2 s
- test_slice_indexing[(slice(None, None, None), slice(0, 3, 2), slice(0, 10, None))-memory](WallTime): 3.9 ms
- test_write_array[memory-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-gzip](WallTime): 13.4 s
- test_slice_indexing[(0, 0, 0)-memory](WallTime): 768.4 µs
- test_write_array[local-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-None](WallTime): 9.6 s
- ...
:information_source: Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
> feel free to suggest something concrete
indexing please. that'll exercise the codec pipeline too.
a peakmem metric would be good to track also, if possible.
> > feel free to suggest something concrete
>
> indexing please. that'll exercise the codec pipeline too.
>
> a peakmem metric would be good to track also, if possible.
I don't think codspeed or pytest-benchmark do memory profiling. we would need https://pytest-memray.readthedocs.io/en/latest/ or something equivalent for that.
and an indexing benchmark sounds like a great idea but I don't think I have the bandwidth for it in this pr right now
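For reference, pytest-memray exposes peak-memory checks through a marker. A hedged sketch (the threshold and the test body here are made up, not a proposed benchmark):

```python
# Hypothetical sketch: with pytest-memray installed, this marker fails the
# test if its peak allocations exceed the given limit. The threshold and
# workload below are illustrative stand-ins, not real benchmark values.
import numpy as np
import pytest


@pytest.mark.limit_memory("24 MB")
def test_write_array_peakmem():
    # stand-in workload; a real benchmark would write through a zarr array
    data = np.zeros((1_000_000,), dtype="uint8")
    assert data.nbytes == 1_000_000
```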
I added a benchmark that clearly reveals the performance improvement of #3561
I added some slice-based benchmarks based on the examples from https://github.com/zarr-developers/zarr-python/issues/3524, and I updated the contributing docs with a section about the benchmarks. assuming we can resolve the discussion about which python / numpy version to use in the CI job, I think this is ready
new problem: the codspeed CI benchmarks are way too slow! the benchmark suite runs in 90s locally, but it's taking over 40m to run in CI. Help speeding this up would be appreciated.
owing to the large number of syscalls in our benchmark code, codspeed recommended their walltime instrument instead of the virtual-CPU instrument. But enabling walltime benchmarks would mean running our benchmarking code on codspeed's servers, which is a security risk.
Given that codspeed is not turning out to be particularly simple, I am inclined to defer the codspeed CI work. But if someone can help get the benchmark runtime down, and/or we are OK running our benchmarks on codspeed's servers, then maybe we can get that sorted in this PR.
looks like the walltime instrument is working! I think this is g2g
(Enabled the app)
IMO it'd be better to skip the tests/benchmarks during regular test runs in the interest of speed
I think this makes sense -- on my workstation the current benchmark suite takes 40s to run as regular tests, which is a big addition to our total test runtime. The latest changes to this branch skip the tests/benchmarks folder by default when running our main test suite and the gpu tests.
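One way to implement that skip is a collection hook in `conftest.py`. This is a sketch under assumptions: the `--run-benchmarks` flag name and the `tests/benchmarks/` layout are made up here, and the actual mechanism in this branch may differ.

```python
# conftest.py sketch: ignore the benchmark folder during normal collection
# unless an opt-in flag is passed. Flag name and folder layout are assumptions.


def pytest_addoption(parser):
    parser.addoption(
        "--run-benchmarks",
        action="store_true",
        default=False,
        help="collect and run the benchmark suite",
    )


def pytest_ignore_collect(collection_path, config):
    # returning True tells pytest to skip collecting this path;
    # returning None defers to pytest's default collection behavior
    if not config.getoption("--run-benchmarks") and "benchmarks" in collection_path.parts:
        return True
    return None
```

With this in place, `pytest` skips the benchmarks by default and `pytest --run-benchmarks tests/benchmarks` opts back in.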