add benchmarks using pytest-benchmark and codspeed
since #3554 was an unpopular direction I'm going instead with codspeed + pytest-benchmark. Opening as a draft because I haven't looked into how codspeed works at all, but I'd like people to weigh in on whether these initial benchmarks make sense. Naturally we can add more specific ones later, but I figured just some bulk array read / write workloads would be a good start.
@zarr-developers/steering-council I don't have permission to register this repo with codspeed. I submitted a request to register it, could someone approve it?
done
does anyone have opinions about benchmarks? feel free to suggest something concrete. Otherwise, I think we should take this as-is and handle additional benchmarks (like partial shard reads/writes) in a subsequent PR
CodSpeed Performance Report
Congrats! CodSpeed is installed 🎉
🆕 30 new benchmarks were detected.
You will start to see performance impacts in the reports once the benchmarks are run from your default branch.
Detected benchmarks
- test_write_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-gzip](WallTime): 1.9 s
- test_write_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=None)-gzip](WallTime): 888.4 ms
- test_write_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-None](WallTime): 1.4 s
- test_write_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=None)-None](WallTime): 486.1 ms
- test_write_array[memory-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-None](WallTime): 9.5 s
- test_write_array[local-Layout(shape=(1000000,), chunks=(1000,), shards=None)-None](WallTime): 982.2 ms
- test_write_array[local-Layout(shape=(1000000,), chunks=(1000,), shards=None)-gzip](WallTime): 1.4 s
- test_write_array[local-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-gzip](WallTime): 2.8 s
- test_write_array[local-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-None](WallTime): 2.4 s
- test_read_array[local-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-None](WallTime): 3.3 s
- test_slice_indexing[(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-memory](WallTime): 223.8 ms
- test_read_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=None)-None](WallTime): 303.6 ms
- test_read_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=None)-gzip](WallTime): 552.7 ms
- test_slice_indexing[(slice(None, 10, None), slice(None, 10, None), slice(None, 10, None))-memory](WallTime): 795 µs
- test_read_array[local-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-gzip](WallTime): 5.7 s
- test_read_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-gzip](WallTime): 1.2 s
- test_slice_indexing[(slice(None, None, None), slice(0, 3, 2), slice(0, 10, None))-memory](WallTime): 3.9 ms
- test_write_array[memory-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-gzip](WallTime): 13.4 s
- test_slice_indexing[(0, 0, 0)-memory](WallTime): 768.4 µs
- test_write_array[local-Layout(shape=(1000000,), chunks=(100,), shards=(1000000,))-None](WallTime): 9.6 s
- ...
:information_source: Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
> feel free to suggest something concrete
indexing please. that'll exercise the codec pipeline too.
a peakmem metric would be good to track also, if possible.
> > feel free to suggest something concrete
>
> indexing please. that'll exercise the codec pipeline too.
>
> a peakmem metric would be good to track also, if possible.
I don't think codspeed or pytest-benchmark do memory profiling. we would need https://pytest-memray.readthedocs.io/en/latest/ or something equivalent for that.
and an indexing benchmark sounds like a great idea but I don't think I have the bandwidth for it in this pr right now
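For reference, pytest-memray exposes peak-memory checks through a marker. A hedged sketch (the threshold and the test body here are made up, not a proposed benchmark):

```python
# Hypothetical sketch: with pytest-memray installed, this marker fails the
# test if its peak allocations exceed the given limit. The threshold and
# workload below are illustrative stand-ins, not real benchmark values.
import numpy as np
import pytest


@pytest.mark.limit_memory("24 MB")
def test_write_array_peakmem():
    # stand-in workload; a real benchmark would write through a zarr array
    data = np.zeros((1_000_000,), dtype="uint8")
    assert data.nbytes == 1_000_000
```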
I added a benchmark that clearly reveals the performance improvement of #3561
I added some slice-based benchmarks based on the examples from https://github.com/zarr-developers/zarr-python/issues/3524, and I updated the contributing docs with a section about the benchmarks. assuming we can resolve the discussion about which python / numpy version to use in the CI job, I think this is ready
new problem: the codspeed CI benchmarks are way too slow! the benchmark suite runs in 90s locally, but it's taking over 40m to run in CI. Help speeding this up would be appreciated.
owing to the large number of syscalls in our benchmark code, codspeed recommended their walltime instrument instead of the virtual-CPU instrument. But enabling walltime benchmarks would mean running our benchmarking code on codspeed's servers, which is a security risk.
Given that codspeed is not turning out to be particularly simple, I am inclined to defer the codspeed CI work. But if someone can help get the benchmark runtime down, and/or we are OK running our benchmarks on codspeed's servers, then maybe we can get that sorted in this PR.
looks like the walltime instrument is working! I think this is g2g
(Enabled the app)
IMO it'd be better to skip the tests/benchmarks during regular test runs in the interest of speed
I think this makes sense -- on my workstation the current benchmark suite takes 40s to run as regular tests, which is a big addition to our total test runtime. The latest changes to this branch skip the tests/benchmarks folder by default when running our main test suite and the gpu tests.
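One way to implement that skip is a collection hook in `conftest.py`. This is a sketch under assumptions: the `--run-benchmarks` flag name and the `tests/benchmarks/` layout are made up here, and the actual mechanism in this branch may differ.

```python
# conftest.py sketch: ignore the benchmark folder during normal collection
# unless an opt-in flag is passed. Flag name and folder layout are assumptions.


def pytest_addoption(parser):
    parser.addoption(
        "--run-benchmarks",
        action="store_true",
        default=False,
        help="collect and run the benchmark suite",
    )


def pytest_ignore_collect(collection_path, config):
    # returning True tells pytest to skip collecting this path;
    # returning None defers to pytest's default collection behavior
    if not config.getoption("--run-benchmarks") and "benchmarks" in collection_path.parts:
        return True
    return None
```

With this in place, `pytest` skips the benchmarks by default and `pytest --run-benchmarks tests/benchmarks` opts back in.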