zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

example that reveals inefficient sharded writes

Open d-v-b opened this issue 3 months ago • 3 comments

Check out how many times we call get in this example (writing a single shard with 10 chunks):

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

import zarr
import numpy as np

from zarr.storage._logging import LoggingStore

store  = LoggingStore(store=zarr.storage.MemoryStore())

shape = (10,)
chunks=(1,)
shards=(10,)
data = np.ones(shape)
zarr.create_array(
            store=store,
            data=data,
            chunks=chunks,
            shards=shards,
            fill_value=0,
            overwrite=True,
        )
array = zarr.open_array(store)[:]
2025-09-01 15:50:45,109 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.read_only
2025-09-01 15:50:45,109 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.read_only [0.00 s]
2025-09-01 15:50:45,109 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore._ensure_open
2025-09-01 15:50:45,109 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore._ensure_open [0.00 s]
2025-09-01 15:50:45,110 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.supports_deletes
2025-09-01 15:50:45,110 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.supports_deletes [0.00 s]
2025-09-01 15:50:45,110 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.delete_dir
2025-09-01 15:50:45,110 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.delete_dir [0.00 s]
2025-09-01 15:50:45,111 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(zarr.json)
2025-09-01 15:50:45,111 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(zarr.json) [0.00 s]
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,112 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,113 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,114 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,115 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,115 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,118 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,119 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.set(c/0)
2025-09-01 15:50:45,120 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-09-01 15:50:45,121 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore._ensure_open
2025-09-01 15:50:45,121 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore._ensure_open [0.00 s]
2025-09-01 15:50:45,121 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(zarr.json)
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(zarr.json) [0.00 s]
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(.zarray)
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(.zarray) [0.00 s]
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(.zattrs)
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(.zattrs) [0.00 s]
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO -  Calling MemoryStore.get(c/0)
2025-09-01 15:50:45,122 - LoggingStore(memory://4309622656) - INFO - Finished MemoryStore.get(c/0) [0.00 s]

In principle we should only call get(c/0) once -- at the very end, when we need to retrieve bytes from it. instead, we call get(c/0) ~10 (i can't count) times in this example. We should also only call set(c/0) once, because we are writing a full shard. Instead, we call set once per chunk, which is extremely inefficient for sharded writes.

I'm still trying to figure out how this is being controlled. I suspect it has to do with the batch size of the codec pipeline class, but I haven't confirmed this. I will update this issue when I get further.

d-v-b avatar Sep 01 '25 15:09 d-v-b

FYI I can't reproduce this on main. If I run above script, I get

2025-10-05 21:57:22,920 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.read_only
2025-10-05 21:57:22,921 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.read_only [0.00 s]
2025-10-05 21:57:22,921 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore._ensure_open
2025-10-05 21:57:22,921 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore._ensure_open [0.00 s]
2025-10-05 21:57:22,921 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.supports_deletes
2025-10-05 21:57:22,921 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.supports_deletes [0.00 s]
2025-10-05 21:57:22,922 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.delete_dir
2025-10-05 21:57:22,922 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.delete_dir [0.00 s]
2025-10-05 21:57:22,922 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.set(zarr.json)
2025-10-05 21:57:22,922 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.set(zarr.json) [0.00 s]
2025-10-05 21:57:22,923 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.get(c/0)
2025-10-05 21:57:22,923 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.get(c/0) [0.00 s]
2025-10-05 21:57:22,925 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.set(c/0)
2025-10-05 21:57:22,925 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.set(c/0) [0.00 s]
2025-10-05 21:57:22,926 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore._ensure_open
2025-10-05 21:57:22,926 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore._ensure_open [0.00 s]
2025-10-05 21:57:22,926 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.get(zarr.json)
2025-10-05 21:57:22,926 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.get(zarr.json) [0.00 s]
2025-10-05 21:57:22,927 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.get(.zarray)
2025-10-05 21:57:22,927 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.get(.zarray) [0.00 s]
2025-10-05 21:57:22,927 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.get(.zattrs)
2025-10-05 21:57:22,927 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.get(.zattrs) [0.00 s]
2025-10-05 21:57:22,928 - LoggingStore(memory://4378236672) - INFO -  Calling MemoryStore.get(c/0)
2025-10-05 21:57:22,928 - LoggingStore(memory://4378236672) - INFO - Finished MemoryStore.get(c/0) [0.00 s]

The last released version that leads to your logs is 3.1.2

oliverwm1 avatar Oct 06 '25 05:10 oliverwm1

yes, this was partially fixed in #3422, but we still have an extra get in the output you generated. It's unnecessary because we are writing a complete shard, and so we don't need to read anything from the existing one. Fixing this requires adding some logic to the sharding codec.

d-v-b avatar Oct 06 '25 07:10 d-v-b

got it, thanks!

oliverwm1 avatar Oct 06 '25 14:10 oliverwm1