zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

re-opening `ZipStore` with a new mode

Open d-v-b opened this issue 1 year ago • 6 comments

In main it's not possible to make a read-only copy of a ZipStore instance. Is there any particular reason why this is not allowed, or is it something we could add? Naively, it seems like a read-only copy of any store class should always be safe to generate.

d-v-b avatar Oct 30 '24 13:10 d-v-b

IIRC, ZipStore has some instance level state that makes it hard to re-open. The file needs to be closed and the ZIP footer written before anyone else can open the file (going off memory, so I might have some details wrong).

TomAugspurger avatar Oct 30 '24 13:10 TomAugspurger

that means that ZipStore always errors if it reaches this conditional in make_store_path: https://github.com/zarr-developers/zarr-python/blob/4c3081c9b4678d1de438e0d2ee8e0fa368a61f17/src/zarr/storage/common.py#L224-L227

If we want to keep these store semantics (I'm not sure that we do...) we should add a zipstore-specific exception here that instructs users how to open their zipstored data with a different mode. Otherwise, invoking something like open(store=store, path="doesnotexist", mode="r", zarr_format=zarr_format) raises NotImplementedError with no guidance.

d-v-b avatar Oct 30 '24 13:10 d-v-b

would it be terrible if ZipStore.with_mode(self, mode) called self.close(), effectively invalidating the original store instance?

d-v-b avatar Oct 30 '24 13:10 d-v-b

we should add a zipstore-specific exception here that instructs users how to open their zipstored data with a different mode

I wonder how you'd do that... Looking more closely, I see my earlier comment about ZipStore sharing some state wasn't 100% right. Really all we share is a Path, we don't try to share a particular instance of zipfile.ZipFile or any locks. We just need to ensure that there's a valid Zip file at that path before we try to open it, which leads to...

would it be terrible if ZipStore.with_mode(self, mode) called self.close(), effectively invalidating the original store instance?

Is it possible to have both, by having our write / read calls automatically call open if needed? If so, then maybe with_mode could call close (which writes the footer, making the ZipFile readable), without breaking the original store since any subsequent writes would automatically re-open the ZipFile.

I'm not sure what the downsides of this kind of just-in-time reopening would be.

TomAugspurger avatar Oct 30 '24 14:10 TomAugspurger

upon calling _sync_open, ZipStore instances acquire a lock and a file handle: https://github.com/zarr-developers/zarr-python/blob/4c3081c9b4678d1de438e0d2ee8e0fa368a61f17/src/zarr/storage/zip.py#L86-L93, so I think it's about as stateful as you can get.

d-v-b avatar Oct 30 '24 14:10 d-v-b

Just ran into this when I ran into a BadZipFileError with this:


import zarr
from zarr.storage import ZipStore
import tempfile
from pathlib import Path

tmpdir = Path(tempfile.mkdtemp())
zip_path = tmpdir / "data.zip"

store_w = ZipStore(zip_path, mode='w')
arr_w = zarr.open_array(store_w, mode='w', shape=(5,), dtype='i4')
arr_w[:] = [10, 20, 30, 40, 50]

# Close explicitly or it will fail
# store_w.close()

# READ - Traditional
store_r = ZipStore(zip_path, mode='r')
arr_r = zarr.open_array(store_r, mode='r')
print(arr_r)

we might want ot add some info to that error message to close an existing one, and also document this requirement that the store be closed first

ianhi avatar Oct 31 '25 22:10 ianhi