re-opening `ZipStore` with a new mode
In main it's not possible to make a read-only copy of a ZipStore instance. Is there any particular reason why this is not allowed, or is it something we could add? Naively, it seems like a read-only copy of any store class should always be safe to generate.
IIRC, ZipStore has some instance level state that makes it hard to re-open. The file needs to be closed and the ZIP footer written before anyone else can open the file (going off memory, so I might have some details wrong).
that means that ZipStore always errors if it reaches this conditional in make_store_path: https://github.com/zarr-developers/zarr-python/blob/4c3081c9b4678d1de438e0d2ee8e0fa368a61f17/src/zarr/storage/common.py#L224-L227
If we want to keep these store semantics (I'm not sure that we do...) we should add a zipstore-specific exception here that instructs users how to open their zipstored data with a different mode. Otherwise, invoking something like open(store=store, path="doesnotexist", mode="r", zarr_format=zarr_format) raises NotImplementedError with no guidance.
would it be terrible if ZipStore.with_mode(self, mode) called self.close(), effectively invalidating the original store instance?
we should add a zipstore-specific exception here that instructs users how to open their zipstored data with a different mode
I wonder how you'd do that... Looking more closely, I see my earlier comment about ZipStore sharing some state wasn't 100% right. Really all we share is a Path, we don't try to share a particular instance of zipfile.ZipFile or any locks. We just need to ensure that there's a valid Zip file at that path before we try to open it, which leads to...
would it be terrible if ZipStore.with_mode(self, mode) called self.close(), effectively invalidating the original store instance?
Is it possible to have both, by having our write / read calls automatically call open if needed? If so, then maybe with_mode could call close (which writes the footer, making the ZipFile readable), without breaking the original store since any subsequent writes would automatically re-open the ZipFile.
I'm not sure what the downsides of this kind of just-in-time reopening would be.
upon calling _sync_open, ZipStore instances acquire a lock and a file handle: https://github.com/zarr-developers/zarr-python/blob/4c3081c9b4678d1de438e0d2ee8e0fa368a61f17/src/zarr/storage/zip.py#L86-L93, so I think it's about as stateful as you can get.
Just ran into this when I ran into a BadZipFileError with this:
import zarr
from zarr.storage import ZipStore
import tempfile
from pathlib import Path
tmpdir = Path(tempfile.mkdtemp())
zip_path = tmpdir / "data.zip"
store_w = ZipStore(zip_path, mode='w')
arr_w = zarr.open_array(store_w, mode='w', shape=(5,), dtype='i4')
arr_w[:] = [10, 20, 30, 40, 50]
# Close explicitly or it will fail
# store_w.close()
# READ - Traditional
store_r = ZipStore(zip_path, mode='r')
arr_r = zarr.open_array(store_r, mode='r')
print(arr_r)
we might want ot add some info to that error message to close an existing one, and also document this requirement that the store be closed first