zarr-python
ZipStore arguments: 'w' vs 'x'
Dear all,
Could you tell me the exact difference between the 'w' and 'x' modes for ZipStore creation? Also, what does "truncate" mean here?
mode : string, optional
One of ‘r’ to read an existing file, ‘w’ to truncate and write a new file, ‘a’ to append to an existing file, or ‘x’ to exclusively create and write a new file.
Two related questions:
- How would dimension_separator affect reading/writing arrays?
- Does the store need to be closed after reading data from it if opened in 'r' (reading mode)?
https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.ZipStore
Best regards, Aliaksei
- Value of zarr.__version__: 2.10.3
- Value of numcodecs.__version__: 0.9.1
- Version of Python interpreter: 3.9.0
- Operating system (Linux/Windows/Mac): Windows 11
- How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): using pip into virtual environment
Could you tell me the exact difference between the 'w' and 'x' modes for ZipStore creation?
These correspond to the mode arguments of Python's open(). 'w' creates the file if it doesn't exist, or truncates it if it does, i.e. deletes all existing data, reducing it to 0 bytes. 'x' fails if the file already exists.
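Here is a minimal sketch of that difference, using the standard-library zipfile module directly rather than ZipStore. It assumes ZipStore passes its mode string through to zipfile.ZipFile unchanged, which is worth checking against your zarr version; the file name demo.zip is made up for the example.

```python
import os
import tempfile
import zipfile

# Work in a scratch directory so nothing real gets overwritten.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.zip")

# 'w' creates the archive because it does not exist yet.
with zipfile.ZipFile(path, mode="w") as zf:
    zf.writestr("old.txt", "first version")

# 'w' again: the existing archive is truncated, so old.txt is gone.
with zipfile.ZipFile(path, mode="w") as zf:
    zf.writestr("new.txt", "second version")

with zipfile.ZipFile(path, mode="r") as zf:
    names_after_w = zf.namelist()

# 'x' refuses to touch an archive that already exists.
try:
    zipfile.ZipFile(path, mode="x")
    x_failed = False
except FileExistsError:
    x_failed = True
```

So 'x' is the safer choice when overwriting an existing archive would be a mistake, and 'w' is the choice when you want to start from scratch every time.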
How would dimension_separator affect reading/writing arrays?
It won't really, if you only intend to access the zip using zarr. I'd suggest leaving it as the default, unless you intend to unzip it into a file system with opinions on how many files should be in a directory.
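To make that concrete, the sketch below imitates how chunk keys are named under each separator. chunk_key is a hypothetical helper written for illustration, not a zarr function; it assumes the Zarr v2 convention of joining a chunk's grid indices with the separator to form its key.

```python
def chunk_key(chunk_index, dimension_separator="."):
    """Hypothetical helper: join chunk grid indices into a store key."""
    return dimension_separator.join(str(i) for i in chunk_index)


# A 2 x 2 grid of chunks, named with each separator.
grid = [(i, j) for i in range(2) for j in range(2)]
flat_keys = [chunk_key(index) for index in grid]
nested_keys = [chunk_key(index, "/") for index in grid]

print(flat_keys)    # all four chunks sit side by side in one directory
print(nested_keys)  # chunks are grouped into one subdirectory per row
```

Inside a zip file both forms are just entry names. The difference only shows up once the archive is extracted, where "/" turns into nested directories.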
Does the store need to be closed after reading data from it if opened in 'r' (reading mode)?
Yes, just like regular files: use it with a context manager (with statement), it'll make your life easier.
@clbarnes, thank you very much.
Regarding 'r' mode: what could happen if the store is not closed after reading? I am not making any modifications to it.
Unlike most stores, the ZipStore obeys normal Python file-opening semantics. Just like a Python file (or zipfile), the file is automatically closed when the object is garbage collected, but you're not in control of when that happens (and it can depend on how your script/package is structured), so for certain usage patterns it can lead to unpredictable numbers of files being open at once.
Also like Python files/zipfiles, there's a .close() method to explicitly close it (which is what the with statement does implicitly as soon as you leave the block); that seems to be what the examples use. But in general calling it directly is discouraged, because if an exception happens before the explicit close, you may never reach it, and you leave the file cleanup to the garbage collector. One alternative pattern would be:
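    import zarr

    # Open before the try block, so that finally never sees an unbound name.
    store = zarr.ZipStore("some/path.zip")
    try:
        ...  # whatever else you want to do
    finally:
        store.close()  # runs even if the block above raises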
but that's less ergonomic than the with statement.
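For comparison, here is a sketch of the with-statement form, showing that the archive is closed even when the block raises. It uses zipfile.ZipFile as a stand-in so it runs without zarr installed; the assumption is that ZipStore behaves the same way when used as a context manager. The file and entry names are made up.

```python
import os
import tempfile
import zipfile

path = os.path.join(tempfile.mkdtemp(), "demo.zip")
with zipfile.ZipFile(path, mode="w") as zf:
    zf.writestr("chunk", b"data")

# An error inside the block still closes the archive on the way out.
try:
    with zipfile.ZipFile(path, mode="r") as archive:
        payload = archive.read("chunk")
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

# Reading from a closed archive fails, confirming that it was closed.
try:
    archive.read("chunk")
    closed_after_error = False
except ValueError:
    closed_after_error = True
```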
tl;dr: in real life it's unlikely that anything disastrous would happen if you left the file closure up to the garbage collector, but it is generally good practice to handle these kinds of resources using the context manager. For the sake of explicitness and "one-- and preferably only one --obvious way to do it", I would highly recommend using the context manager wherever possible.
If you have a large amount of code which would need to be indented here, I would recommend either factoring that inner code into a function which takes an open ZipStore (then it would be generic over any other store too!), or possibly a wrapper class which itself is a context manager which closes the inner ZipStore on __exit__.
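A rough sketch of both suggestions follows. FakeStore, total_bytes and ClosingStore are all hypothetical names invented for this example; FakeStore stands in for an open ZipStore on the assumption that a store behaves like a mapping of keys to bytes with a close() method.

```python
class FakeStore(dict):
    """Stand-in for an open ZipStore: a key-to-bytes mapping with close()."""

    closed = False

    def close(self):
        self.closed = True


def total_bytes(store):
    """Works on any open mapping-style store; never opens or closes it."""
    return sum(len(store[key]) for key in store)


class ClosingStore:
    """Hypothetical wrapper that closes the store it holds on __exit__."""

    def __init__(self, store):
        self.store = store

    def __enter__(self):
        return self.store

    def __exit__(self, exc_type, exc_value, traceback):
        self.store.close()
        return False  # do not swallow exceptions from the block


inner = FakeStore({"0.0": b"abcd", "0.1": b"ef"})
with ClosingStore(inner) as store:
    size = total_bytes(store)  # the long inner code lives in the function
```

Because total_bytes only relies on the mapping interface, the same function also accepts a plain dict, which is convenient for tests.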